Planet Python

Real Python: Using Astropy for Astronomy With Python

Tue, 2024-08-27 10:00

This course covers two problems from introductory astronomy to help you play with some Python libraries. You’ll use Astropy, NumPy, Matplotlib, and pandas to find planet conjunctions, and graph the best viewing times for a star.

In this course you’ll learn about:

  • Astronomy concepts of conjunction and optimal viewing
  • The Python package Astropy
  • Using pandas to process data
  • Building graphs with Matplotlib
  • Python’s warnings module

Categories: FLOSS Project Planets

Armin Ronacher: MiniJinja: Learnings from Building a Template Engine in Rust

Mon, 2024-08-26 20:00

Given that I can't stop creating template engines, I figured I might write a bit about what I learned from creating MiniJinja, which is an implementation of my Jinja2 template engine for Rust. Disclaimer: this post might be a bit more technical.

There is a good chance you have come across Jinja2 templates before as they became quite commonplace in various places over the years. They look a bit like this:

    {% extends "layout.html" %}
    {% block body %}
      <p>Hello {{ name }}!</p>
    {% endblock %}

Why MiniJinja?

Maybe we start with the initial question of why I wrote MiniJinja. It's the year 2024 and people don't create a ton of HTML with server-side rendered template engines any more. While there is some resurgence of that model thanks to HTMX, Hotwire and Livewire, I personally use SolidJS for my internal UI needs. There is however always a need to generate some form of text, and so somehow the need for Jinja2 never really went away. When I originally created it, it was clearly meant for generating HTML with some JavaScript sprinkled on top, but in the years since I have encountered Jinja templates in many more places, primarily for generating YAML and similar formats. Lately it comes up for LLM prompt generation.

My personal need for MiniJinja came out of an experiment I built for infrastructure automation. Since the templates had to be loaded dynamically I could not use a system like Askama. Askama has type-safe templates that just generate Rust code. On the other hand most Jinja-inspired template engines that are dynamic in Rust really do not try very hard to be Jinja compatible. Because writing template engines is also fun, I figured I might give it another try.

Over the last two years I kept adding to the engine until it got to the point where it's almost at feature parity with Jinja2 and quite enjoyable to use.

Runtime Values

When building a template engine for Rust you end up building a little dynamic programming language that is optimized for text generation. Consequently you pull in most of the challenges of building a dynamic language. Particularly when working in Rust the immediate challenge is memory management and exposing native Rust objects to the embedded language. So the interesting bit here is how to create a system that allows interactions between the template engine and the Rust world around it.

MiniJinja, unlike Jinja2, does not use code generation but has a basic stack-based VM and an AST-based bytecode compiler. Since MiniJinja follows Jinja2 it inherits a lot of the realities of the underlying object system that Jinja2 inherits from Python. For instance macros (functions) are first-class objects and they can have closures. This has challenges because it's easy to create cycles and Rust has no garbage collector that can help with this problem.

The core object model in MiniJinja is a Value type which is represented by an enum that looks as follows (some less important variants removed):

    #[derive(Clone)]
    pub struct Value(ValueRepr);

    #[derive(Clone)]
    pub(crate) enum ValueRepr {
        Undefined,
        None,
        Bool(bool),
        U64(u64),
        I64(i64),
        F64(f64),
        String(Arc<str>, StringType),
        SmallStr(SmallStr),
        Invalid(Arc<Error>),
        Object(DynObject),
    }

Externally everything is a Value. If you Clone it, you usually bump a reference count or you make a cheap memcopy. Values are either primitives such as strings, numbers etc. or objects.

For objects MiniJinja provides a trait called Object which can be implemented by most Rust types. The engine provides a DynObject wrapper, which is a fancy Arc<dyn Object> that supports borrowing and object safety. I wrote about this before. What you will notice is that quite a few of the types involved have an Arc. That's because these values are for the most part reference counted.

Since values here are really fat (they are 24 bytes in memory) a SmallStr type is used to hold up to 22 bytes of string data inline. One byte is used to encode the length of the string, and another byte is then used by the ValueRepr to mark which enum variant is in use.

In pure theory this is all wrong. We never use weak references, so the weak count in the Arc is not used and clever bit hackery could be used to greatly reduce the size of the value type. I think one could get the whole thing down to 16 bytes trivially or even 8 bytes with NaN tagging. However I did not want to walk into the world of unsafe code more than feels appropriate. MiniJinja is also plenty fast.
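
To make the byte accounting concrete, here is a minimal sketch of an inline small string inside a tagged enum. It is not MiniJinja's actual code: the toy Repr enum, try_new, make_str and repr_size are names made up for illustration, and the 24-byte result is what current rustc produces for this layout rather than something the language guarantees.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical inline string: 1 length byte + 22 data bytes = 23 bytes.
#[derive(Clone, Copy)]
struct SmallStr {
    len: u8,
    buf: [u8; 22],
}

impl SmallStr {
    /// Returns None when the string does not fit inline.
    fn try_new(s: &str) -> Option<SmallStr> {
        let bytes = s.as_bytes();
        if bytes.len() > 22 {
            return None;
        }
        let mut buf = [0u8; 22];
        buf[..bytes.len()].copy_from_slice(bytes);
        Some(SmallStr { len: bytes.len() as u8, buf })
    }

    fn as_str(&self) -> &str {
        // The buffer was copied from a complete &str, so it is valid UTF-8.
        std::str::from_utf8(&self.buf[..self.len as usize]).unwrap()
    }
}

/// Toy stand-in for ValueRepr with only three variants.
#[allow(dead_code)]
enum Repr {
    U64(u64),
    SmallStr(SmallStr),
    String(Arc<str>),
}

/// Short strings are stored inline, longer ones behind a reference count.
fn make_str(s: &str) -> Repr {
    match SmallStr::try_new(s) {
        Some(small) => Repr::SmallStr(small),
        None => Repr::String(Arc::from(s)),
    }
}

/// Size of the toy enum: 23 payload bytes plus one byte for the variant tag.
fn repr_size() -> usize {
    size_of::<Repr>()
}
```

In this sketch a key such as "key" never touches the allocator, which is the property that makes wrapping attribute names in a value cheap.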

One variant that is worth calling out is Invalid. That's a value that can exist in the system but it carries an error. When you're trying to interact with it, in most cases it will propagate this error. That's used in the engine in places where the API assumes infallibility (particularly during iteration) but it needs a way to emit an error. This concept is quite common when writing an engine in C, though typically the actual error is carried out of band. For instance in QuickJS there is a marker value that indicates a failure, but the actual error is held on the interpreter runtime.
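
The idea of an error that travels inside a value can be sketched in a few lines. This is an illustration only, assuming a toy Val enum with a String error; parse_all and sum are hypothetical helpers, not part of MiniJinja.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i64),
    // Carries the error along until something actually inspects the value.
    Invalid(Arc<String>),
}

// Infallible by signature: the iterator can only yield Val, never an Err.
fn parse_all<'a>(items: &'a [&'a str]) -> impl Iterator<Item = Val> + 'a {
    items.iter().map(|s| match s.parse::<i64>() {
        Ok(n) => Val::Int(n),
        Err(e) => Val::Invalid(Arc::new(format!("cannot parse {s:?}: {e}"))),
    })
}

// The first real use of the values surfaces the deferred error.
fn sum(values: impl Iterator<Item = Val>) -> Result<i64, String> {
    let mut total = 0;
    for value in values {
        match value {
            Val::Int(n) => total += n,
            Val::Invalid(err) => return Err(err.to_string()),
        }
    }
    Ok(total)
}
```

The iterator keeps its simple item type, and the failure only becomes a Result at the point where a fallible API is available again.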

The trait definition for objects looks like this:

    pub trait Object: Debug + Send + Sync {
        fn repr(self: &Arc<Self>) -> ObjectRepr { ... }
        fn get_value(self: &Arc<Self>, key: &Value) -> Option<Value> { ... }
        fn enumerate(self: &Arc<Self>) -> Enumerator { ... }
        fn enumerator_len(self: &Arc<Self>) -> Option<usize> { ... }
        fn is_true(self: &Arc<Self>) -> bool { ... }
        fn call(
            self: &Arc<Self>,
            state: &State<'_, '_>,
            args: &[Value],
        ) -> Result<Value, Error> { ... }
        fn call_method(
            self: &Arc<Self>,
            state: &State<'_, '_>,
            method: &str,
            args: &[Value],
        ) -> Result<Value, Error> { ... }
        fn render(self: &Arc<Self>, f: &mut Formatter<'_>) -> Result
        where
            Self: Sized + 'static { ... }
    }

Some of these methods are implemented automatically. For instance many of the methods such as is_true or enumerator_len have a default implementation that is based on object repr and the return value from enumerate. But they can be overridden to change the default behavior or to add some potential optimizations.

One of the most important types in Jinja is a map as it holds the template context. Maps are implemented, as you can imagine, as an Object. The implementation is in fact pretty trivial:

    impl<V> Object for BTreeMap<Value, V>
    where
        V: Into<Value> + Clone + Send + Sync + fmt::Debug + 'static,
    {
        fn get_value(self: &Arc<Self>, key: &Value) -> Option<Value> {
            self.get(key).cloned().map(|v| v.into())
        }

        fn enumerate(self: &Arc<Self>) -> Enumerator {
            self.mapped_enumerator(|this| Box::new(this.keys().cloned()))
        }
    }

This reveals two interesting aspects of the object model: First that Value implements Hash. That means any value can be used as a key in a map. While this is untypical for Rust and even not what happens in Python, it simplifies the system greatly. When in the template engine you write {{ object.key }}, behind the scenes object.get_value(Value::from("key")) is called. Since most keys are typically less than 22 characters, creating a dummy Value wrapper around them is not too problematic.

The second and probably more interesting part here is that you can sort of borrow out of an object for the enumerator. The mapped_enumerator helper takes a reference to self and invokes a closure which itself can borrow from self. This adjacent borrowing is implemented with unsafe code as there is no other way to make it work. The combination of repr (defaults to Map), get_value and enumerate gives the object the behavior, shape and contents.

Vectors look quite similar:

    impl<T> Object for Vec<T>
    where
        T: Into<Value> + Clone + Send + Sync + fmt::Debug + 'static,
    {
        fn repr(self: &Arc<Self>) -> ObjectRepr {
            ObjectRepr::Seq
        }

        fn get_value(self: &Arc<Self>, key: &Value) -> Option<Value> {
            self.get(key.as_usize()?).cloned().map(|v| v.into())
        }

        fn enumerate(self: &Arc<Self>) -> Enumerator {
            Enumerator::Seq(self.len())
        }
    }

Enumerators and Object Behaviors

Enumeration in MiniJinja is a way to allow an object to describe what's inside of it. In combination with the return values from repr() the engine changes how iteration is performed. These are possible enumerators:

    pub enum Enumerator {
        NonEnumerable,
        Empty,
        Iter(Box<dyn Iterator<Item = Value> + Send + Sync>),
        Seq(usize),
        Values(Vec<Value>),
    }

It's probably easier to explain how enumerators turn into iterators by showing you the try_iter method in the engine:

    impl DynObject {
        fn try_iter(self: &Self) -> Option<Box<dyn Iterator<Item = Value> + Send + Sync>>
        where
            Self: 'static,
        {
            match self.enumerate() {
                Enumerator::NonEnumerable => None,
                Enumerator::Empty => Some(Box::new(None::<Value>.into_iter())),
                Enumerator::Seq(l) => {
                    let self_clone = self.clone();
                    Some(Box::new((0..l).map(move |idx| {
                        self_clone.get_value(&Value::from(idx)).unwrap_or_default()
                    })))
                }
                Enumerator::Iter(iter) => Some(iter),
                Enumerator::Values(v) => Some(Box::new(v.into_iter())),
            }
        }
    }

Some of the trivial enumerators are quick to explain: Enumerator::NonEnumerable just does not support iteration and Enumerator::Empty does but won't yield any values. The more interesting one is Enumerator::Seq(n) which basically tells the engine to call get_value from 0 to n to yield items from the object. This is how sequences are implemented. The rest are enumerators that just directly yield values.

So when you want to iterate over a map, you will usually use something like Enumerator::Iter and iterate over all the keys in the map.

The engine then uses ObjectRepr to figure out what to do with it. A value marked as ObjectRepr::Seq displays like a sequence, can be indexed with integers, and iterates over the values in the sequence. If the repr is ObjectRepr::Map then the expectation is that it will be indexable by key and it will iterate over the keys when used in a loop. Its default rendering also is a key-value pair list wrapped in curly braces.

Now quite frankly I don't like that iteration protocol. I think it's more sensible for maps to naturally iterate over the key-value pairs, but since MiniJinja follows Jinja2 and Jinja2 follows Python, emulating that behavior was important.

Enumerators are a bit different from iterators because they might only define how iteration is performed (see: Enumerator::Seq). To actually create an iterator, the object is then passed to it. They are also asked to provide a length. When an enumerator provides a length it's an indication to the engine that the object can be iterated over more than once (you can re-create the enumerator). This is why an object can land in a MiniJinja template that looks like a list but is actually just an iterable object with a known length. For this MiniJinja uses a trick where it will inspect the size hint of the iterator to make assumptions about it. Internally every enumerator allows the engine to query the length of it:

    impl Enumerator {
        fn query_len(&self) -> Option<usize> {
            Some(match self {
                Enumerator::Empty => 0,
                Enumerator::Values(v) => v.len(),
                Enumerator::Iter(i) => match i.size_hint() {
                    (a, Some(b)) if a == b => a,
                    _ => return None,
                },
                Enumerator::RevIter(i) => match i.size_hint() {
                    (a, Some(b)) if a == b => a,
                    _ => return None,
                },
                Enumerator::Seq(v) => *v,
                Enumerator::NonEnumerable => return None,
            })
        }
    }
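
The exact-length test that query_len applies to size_hint can be tried on standard library iterators in isolation. This is a small sketch; exact_len is a name chosen for this example and not a MiniJinja function.

```rust
/// Returns the length only when the iterator reports matching lower and
/// upper bounds, i.e. when its size hint is exact.
fn exact_len<I: Iterator>(iter: &I) -> Option<usize> {
    match iter.size_hint() {
        (lower, Some(upper)) if lower == upper => Some(lower),
        _ => None,
    }
}

/// A range knows its length up front.
fn range_len() -> Option<usize> {
    exact_len(&(0..10))
}

/// A filter cannot know how many items survive, so only the upper bound
/// is known and the hint is not exact.
fn filtered_len() -> Option<usize> {
    exact_len(&(0..10).filter(|x| x % 2 == 0))
}
```

With these definitions range_len() yields Some(10) while filtered_len() yields None, which mirrors the split between values that print like a list and values that stay opaque iterators.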

The important part here is the call to size_hint. If the upper bound is known, and the lower bound matches the upper bound then MiniJinja will assume the iterator will always have that length (for as long as not iterated). As a result it will change the way the object is interacted with. This for instance means that if you run range(10) in a template it looks like a list when printed even though iteration and number creation is lazy. On the other hand if you use the Value::make_one_shot_iterator API the length hint will always be disabled and MiniJinja will not attempt to interact with the iterator when printing it:

    {{ range(4) }}         -> prints [0, 1, 2, 3]
    {{ a_real_iterator }}  -> prints <iterator>

Building a VM

Lexing and parsing are, I think, not too puzzling in Rust, but making an AST and making a VM is kinda unusual. The first thing is that Rust is just not particularly amazing at tree structures. In MiniJinja I really wanted to avoid having the AST at all, but it does come in handy to implement some of the functionality that Jinja2 requires. For instance to establish closures it will just walk the AST to figure out which names are looked up within a function. I tried a few things to improve how memory allocations work with the AST. There are great crates out there for doing this, but I really wanted MiniJinja to be light on dependencies so I ended up opting against all of them.

For the AST design I went with large enums that hold Spanned<T> values:

    pub enum Expr<'a> {
        Var(Spanned<Var<'a>>),
        Const(Spanned<Const>),
        ...
    }

    pub struct Var<'a> {
        pub id: &'a str,
    }

    pub struct Const {
        pub value: Value,
    }

You might now be curious what Spanned<T> is. It's a wrapper type that does two things: it boxes the inner node and it stores an adjacent Span which is basically the code location in the original input template for debugging:

    pub struct Spanned<T> {
        node: Box<T>,
        span: Span,
    }

It implements Deref like a smart pointer so you can poke right through it to interact with the node. The code generator just walks the AST and emits instructions for it.
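
As a rough sketch of what "poking through" looks like, the following wrapper implements Deref so that fields of the inner node are reachable directly. The Span fields (line, column), the new and span methods, and the describe helper are assumptions made for this example; the real types carry more information.

```rust
use std::ops::Deref;

/// Hypothetical source location; the fields are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    line: u32,
    column: u32,
}

/// Boxes the node and keeps its location next to it.
struct Spanned<T> {
    node: Box<T>,
    span: Span,
}

impl<T> Spanned<T> {
    fn new(node: T, span: Span) -> Spanned<T> {
        Spanned { node: Box::new(node), span }
    }

    fn span(&self) -> Span {
        self.span
    }
}

impl<T> Deref for Spanned<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.node
    }
}

struct Var<'a> {
    id: &'a str,
}

enum Expr<'a> {
    Var(Spanned<Var<'a>>),
}

/// Reads a field of the inner node straight through the wrapper.
fn describe(expr: &Expr<'_>) -> String {
    match expr {
        Expr::Var(var) => format!("{} at {}:{}", var.id, var.span().line, var.span().column),
    }
}
```

Because every node is boxed inside the wrapper, the enum itself stays small regardless of how large individual node types grow.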

The instructions themselves are a large enum but the number of arguments to the variants is kept rather low to not waste too much memory. The base size of the instruction is dominated by it being able to hold a Value which as we have established is a pretty hefty thing:

    pub enum Instruction<'source> {
        EmitRaw(&'source str),
        StoreLocal(&'source str),
        Lookup(&'source str),
        LoadConst(Value),
        Jump(usize),
        JumpIfFalse(usize),
        JumpIfFalseOrPop(usize),
        JumpIfTrueOrPop(usize),
        ...
    }

The VM keeps most of the runtime state on a State object that is passed to a few places. For instance you have already seen this in the call signature further up. The state for instance holds the loaded instructions or the template context. The VM itself maintains a stack of values and then just steps through a list of instructions on the state in a loop. Since there are a lot of instructions you can have a look on GitHub to see it in its entirety. Here however is a small part that shows roughly how this works:

    let mut pc = 0;
    loop {
        let instr = match state.instructions.get(pc) {
            Some(instr) => instr,
            None => break,
        };
        let a;
        let b;
        match instr {
            Instruction::EmitRaw(val) => {
                out.write_str(val).map_err(Error::from)?;
            }
            Instruction::Emit => {
                self.env.format(&stack.pop(), state, out)?;
            }
            Instruction::StoreLocal(name) => {
                …, stack.pop());
            }
            Instruction::Lookup(name) => {
                stack.push(assert_valid!(state
                    .lookup(name)
                    .unwrap_or(Value::UNDEFINED)));
            }
            Instruction::GetAttr(name) => {
                a = stack.pop();
                stack.push(match a.get_attr_fast(name) {
                    Some(value) => value,
                    None => undefined_behavior.handle_undefined(a.is_undefined())?,
                });
            }
            Instruction::LoadConst(value) => {
                stack.push(value.clone());
            }
            Instruction::Jump(jump_target) => {
                pc = *jump_target;
                continue;
            }
            Instruction::JumpIfFalse(jump_target) => {
                a = stack.pop();
                if !undefined_behavior.is_true(&a)? {
                    pc = *jump_target;
                    continue;
                }
            }
            // ...
        }
        pc += 1;
    }

Basically the position of the current instruction is held in pc (short for program counter). Normally it's advanced by one, but jump instructions can change the pc to any other location. If you run out of instructions the evaluation ends.
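
The same loop shape can be reproduced in a self-contained toy. This sketch assumes a stack of plain booleans and four made-up instructions (Instr, run and if_else are names for this example only); it shows how an if/else compiles down to jumps.

```rust
#[derive(Debug, Clone)]
enum Instr {
    EmitRaw(&'static str),
    LoadConst(bool),
    Jump(usize),
    JumpIfFalse(usize),
}

fn run(program: &[Instr]) -> String {
    let mut out = String::new();
    let mut stack: Vec<bool> = Vec::new();
    let mut pc = 0;
    // Running past the last instruction ends the evaluation.
    while let Some(instr) = program.get(pc) {
        match instr {
            Instr::EmitRaw(text) => out.push_str(text),
            Instr::LoadConst(value) => stack.push(*value),
            Instr::Jump(target) => {
                pc = *target;
                continue;
            }
            Instr::JumpIfFalse(target) => {
                // An empty stack counts as false in this toy VM.
                if !stack.pop().unwrap_or(false) {
                    pc = *target;
                    continue;
                }
            }
        }
        pc += 1;
    }
    out
}

/// Instructions for: if condition { "yes" } else { "no" }
fn if_else(condition: bool) -> Vec<Instr> {
    vec![
        Instr::LoadConst(condition), // 0: push the condition
        Instr::JumpIfFalse(4),       // 1: skip to the else branch
        Instr::EmitRaw("yes"),       // 2: then branch
        Instr::Jump(5),              // 3: jump past the else branch
        Instr::EmitRaw("no"),        // 4: else branch
    ]
}
```

Jump targets are plain indexes into the instruction list, so a target equal to the list length simply ends the program.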

One piece of complexity in the VM comes down to macros. That's because lifetimes make that really tricky. A macro is just a Value that holds a Macro Object internally. So how can that macro reference the instructions, if the instructions themselves have a lifetime to the template 'source? The answer is that they can't (at least I have not found a reasonable way). So instead a macro has an ID which acts as a handle to look up the instructions dynamically from the execution state. Additionally each state has a unique ID so the engine can assert that nothing funny was happening. The downside of this is that a macro cannot be "returned" from a template. They can however be imported from one template into another.

Here is what a macro object looks like in code (abbreviated):

    pub(crate) struct Macro {
        pub name: Value,
        pub arg_spec: Vec<Value>,
        pub macro_ref_id: usize, // id of the macro
        pub state_id: isize,
        pub closure: Value,
        pub caller_reference: bool,
    }

    impl Object for Macro {
        fn call(self: &Arc<Self>, state: &State<'_, '_>, args: &[Value]) -> Result<Value, Error> {
            // we can only call macros that point to loaded template state.
            // if a template would be returned from a template this will
            // fail.
            if … != self.state_id {
                return Err(Error::new(
                    ErrorKind::InvalidOperation,
                    "cannot call this macro. template state went away.",
                ));
            }

            // ... argument parsing
            let arg_values = ...;

            // find referenced instructions
            let (instructions, offset) = &state.macros[self.macro_ref_id];

            // create a nested vm and evaluate the macro
            let vm = Vm::new(state.env());
            let mut rv = String::new();
            let mut out = Output::with_string(&mut rv);
            let closure = self.closure.clone();
            ok!(vm.eval_macro(
                instructions,
                *offset,
                self.closure.clone(),
                state.ctx.clone_base(),
                caller,
                &mut out,
                state,
                arg_values
            ));

            // return rendered template as string from the call
            Ok(if !matches!(state.auto_escape(), AutoEscape::None) {
                Value::from_safe_string(rv)
            } else {
                Value::from(rv)
            })
        }
    }
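
Stripped of everything else, the handle trick looks roughly like the sketch below. It assumes a toy State whose "instructions" are just string slices borrowed from the template source; State, MacroHandle and the error strings for an unknown id are invented for the example.

```rust
/// The state owns the instructions; 'src ties them to the template source.
struct State<'src> {
    id: usize,
    macros: Vec<Vec<&'src str>>,
}

/// The handle has no lifetime parameter because it stores integers only.
#[derive(Clone, Copy)]
struct MacroHandle {
    macro_ref_id: usize,
    state_id: usize,
}

impl MacroHandle {
    fn call(&self, state: &State<'_>) -> Result<String, String> {
        // A handle is only meaningful for the state that created it.
        if state.id != self.state_id {
            return Err("cannot call this macro. template state went away.".to_string());
        }
        // Look the instructions up dynamically instead of holding a reference.
        let instructions = state
            .macros
            .get(self.macro_ref_id)
            .ok_or_else(|| "unknown macro id".to_string())?;
        Ok(instructions.concat())
    }
}
```

The handle can be stored anywhere, including inside a dynamically typed value, because the borrow of the template source lives only in the state that is passed in at call time.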

Additionally the closure is a good source of cycles. For that reason the engine keeps track of all closures during the execution and breaks cycles caused by closures manually by clearing them out.

Cool APIs

The last part that I want to go over is the magic that makes this work:

    fn slugify(value: String) -> String {
        value.to_lowercase().split_whitespace().collect::<Vec<_>>().join("-")
    }

    fn timeformat(state: &State, ts: f64) -> String {
        let configured_format = state.lookup("TIME_FORMAT");
        let format = configured_format
            .as_ref()
            .and_then(|x| x.as_str())
            .unwrap_or("HH:MM:SS");
        format_unix_timestamp(ts, format)
    }

    let mut env = Environment::new();
    env.add_filter("slugify", slugify);
    env.add_filter("timeformat", timeformat);

You might have seen something like this in Rust before, but it's still a bit magical. How can you make functions with seemingly different signatures register with the add_filter function? How does the engine perform the type conversions (as we know the engine has Value types, so where does the String conversion take place?). This is a topic for a blog post on its own but the answer behind this lies in a lot of clever trait hackery. The add_filter function reveals a bit of that hackery:

    pub fn add_filter<N, F, Rv, Args>(&mut self, name: N, f: F)
    where
        N: Into<Cow<'source, str>>,
        F: Filter<Rv, Args> + for<'a> Filter<Rv, <Args as FunctionArgs<'a>>::Output>,
        Rv: FunctionResult,
        Args: for<'a> FunctionArgs<'a>,
    {
        let filter = BoxedFilter(Arc::new(move |state, args| -> Result<Value, Error> {
            f.apply_to(Args::from_values(Some(state), args)?).into_result()
        }));
        self.filters.insert(name.into(), filter);
    }

Hidden behind this rather complex set of traits are some basic ideas:

  1. FunctionArgs is a helper trait for type conversions. It's implemented for tuples of different sizes made of ArgType values. These tuples represent the signature of the function. It has a method called from_values which performs that conversion via ArgType.
  2. ArgType which you can't really see in the code above, is a trait that knows how to convert a Value into whatever the function desires as argument.
  3. Filter is a trait implemented for functions with qualifying FunctionArgs signatures returning a FunctionResult.
  4. A FunctionResult is a trait that represents potential return values from the function such as a Value, something that can be converted into a Value or a Result.
  5. The BoxedFilter type is what converts the passed closure into a reference counted object that is held in the environment.
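
The five ideas above can be condensed into a heavily simplified sketch. It reuses the trait names from the list but drops the state argument, lifetimes and the FunctionResult trait (Into<Value> stands in for it), uses a two-variant toy Value, and reports errors as plain strings; none of this is MiniJinja's real implementation. The append filter and the apply method are made up for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

impl From<String> for Value {
    fn from(s: String) -> Self { Value::Str(s) }
}

// Converts a single dynamic Value into a concrete argument type.
trait ArgType: Sized {
    fn from_value(value: &Value) -> Result<Self, String>;
}

impl ArgType for String {
    fn from_value(value: &Value) -> Result<Self, String> {
        match value {
            Value::Str(s) => Ok(s.clone()),
            Value::Int(n) => Ok(n.to_string()),
        }
    }
}

// Converts a whole argument list into a tuple that mirrors a signature.
trait FunctionArgs: Sized {
    fn from_values(values: &[Value]) -> Result<Self, String>;
}

impl<A: ArgType> FunctionArgs for (A,) {
    fn from_values(values: &[Value]) -> Result<Self, String> {
        match values {
            [a] => Ok((A::from_value(a)?,)),
            _ => Err(format!("expected 1 argument, got {}", values.len())),
        }
    }
}

impl<A: ArgType, B: ArgType> FunctionArgs for (A, B) {
    fn from_values(values: &[Value]) -> Result<Self, String> {
        match values {
            [a, b] => Ok((A::from_value(a)?, B::from_value(b)?)),
            _ => Err(format!("expected 2 arguments, got {}", values.len())),
        }
    }
}

// Implemented for every plain function whose parameters match a tuple.
trait Filter<Rv, Args> {
    fn apply_to(&self, args: Args) -> Rv;
}

impl<F, Rv, A> Filter<Rv, (A,)> for F where F: Fn(A) -> Rv {
    fn apply_to(&self, (a,): (A,)) -> Rv { self(a) }
}

impl<F, Rv, A, B> Filter<Rv, (A, B)> for F where F: Fn(A, B) -> Rv {
    fn apply_to(&self, (a, b): (A, B)) -> Rv { self(a, b) }
}

// Type-erased form that the environment stores.
type BoxedFilter = Box<dyn Fn(&[Value]) -> Result<Value, String>>;

#[derive(Default)]
struct Environment {
    filters: HashMap<String, BoxedFilter>,
}

impl Environment {
    fn add_filter<F, Rv, Args>(&mut self, name: &str, f: F)
    where
        F: Filter<Rv, Args> + 'static,
        Rv: Into<Value> + 'static,
        Args: FunctionArgs + 'static,
    {
        let boxed: BoxedFilter = Box::new(move |values: &[Value]| -> Result<Value, String> {
            Ok(f.apply_to(Args::from_values(values)?).into())
        });
        self.filters.insert(name.to_string(), boxed);
    }

    fn apply(&self, name: &str, values: &[Value]) -> Result<Value, String> {
        let filter = self.filters.get(name).ok_or("unknown filter")?;
        filter(values)
    }
}

fn slugify(value: String) -> String {
    value.to_lowercase().split_whitespace().collect::<Vec<_>>().join("-")
}

fn append(value: String, suffix: String) -> String {
    value + &suffix
}
```

Calling env.add_filter("slugify", slugify) and env.add_filter("append", append) on an Environment::default() both compile even though the two functions take a different number of arguments: the compiler picks the Filter implementation whose tuple shape matches the function, and that in turn fixes which FunctionArgs conversion runs.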

I think a lot of the patterns in MiniJinja are useful for projects outside of MiniJinja. There is quite a bit more hidden in it that I have talked about before, such as how MiniJinja is abusing serde. If you have a need for a Jinja2 compatible template engine I would love it if you get some use out of it. If you're curious about how to build a runtime and object system in Rust, you might also find some utility in the codebase.

I myself learned quite a bit about what creative API design can look like in Rust by building it. At this point I am incredibly happy with how the public API of the engine turned out. The engine is extensively documented both internally and publicly and you can read all about it in the API docs.

Real Python: How to Install Python on Your System: A Guide

Mon, 2024-08-26 10:00

Installing the latest version of Python on your computer could be a common requirement for you as a Python programmer. Fortunately, you’ll have a multitude of installation options. For example, you can download the official Python installer, use your operating system’s package manager or app store, and more.

In this tutorial, you’ll focus on official CPython distributions, which are generally the best option for learning to program with the language. However, you’ll also learn about a few other distributions, like the one available on Homebrew for macOS users.

In this tutorial, you’ll learn how to:

  • Check whether a version of Python is installed on your system
  • Install or update to the latest Python on Windows, macOS, and Linux
  • Install Python on mobile devices like phones or tablets
  • Use Python on your browser with online interpreters

This tutorial covers installing the latest Python on the most important platforms or operating systems, such as Windows, macOS, Linux, iOS, and Android. However, it doesn’t cover all the existing Linux distributions, which would be a huge task. Anyway, you’ll find instructions for the most popular distros nowadays.

To get the most out of this tutorial, you should be comfortable using your operating system’s terminal or command line.

Windows: How to Check or Get Python

In this section, you’ll learn to check whether Python is installed on your Windows operating system (OS) and which version you have. You’ll also explore three installation options that you can use on Windows.

Note: In this tutorial, you’ll focus on installing the latest version of Python in your current operating system (OS) rather than on installing multiple versions of Python. If you want to install several versions of Python in your OS, then check out the Managing Multiple Python Versions With pyenv tutorial. Note that on Windows machines, you’d have to use pyenv-win instead of pyenv.

For a more comprehensive guide on setting up a Windows machine for Python programming, check out Your Python Coding Environment on Windows: Setup Guide.

Checking the Python Version on Windows

To check whether you already have Python on your Windows machine, open a command-line application like PowerShell or the Windows Terminal.

Follow the steps below to open PowerShell on Windows:

  1. Press the Win key.
  2. Type PowerShell.
  3. Press Enter.

Alternatively, you can right-click the Start button and select Windows PowerShell or Windows PowerShell (Admin). In some versions of Windows, you’ll find Terminal or Terminal (admin).

Note: To learn more about your options for the Windows terminal, check out Using the Terminal on Windows.

With the command line open, type in the following command and press the Enter key:

    PS> python --version
    Python 3.x.z

Using the --version switch will show you the installed version. Note that the 3.x.z part is a placeholder here. On your machine, x and z will be numbers corresponding to the specific version you have installed.

Alternatively, you can use the -V switch:

    PS> python -V
    Python 3.x.z

Using the python -V or python --version command, you can check whether Python is installed on your system and learn what version you have. If Python isn’t installed on your OS, you’ll get an error message.

Knowing the Python Installation Options on Windows

Read the full article at »

Ned Batchelder: Coverage branches instead of arcs

Mon, 2024-08-26 09:18

As I mentioned in a few recent posts, I’ve been doing some significant work to take advantage of new capabilities in Python.

Mark Shannon has been improving the sys.monitoring API so that branch coverage can be done with low overhead. I want to take advantage of that, but I needed to do some refactoring work first. The tests were focused on mapping the complete set of code pathways (which I called arcs), but using low-overhead branch monitoring won’t provide those complete pathways. If the tests continued to focus on them, they would fail with sys.monitoring.

But the complete pathways aren’t actually needed. The useful information is where the branches are, and which branches were taken. That can be measured with sys.monitoring. So a first step was to refactor the tests to focus on branches instead of arcs. That took a while, but is now done.

Not needing all those arcs also meant I could simplify the AST-based parser that found the arcs, removing about 150 lines. I suspect there’s more that could be removed. Maybe it will happen over time. Also, the new code.co_branches() method might make it all obsolete over time.

If you read Coverage at a crossroads on this blog, I talked about using ideas from SlipCover like inserting fake lines with an import hook. Those exotic ideas were appealing in their way, but are no longer needed, and they would have brought a bunch of complexity. With the two new sys.monitoring events, we can get the branch information directly without advanced shenanigans.

There’s more work to do, including attending to incoming bug reports. If you’d like to help, or learn more about any of this, we have a #coverage-py channel in the Python Discord.

Python Bytes: #398 Open source makes you rich? (and other myths)

Mon, 2024-08-26 04:00

Topics covered in this episode:

  • Open Source Myths
  • uv 0.3.0 and all the excitement
  • Top pytest Plugins
  • A comparison of hosts / providers for Python serverless functions (aka FaaS)
  • Extras
  • Joke

About the show

Sponsored by us! Support our work through:

  • Our courses at Talk Python Training
  • pytest courses and community
  • Patreon Supporters

Join us on YouTube to be part of the audience, usually Monday at 10am PT. Older video versions are available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

Brian #1: Open Source Myths

  • Josh Bressers
  • Mastodon post kicking off a list of open source myths
  • Feedback and additional myths compiled to a doc
  • Some favorites:
      • All open source developers live in Nebraska
      • It’s all run by hippies
      • Everything is being rewritten in Rust
      • Features are planned
      • If the source code is available, it’s open source
      • A project with no commits for 12 months is abandoned
      • Many eyes make all bugs shallow
      • Open source has worse UX
      • Open source has better UX
      • Open source makes you rich

Michael #2: uv 0.3.0 and all the excitement

  • Thanks to Skyler Kasko and John Hagen for the emails.
  • Additional write up by Simon Willison
  • Additional write up by Armin Ronacher
  • End-to-end project management: uv run, uv lock, and uv sync
  • Tool management: uv tool install and uv tool run (aliased to uvx)
  • Python installation: uv python install
  • Script execution: uv can now manage hermetic, single-file Python scripts with inline dependency metadata based on PEP 723.

Brian #3: Top pytest Plugins

  • Inspired by (and assisted by) Hugo’s Top PyPI Packages
  • Write up for Finding the top pytest plugins
  • BTW, pytest-check has made it to 25.
  • Same day, Jeff Triplett throws my code into Claude 3.5 Sonnet and refactors it
  • Thanks Jeff Triplett & Hugo for answering how to add Summary and other info

Michael #4: A comparison of hosts / providers for Python serverless functions (aka FaaS)

  • Nice feature matrix of all the options, frameworks, costs, and more
  • The WASM ones look particularly interesting to me.

Extras

Brian:

  • When is the next live episode of Python Bytes? (thanks to Hugo van Kemenade)
  • Some more cool projects by Hugo:
      • Python Logos
      • PyPI Downloads by Python version for various Python tools, in pretty colors
      • Python Core Developers over time

Michael:

  • Code in a Castle Course event: just a couple of weeks left
  • Ladybird: A truly independent browser
  • “I'm also interested in your video recording setup, would be nice to have that in the extras too :D”
      • OBS Studio
      • Elgato Streamdeck
      • Elgato Key light
      • DaVinci Resolve

Joke: DevOps Support Group

via Blaise

  • Hi, my name is Bob
  • Group: Hi Bob
  • It's been 42 days since I last ssh'd into production.
  • Group: Applause
  • But only 4 days since I accidentally took down the website
  • Someone in back: Oh Bob…

Categories: FLOSS Project Planets

Zato Blog: Integrating with Jira APIs

Mon, 2024-08-26 04:00
2024-08-26, by Dariusz Suchojad

Overview

Continuing in the series of articles about newest cloud connections in Zato 3.2, this episode covers Atlassian Jira from the perspective of invoking its APIs to build integrations between Jira and other systems.

There are essentially two use modes of integrations with Jira:

  1. Jira reacts to events taking place in your projects and invokes your endpoints accordingly via WebHooks. In this case, it is Jira that explicitly establishes connections with and sends requests to your APIs.
  2. Jira projects are queried periodically or as a consequence of events triggered by Jira using means other than WebHooks.

The first case is usually more straightforward to conceptualize - you create a WebHook in Jira, point it to your endpoint and Jira invokes it when a situation of interest arises, e.g. a new ticket is opened or updated. I will talk about this variant of integrations with Jira in a future instalment as the current one is about the other situation, when it is your systems that establish connections with Jira.

The reason why it is more practical to first speak about the second form is that, even if WebHooks are somewhat easier to reason about, they do come with their own ramifications.

To start off, assuming that you use the cloud-based version of Jira, you need to have a publicly available endpoint for Jira to invoke through WebHooks. Very often, this is undesirable because the systems that you need to integrate with may be internal ones, never meant to be exposed to public networks.

Secondly, your endpoints need to have a TLS certificate signed by a public Certificate Authority and they need to be accessible on port 443. Again, most enterprise systems will not allow either of these at all, or it may take months or years to push such a change through the various corporate departments involved.

Lastly, even if a WebHook can be used, it is not always a given that the initial information that you receive in the request from a WebHook will already contain everything that you need in your particular integration service. Thus, you will still need a way to issue requests to Jira to look up details of a particular object, such as a ticket. This reduces WebHooks to the role of initial triggers of an interaction with Jira: a WebHook invokes your endpoint, you have a ticket ID on input, and then you invoke Jira back anyway to obtain all the details that you actually need in your business integration.

The end situation is that, although WebHooks are a useful concept that I will write about in a future article, they may very well not be sufficient for many integration use cases. That is why I start with integration methods that are alternative to WebHooks.

Alternatives to WebHooks

If, in our case, we cannot use WebHooks then what next? Two good approaches are:

  1. Scheduled jobs
  2. Reacting to emails (via IMAP)

Scheduled jobs will let you periodically inquire with Jira about the changes that you have not processed yet. For instance, with a job definition as below:

Now, the service configured for this job will be invoked once per minute to carry out any integration works required. For instance, it can get a list of tickets since the last time it ran, process each of them as required in your business context and update a database with information about what has been just done - the database can be based on Redis, MongoDB, SQL or anything else.

Integrations built around scheduled jobs make most sense when you need to make periodic sweeps across large swaths of business data. These are the "Give me everything that changed in the last period" kind of interactions, when you do not know precisely how much data you are going to receive.
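As a rough sketch of the "everything that changed since the last run" pattern, the helper below builds a JQL query from the time of the previous sweep. The function name, the project key and the idea of storing a last-run timestamp are assumptions made for illustration, they do not come from Zato or from this article. Only the "yyyy/MM/dd HH:mm" date format is standard JQL.

```python
from datetime import datetime

def build_sweep_query(project: str, last_run: datetime) -> str:
    """Return a JQL query for tickets updated since the previous sweep."""
    # JQL accepts dates in the "yyyy/MM/dd HH:mm" format.
    since = last_run.strftime('%Y/%m/%d %H:%M')
    return f'project = "{project}" AND updated >= "{since}" ORDER BY updated ASC'

# Hypothetical values - a real service would read the timestamp from
# wherever the previous run stored it (Redis, MongoDB, SQL, etc.).
query = build_sweep_query('ABC', datetime(2024, 8, 26, 4, 0))
print(query)
```

Note that Jira interprets such timestamps in the time zone of the user running the query, so the stored timestamp should be kept consistent with that.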

In the specific case of Jira tickets, though, an interesting alternative may be to combine scheduled jobs with IMAP connections:

The idea here is that when new tickets are opened, or when updates are made to existing ones, Jira will send out notifications to specific email addresses and we can take advantage of it.

For instance, you can tell Jira to CC or BCC a dedicated address. Now, Zato will still run a scheduled job but, instead of connecting with Jira directly, that job will look up unread emails in its inbox ("UNSEEN" per the relevant RFC).

Anything that is unread must be new since the last iteration which means that we can process each such email from the inbox, in this way guaranteeing that we process only the latest updates, dispensing with the need for our own database of tickets already processed. We can extract the ticket ID or other details from the email, look up its details in Jira and then continue as needed.

All the details of how to work with IMAP emails are provided in the documentation but it would boil down to this:

# -*- coding: utf-8 -*-

# Zato
from zato.server.service import Service

class MyService(Service):

    def handle(self):

        # Obtain a connection to the IMAP inbox by its name
        conn = self.email.imap.get('My Jira Inbox').conn

        for msg_id, msg in conn.get():

            # Process the message here ..
            process_message(msg.data)

            # .. and mark it as seen in IMAP.
            msg.mark_seen()

The natural question is - how would the "process_message" function extract details of a ticket from an email?

There are several ways:

  1. Each email has a subject of a fixed form - "[JIRA] (ABC-123) Here goes description". In this case, ABC-123 is the ticket ID.
  2. Each email will contain a summary, such as the one below, which can also be parsed:
Summary: Here goes description
Key: ABC-123
URL:
Project: My Project
Issue Type: Improvement
Affects Versions: 1.3.17
Environment: Production
Reporter: Reporter Name
Assignee: Assignee Name

  3. Finally, each email will have an "X-Atl-Mail-Meta" header with interesting metadata that can also be parsed and extracted:
X-Atl-Mail-Meta: user_id="123456:12d80508-dcd0-42a2-a2cd-c07f230030e5", event_type="Issue Created", tenant=""
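To give an idea of what parsing the summary and the header could look like, here is a minimal sketch using only the standard library. The function names are hypothetical and the code assumes that the summary uses one "Name: value" pair per line and that the header consists of key="value" pairs, as in the samples above. Real emails may differ.

```python
import re

# Matches key="value" pairs such as event_type="Issue Created"
_meta_pattern = re.compile(r'(\w+)="([^"]*)"')

def parse_mail_meta(header_value: str) -> dict:
    """Turn the value of an X-Atl-Mail-Meta header into a dictionary."""
    return dict(_meta_pattern.findall(header_value))

def parse_summary(text: str) -> dict:
    """Turn 'Name: value' lines from the email body into a dictionary."""
    out = {}
    for line in text.splitlines():
        # Split on the first colon only, so that values may contain colons
        name, sep, value = line.partition(':')
        if sep:
            out[name.strip()] = value.strip()
    return out

meta = parse_mail_meta(
    'user_id="123456:12d80508-dcd0-42a2-a2cd-c07f230030e5", '
    'event_type="Issue Created", tenant=""'
)
summary = parse_summary('Summary: Here goes description\nKey: ABC-123\nURL:')

print(meta['event_type'])
print(summary['Key'])
```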

The first option is the most straightforward and likely the most convenient one - simply parse out the ticket ID and call Jira with that ID on input for all the other information about the ticket. How to do it exactly is presented in the next chapter.
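For the subject-based option, a sketch of what "process_message" could rely on is shown below. The helper name and the exact pattern are assumptions: the pattern expects an upper-case project key, a dash and a number inside parentheses after "[JIRA]", as in the sample subject above. Anything that does not match is rejected, which also fits the point that email is untrusted input.

```python
import re

# "[JIRA] (ABC-123) Here goes description" -> "ABC-123"
_subject_pattern = re.compile(r'\[JIRA\]\s+\(([A-Z][A-Z0-9_]*-\d+)\)')

def extract_ticket_key(subject: str):
    """Return the ticket key found in an email subject or None if there is none."""
    match = _subject_pattern.search(subject)
    return match.group(1) if match else None

print(extract_ticket_key('[JIRA] (ABC-123) Here goes description'))
```

Using "search" rather than "match" lets the pattern also find the key in subjects that carry a prefix such as "Re:".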

Regardless of how we parse the emails, the important part is that we know that we invoke Jira only when there are new or updated tickets - otherwise there would not have been any new emails to process. Moreover, because it is our side that invokes Jira, we do not expose our internal system to the public network directly.

However, from the perspective of the overall security architecture, email is still part of the attack surface so we need to make sure that we read and parse emails with that in view. In other words, regardless of whether it is Jira invoking us or us reading emails from Jira, all the usual security precautions regarding API integrations and accepting input from external resources still hold and need to be part of the design of the integration workflow.

Creating Jira connections

The above presented the ways in which we can arrive at the point where we invoke Jira, and now we are ready to actually do it.

As with other types of connections, Jira connections are created in Zato Dashboard, as below. Note that you use the email address of a user on whose behalf you connect to Jira but the only other credential is that user's API token previously generated in Jira, not the user's password.

Invoking Jira

With a Jira connection in place, we can now create a Python API service. In this case, we accept a ticket ID on input (called "a key" in Jira) and we return a few details about the ticket to our caller.

This is the kind of service that could be invoked from a service that is triggered by a scheduled job. That is, we would separate the tasks: one service would be responsible for opening IMAP inboxes and parsing emails and the one below would be responsible for communication with Jira.

Thanks to this loose coupling, we make everything much more reusable - that the services can be changed independently is but one part and the more important side is that, with such separation, both of them can be reused by future services as well, without tying them rigidly to this one integration alone.

# -*- coding: utf-8 -*-

# stdlib
from dataclasses import dataclass

# Zato
from zato.common.typing_ import cast_, dictnone
from zato.server.service import Model, Service

# ###########################################################################

if 0:
    from zato.server.connection.jira_ import JiraClient

# ###########################################################################

@dataclass(init=False)
class GetTicketDetailsRequest(Model):
    key: str

@dataclass(init=False)
class GetTicketDetailsResponse(Model):
    assigned_to: str = ''
    progress_info: dictnone = None

# ###########################################################################

class GetTicketDetails(Service):

    class SimpleIO:
        input = GetTicketDetailsRequest
        output = GetTicketDetailsResponse

    def handle(self):

        # This is our input data
        input = self.request.input # type: GetTicketDetailsRequest

        # .. create a reference to our connection definition ..
        jira = self.cloud.jira['My Jira Connection']

        # .. obtain a client to Jira ..
        with jira.conn.client() as client:

            # Cast to enable code completion
            client = cast_('JiraClient', client)

            # Get details of a ticket (issue) from Jira
            ticket = client.get_issue(input.key)

        # Observe that ticket may be None (e.g. invalid key), hence this 'if' guard ..
        if ticket:

            # .. build a shortcut reference to all the fields in the ticket ..
            fields = ticket['fields']

            # .. build our response object ..
            response = GetTicketDetailsResponse()
            response.assigned_to = fields['assignee']['emailAddress']
            response.progress_info = fields['progress']

            # .. and return the response to our caller.
            self.response.payload = response

# ###########################################################################

Creating a REST channel and testing it

The last remaining part is a REST channel to invoke our service through. We will provide the ticket ID (key) on input and the service will reply with what was found in Jira for that ticket.

We are now ready for the final step - we invoke the channel, which invokes the service which communicates with Jira, transforming the response from Jira to the output that we need:

$ curl localhost:17010/jira1 -d '{"key":"ABC-123"}'
{
    "assigned_to":"",
    "progress_info": {
        "progress": 10,
        "total": 30
    }
}
$
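The same call can be made from Python. The sketch below only uses the standard library and mirrors the curl command above: the address and the JSON keys come from that example, while the function names are made up for illustration. The request is built but not sent here so that the snippet can be read and run without a server.

```python
import json
from urllib.request import Request

def build_request(key: str, url: str = 'http://localhost:17010/jira1') -> Request:
    """Build a POST request carrying the ticket key as JSON."""
    payload = json.dumps({'key': key}).encode('utf-8')
    headers = {'Content-Type': 'application/json'}
    return Request(url, data=payload, headers=headers, method='POST')

def parse_response(body: str) -> tuple:
    """Return (assigned_to, progress, total) from the service's JSON reply."""
    data = json.loads(body)
    # progress_info may be missing or null if the ticket was not found
    progress_info = data.get('progress_info') or {}
    return data.get('assigned_to', ''), progress_info.get('progress'), progress_info.get('total')

request = build_request('ABC-123')
sample_reply = '{"assigned_to":"", "progress_info": {"progress": 10, "total": 30}}'
print(parse_response(sample_reply))

# To actually send the request, pass it to urllib.request.urlopen(request)
# and give the decoded response body to parse_response.
```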

And this is everything for today - just remember that this is just one way of integrating with Jira. The other one, using WebHooks, is something that I will go into in one of the future articles.

More resources

➤ Python API integration tutorial
What is an integration platform?
Python Integration platform as a Service (iPaaS)
What is an Enterprise Service Bus (ESB)? What is SOA?

Categories: FLOSS Project Planets

Matt Layman: Layman's Guide to Python Built-in Functions

Sun, 2024-08-25 20:00
Quick Jump List

A: abs, aiter, all, anext, any, ascii
B: bin, bool, breakpoint, bytearray, bytes
C: callable, chr, classmethod, compile, complex
D: delattr, dict, dir, divmod
E: enumerate, eval, exec
F: filter, float, format, frozenset
G: getattr, globals
H: hasattr, hash, help, hex
I: id, input, int, isinstance, issubclass, iter
L: len, list, locals
M: map, max, memoryview, min
N: next
O: object, oct, open, ord
P: pow, print, property
R: range, repr, reversed, round
S: set, setattr, slice, sorted, staticmethod, str, sum, super
T: tuple, type
V: vars
Z: zip
_: __import__
Categories: FLOSS Project Planets

Seth Michael Larson: 2024 Minnesota State Fair foods

Sat, 2024-08-24 20:00

Published 2024-08-25 by Seth Larson

If you didn't know, I'm from Minnesota. Minnesotans love their State Fair, and I'm not an exception! My wife and I were lucky enough to go to a State Fair preview for LuLu's Public House for fried ranch dressing among a handful of new drinks. I shared my thoughts on Mastodon and a few folks seemed interested in hearing more: so here's more!

Cajun fried pickles from The Perfect Pickle

These are hands-down the best food at the Minnesota State Fair. You eat an order, ponder getting more (some years we do!) and then wonder to yourself why they put the best of the best right next to the shuttle entrance. Don't go out looking for answers lest they move these further away; sometimes it's best to let sleeping “pickle dogs” lie.

Seriously, if you like pickles even a little bit, get these pickles. You can get them quick if you're lucky and other folks don't realize there are supposed to be six lines of people taking orders.

They're ripping hot right when they hand them to you, so if you're like me and enjoy food “biting back” then don't delay! 🔥

This year included a noticeable increase in the amount of Cajun seasoning, or we got lucky and someone behind the scenes gave us an extra coating (either way we're not complaining!)

Peanut Butter Bacon Cakes and Blue Cheese & Corn Fritz from The Blue Barn

Celebrating their 10th consecutive year at the Minnesota State Fair, The Blue Barn is always a fan favorite. Seriously, run over there if you get to the fair early to beat the massive lines for food and drinks.

We grabbed the new Peanut Butter Bacon Cakes along with the returning classic Blue Cheese & Corn Fritz which I had never tried before.

The Peanut Butter Bacon Cakes were really great, there was thick-cut bacon griddled inside of pancake batter strips along with jelly and a peanut butter whipped cream. Perfect combo of savory and sweet, and you're in complete control of the ratios. The bacon and pancake flavors reminded me of learning to make pancakes with my late grandfather. Although that bacon was microwave-ready Hormel bacon... I promise this one's delish!

The Blue Cheese & Corn Fritz was really great, I missed out on this one last year. Perfect amount of sweetness from the corn, really well-balanced cheesy little bite! Wish I could have had more than one of these, we were sharing amongst a big group!

Wrangler Waffle Burger, Bacon-Wrapped Pickle Dog, and “Kind of a Big Dill” Lemonade from Nordic Waffles

Another vendor that fills up immediately after opening, Nordic Waffles should be top of your list because of two returning new foods from 2023: the Bacon-Wrapped Pickle Dog and the Pickle Lemonade. Both of these are really great, the lemonade sounds strange but works really well (even if you don't love pickles). The subtle saltiness balances out the sweet and tartness which makes for a dangerously drinkable item.

The Wrangler was good, it's one of those winning combinations of flavors that is really hard to mess up: beef, cheese, caramelized onions, and a mayo-based sauce. The onions being grilled into the waffle was fun but didn't do much flavor-wise (they might as well have been a topping), honestly wish they went all-out on the onions to the point of being noticeable texture-wise in the waffle. The bacon-wrapped pickle dog is as awesome as it sounds, so much more interesting flavor-wise!

I'm also not a fan of their choice of sauce: they went with Whataburger, a famously mid-tier burger joint in Texas, of all places. This is a grave error by Nordic Waffles because Minnesota and Texas have serious State Fair beef: Minnesota sees the highest single-day attendance over its 12 days, whereas the Texas State Fair sees the highest total attendance over 24 days (it might be obvious which State Fair I think is the true champion).

Sweet Corn Cola Float from Blue Moon Dine-in Theater

This one was interesting! Sweet Corn ice cream and house-made “corn Cola”, so I take that to mean corn syrup Cola? Not sure. The flavor definitely gave a “not-too-sweet” vibe which was nice, there was a good amount of a corny and almost “earthy” flavor in the float.

The texture of the corn ice cream was a little less smooth than a normal ice cream, which landed somewhere between novel and “interesting”. I actually recommend giving this one a good mix before you drink it to blend the flavors together better, you're only given a boba straw to drink it.

Overall, would I get it again? Probably not, because Lift Bridge root beer floats exist and are much better. But worth a try!

Sweet Heat Bacon Crunch from RC's BBQ

Had this one side-by-side with my typical order from RC's which is a bunch of ribs and yeah, it was fine, but if I'm buying barbecue I want ribs or brisket. There was some chili crisp (but not much, maybe because it's Minnesota) and hot honey that got a bit lost in the dish. Can't recommend this one, RC's usual items are much better.

Spam breakfast sandwich from SPAM

Attention all SPAM-lovers at the fair! The SPAM booth has moved from under the Grandstand bridge to the southern edge of the DNR building. I nearly had a heart-attack when I saw the SPAM booth wasn't in its usual spot, I had to sneak away with a fellow SPAM-lover from our group to snag this item.

We got ours with pickles (surprise!) and jalapeños, a little bit of kick and acid to cut through the lovely fatty grilled SPAM. Pretty sure this little sandwich was gone in 4 bites, highly recommend finding this stand if you're a long-time-enjoyer or first-timer of SPAM!

That's all for this year. At this point we kept trying new items, but I suspect not being hungry started to impact my opinions of the foods, so you'll have to try them yourself! :)


This work is licensed under CC BY-SA 4.0

Categories: FLOSS Project Planets

Brian Okken: Finding the top pytest plugins

Sat, 2024-08-24 17:00
What are the top downloaded pytest plugins? I want to know this. And I’d like the answer updated regularly. So today I decided to write a script to do that for me.

Grab data

Let’s start with Top PyPI Packages from Hugo van Kemenade. This list is “A monthly dump of the 8,000 most-downloaded packages from PyPI.” Perfect.

Parse

Now:

  • Filter for “pytest” in the package name
  • Remove “pytest” itself.
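The two parsing steps can be sketched in a few lines. This is not the script from the post: the function name is made up, and the code assumes the data has already been loaded into a dict with a "rows" list whose entries have "project" and "download_count" keys. Check the actual dump before relying on that layout.

```python
def top_pytest_plugins(data: dict, limit: int = 25) -> list:
    """Return the most-downloaded packages with 'pytest' in their name."""
    plugins = [
        row for row in data['rows']
        # Keep anything with "pytest" in the name, but drop pytest itself
        if 'pytest' in row['project'] and row['project'] != 'pytest'
    ]
    plugins.sort(key=lambda row: row['download_count'], reverse=True)
    return plugins[:limit]

# Made-up sample data, only here to show the expected shape
sample = {
    'rows': [
        {'project': 'boto3', 'download_count': 900},
        {'project': 'pytest', 'download_count': 500},
        {'project': 'pytest-cov', 'download_count': 300},
        {'project': 'pytest-check', 'download_count': 20},
    ]
}

for row in top_pytest_plugins(sample):
    print(row['project'], row['download_count'])
```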
Categories: FLOSS Project Planets

Talk Python to Me: #475: Python Language Summit 2024

Sat, 2024-08-24 04:00
Every year the core developers meet to discuss and propose the major changes and trends in Python itself. This invite-only conference of about 50 people happens inside PyCon in the US. Because it's private, we rarely get detailed looks inside this event. On this episode, we have Seth Michael Larson here to give us his account of the sessions and proposals. It's a unique look into the zeitgeist of CPython.<br/> <br/> <strong>Episode sponsors</strong><br/> <br/> <a href=''>Posit</a><br> <a href=''>Talk Python Courses</a><br/> <br/> <strong>Links from the show</strong><br/> <br/> <div><b>Seth on Mastodon</b>: <a href="" target="_blank" ></a><br/> <b>Seth on Twitter</b>: <a href="" target="_blank" >@sethmlarson</a><br/> <b>Seth on Github</b>: <a href="" target="_blank" ></a><br/> <br/> <b>The Python Language Summit 2024</b>: <a href="" target="_blank" ></a><br/> <b>PEP 2026: Calendar versioning for Python</b>: <a href="" target="_blank" ></a><br/> <b>PSF authorized as a CVE Numbering Authority</b>: <a href="" target="_blank" ></a><br/> <b>Recommends Memory-Safe Programming Languages</b>: <a href="" target="_blank" ></a><br/> <b>Watch this episode on YouTube</b>: <a href="" target="_blank" ></a><br/> <b>Episode transcripts</b>: <a href="" target="_blank" ></a><br/> <br/> <b>--- Stay in touch with us ---</b><br/> <b>Subscribe to us on YouTube</b>: <a href="" target="_blank" ></a><br/> <b>Follow Talk Python on Mastodon</b>: <a href="" target="_blank" ><i class="fa-brands fa-mastodon"></i>talkpython</a><br/> <b>Follow Michael on Mastodon</b>: <a href="" target="_blank" ><i class="fa-brands fa-mastodon"></i>mkennedy</a><br/></div>
Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #218: Exploring Robotics and Python Through Electronic Projects

Fri, 2024-08-23 08:00

Are you interested in learning robotics with Python? Can physical electronics-based projects grow a child's interest in coding? This week on the show, we speak with author Marwan Alsabbagh about his book "Build Your Own Robot - Using Python, CRICKIT, and Raspberry Pi."


Categories: FLOSS Project Planets

Matt Layman: Golang Middleware and DBs - Building SaaS #199

Thu, 2024-08-22 20:00
In this episode, we continued the break from JourneyInbox to look through more of the Go standard library. In this session, we talked about middleware, request context, and using databases.
Categories: FLOSS Project Planets

Python Engineering at Microsoft: Announcing the General Availability of the VS Code extension for Azure Machine Learning

Thu, 2024-08-22 13:00

Machine learning and artificial intelligence are transforming the world as we know it. With the power of data, you will have countless opportunities to create something new, unique, and exciting. Whether you are a seasoned data scientist or a curious beginner, you need a platform that can help you build, train, deploy, and manage your machine learning models with ease and efficiency. Azure Machine Learning has always been the backbone for machine learning tasks, and we want to further help you in your machine learning journey by improving the way you write code.

The VS Code extension for Azure Machine Learning has been in preview for a while and we are excited to announce the general availability of the VS Code extension for Azure Machine Learning. You can use your favorite VS Code setup, either desktop or web, to build, train, deploy, debug, and manage machine learning models with Azure Machine Learning from within VS Code. This means that the extension is stable, reliable, ready for production use, and comes with additional features, such as VNET support.


“We have been using the VS Code extension for Azure Machine Learning since its preview release, and it has significantly streamlined our workflow. The ability to manage everything from building to deploying models directly within our preferred VS Code environment has been a game-changer. The seamless integration and robust features like interactive debugging and VNET support have enhanced our productivity and collaboration. We are thrilled about its general availability and look forward to leveraging its full potential in our AI projects.” – Ornaldo Ribas Fernandes: Co-founder and CEO, Fashable

Azure Machine Learning

Azure Machine Learning (Azure ML) is a cloud-based service that enables you to build, train, deploy, and manage machine learning models.

With Azure Machine Learning service, you can:

  • Build and train machine learning models faster, and easily deploy to the cloud or the edge.
  • Use the latest open-source technologies such as TensorFlow, PyTorch, or Jupyter.
  • Experiment locally and then quickly scale up or out with large GPU-enabled clusters in the cloud.
  • Interactively debug experiments, pipelines, and deployments using the built-in VS Code debugger.
  • Speed up data science with automated machine learning and hyper-parameter tuning.
  • Track your experiments, manage models, and easily deploy with integrated CI/CD tooling.

With this extension installed, you can accomplish much of this workflow directly from Visual Studio Code. The VS Code extension provides a user interface to create and manage Azure ML resources, such as experiments, compute targets, environments, and deployments. It also supports the Azure ML 2.0 CLI, which is the new command-line tool that simplifies the specification and execution of machine learning tasks.

Get Started with Azure Machine Learning Extension

One click Connect to VS Code from Azure ML Studio

To get started with VS Code, navigate to the compute section of your Azure Machine Learning Studio. Find the desired compute instance and click on the VS Code (Web) or VS Code (Desktop) links under the “Applications” section.

Don’t have an Azure ML workspace or compute instance? Check out the guide here: Tutorial: Create workspace resources – Azure Machine Learning | Microsoft Learn

VS Code Desktop

After clicking on the link for VS Code desktop, the browser will ask you for your permission to launch the VS Code Desktop application. VS Code desktop will ask you to sign in using your Microsoft/Azure account.

Follow the sign-in prompts, then you should be all set up to develop your own machine learning models using your favorite VS Code setup!

VS Code Web

After clicking on the link, VS Code (Web) will open in a new tab in your browser. It may ask you to sign in using your Microsoft/Azure account, so VS Code will have permission to access your Azure subscription and workspace. Note that the connection process may take a few minutes.

After signing in, you should now be connected to your Azure Machine Learning workspace inside of VS Code. Time to build your own machine learning model using the full power of VS Code!


Give the Azure Machine Learning extension a try and let us know what you think. If you have any questions or feedback, please share them in this survey! You can also file an issue on our public GitHub repo with any questions or concerns you may have.

Need a guide to help you get started or documentation? Check out the tutorials here: Azure Machine Learning documentation | Microsoft Learn


Categories: FLOSS Project Planets

EuroPython: EuroPython August 2024 Newsletter

Thu, 2024-08-22 07:50

Hello and welcome to the post-conference newsletter! We really hope you enjoyed EuroPython 2024, cause we sure did and are still recovering from all the fun and excitement :)

We have some updates to share with you, and also wanted to use this newsletter to nostalgically look back at all the good times we had just last month, surrounded by old friends and new in the beautiful city of Prague ❤️.

🏛️ EuroPython Society (EPS)

This year we had a booth for the EuroPython Society at the conference. What is the EPS? The EPS is the running engine behind the EuroPython Conference. The EPS board is made up of up to 9 directors (including 1 chair and 1 vice chair). It runs the day-to-day business of the EuroPython Society, including running the EuroPython conference series, and supports the community through various initiatives such as our grants programme. The board collectively takes up the fiscal and legal responsibility of the Society.

For the next few weeks, the board is working with our accountant and auditor to get our financial reports in order. As soon as that is finalised, we will be excited to call for the next Annual General Assembly (GA); the actual GA will be held at least 14 days after our formal notice.

The General Assembly is a great opportunity to hear about the EuroPython Society's developments and updates in the last year, and a new board will also be elected at the end of the GA.

All EPS members are invited to attend the GA and have voting rights. Find out how to sign up to become an EPS member for free here:

At the moment, running the annual EuroPython conference is a major task for the EPS. As such, the board members are expected to invest significant time and effort towards overseeing the smooth execution of the conference, ranging from venue selection, contract negotiations, and budgeting, to volunteer management. Every board member has the duty to support one or more EuroPython teams to facilitate decision-making and knowledge transfer.

In addition, the Society prioritises building a close relationship with local communities. Board members should not only be passionate about the Python community but have a high-level vision and plan for how the EPS could best serve the community.

How can you become an EPS 2024 board member?

Any EPS member can nominate themselves for the EPS 2024 board. Nominations will be published prior to the GA.

Though the formal deadline for self-nomination is at the GA, it is recommended that you send in yours as early as possible (yes, now is a good time!) to

We look forward to your email :)

📝 Feedback & Numbers

Thanks to everyone who filled in the feedback form! In total, 157 attendees gave their feedback, which represents around 13% of the onsite attendees and around 11% of total attendees. One caveat when reading the results below: it’s difficult to say whether this sample was representative of all attendees as we didn’t collect demographic data.

Satisfaction with the conference

On average, attendees let us know that they were very satisfied with the conference, with a mean overall satisfaction rating of 4.3. Moreover, attendees were satisfied with most specific aspects of the conference, including the venue (mean = 4.6), food (mean = 4.0), and the social event (mean = 4.0). Prague was a particularly popular choice of location, getting a mean rating of 4.7.

We also had a look to see which of these aspects were most strongly related to overall satisfaction with the conference. Using a Spearman correlation, we found that satisfaction with the food (rs = 0.20) and the social event (rs = 0.17) had the highest relationship with overall satisfaction with the conference. However, any fellow stats nerds reading this might have noticed that these are not particularly strong relationships, likely meaning that other factors we didn’t explicitly measure are driving how much people liked the conference.
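For readers who have not met it before, a Spearman correlation is simply a Pearson correlation computed on ranks instead of raw values. The sketch below shows the idea in plain Python. It is not the analysis code used for the survey, the function names are made up, and the ratings at the bottom are invented purely for illustration.

```python
def average_ranks(values: list) -> list:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x: list, y: list) -> float:
    """Plain Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)

def spearman(x: list, y: list) -> float:
    """Spearman rank correlation of two equally long lists."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented 1-5 ratings from six imaginary respondents
food = [4, 5, 3, 4, 2, 5]
overall = [4, 5, 4, 5, 3, 4]
print(round(spearman(food, overall), 2))
```

Ratings on a 1 to 5 scale produce many ties, which is why the ranks are averaged within each tied group.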

If you’re interested in seeing more of the results we got from the feedback form, we published a blog post where we deep dive into everything we found in much more detail. And we promise there will be lots of pretty graphs!

🦒 Speaker’s Mentorship Programme

It was another successful year for our Speaker Mentorship Programme! Here are some key highlights from this year:

  • Each mentee had the opportunity to receive personalized feedback, support, and guidance on their talk or proposal from an experienced mentor. We successfully supported 29 mentees, most from underrepresented communities, by pairing them with 29 seasoned mentors!
  • Six mentees were given the opportunity to attend a public speaking workshop to further enhance their skills.
  • On June 3rd, we held a fantastic first-time speakers' workshop where attendees engaged with experienced speakers, receiving valuable advice and feedback for their presentations.

Last but not least, a huge THANK YOU to all our mentors who volunteered their time to guide mentees in submitting their proposals and delivering their talks!

🐍 PyLadies day

EuroPython this year had an entire day dedicated to PyLadies events. We started with Moderni Soberana giving a workshop on how to establish boundaries and stop abusive behaviour in society. This was followed up by the PyLadies lunch, sponsored by Kraken Technologies, that had 120 allies joining us for a truly empowering session.


The afternoon had a #IAmRemarkable workshop hosted by Lola Onipko! We also had a Meet & Greet session where beginners and experienced PyLadies shared knowledge and insights of the tech industry.

Picture by Deborah Foroni (PyLadies SP)

💬 Python Organisers Discussion

We had 35+ community members joining us to discuss how the EuroPython Society can better support Python Communities.

✍️ Community write-ups

It warms our hearts to see posts from the community about their experience and stories this year! Here are some of them, please feel free to share yours by tagging us on socials @europython or mailing us at

Anwesha Das about EuroPython 2024:

A conference that believes community matters, human values and feelings matter, and not afraid to walk the talk. And how the conference stood up to my expectations in every bit.

Keep reading here:

Grete Tungla, PyCon Estonia’s Head Organiser shares her insights from EuroPython 2024:

Jakub Cervinka shares how he was to participate in the Operations team organising EuroPython 2024:

❤️ Thank you Volunteers & Sponsors

Year after year EuroPython shines because of the hard work of our amazing team of volunteers!

But beyond the logistics and the schedules, it's your smiles, your enthusiasm, and your genuine willingness to go the extra mile that made EuroPython 2024 truly special. Your efforts have not only fostered a sense of belonging among first-time attendees but also exemplified the power of community and collaboration that lies at the heart of this conference. (And if you check out our blog post about the post-conference feedback, you'll see that community was the thing people reported liking most about EuroPython this year!)

Once again, thank you for being the backbone of EuroPython, for your dedication, and for showing the world yet again why people who come for the Python language end up staying for the amazing community :)

We built a page on our website to thank everyone for their effort on making EuroPython 2024 what it was! Check it out:

And a special thank you to all of the Sponsors for all of their support!

Yay sponsors!

Special thanks to StickerApp for the awesome stickers, Evolabel for shipping, Pretalx for the partnership and Kraken Technologies for the PyLadies lunch!

🎥 Conference Photos & Videos

The official conference photos are up on Flickr! Do not forget to tag us when you share your favourite clicks on your socials 😉.

While our team edits the conference videos, we've put together a EuroPython 2024 livestream playlist with all the daily links. We hope this helps you easily find and enjoy the talks you want to catch up on via YouTube.

We also have a sweet video featuring the amazing humans of EuroPython sharing why they volunteer!

🤝 Code of Conduct

The Code of Conduct Transparency Report is now published on our website:

🐍 Note from The PSF

The Python Software Foundation is proud to support EuroPython Prague 2024 with a grant in support of our mission to promote, protect, and advance the Python programming language and to support and facilitate the growth of a diverse and international community of Python programmers. We send congratulations and thanks to the organizers for their work to create a wonderful experience for the Python community!

The PSF is the non-profit charitable organization behind the Python language. We empower the Python community in a variety of ways including paying developers to work directly on CPython, PyPI, and security, hosting projects like PyLadies and Pallets, organizing PyCon US, and awarding community grants like this one. We welcome you to be a part of the PSF by signing up for PSF membership or supporting our mission and initiatives with a one-time, monthly, or annual donation. If your company uses Python and wants to support our community, you can find more information and submit a sponsor application on our website. We’re happy to answer any questions at

Upcoming Events in the Python Community

EuroPython might be over, but fret not: there are a bunch more PyCons happening.

Enjoy a 30% discount for PyCon Estonia 2024 on Late Snake tickets with the code "EPSXPYCONEST24" over here:

$ pip install pyjokes
$ pyjoke
Hardware: The part of a computer that you can kick.


Python Anywhere: Belated announcement of latest updates

Wed, 2024-08-21 10:03

Here is a slightly delayed (and short) run-down of the new stuff that we deployed recently.

The main change for this update is that we have updated the underlying OS running PythonAnywhere to Ubuntu 22.04. This is an LTS release so it will be supported for some time to come. This will not affect user environments, but it is setting us up for a new user environment that should be coming soon.

We have also:

  • Started the process of updating our file servers to be more robust
  • Improved our alerting so that we are alerted to many new forms of failure on PythonAnywhere
  • Made some improvements to the ASGI beta systems and their documentation
  • Fixed a number of security issues
  • Fixed various bugs

Real Python: Primer on Jinja Templating

Wed, 2024-08-21 10:00

Templates are an essential ingredient in full-stack web development. With Jinja, you can build rich templates that power the front end of your Python web applications.

But you don’t need to use a web framework to experience the capabilities of Jinja. When you want to create text files with programmatic content, Jinja can help you out.

In this tutorial, you’ll learn how to:

  • Install the Jinja template engine
  • Create your first Jinja template
  • Render a Jinja template in Flask
  • Use for loops and conditional statements with Jinja
  • Nest Jinja templates
  • Modify variables in Jinja with filters
  • Use macros to add functionality to your front end

You’ll start by using Jinja on its own to cover the basics of Jinja templating. Later you’ll build a basic Flask web project with two pages and a navigation bar to leverage the full potential of Jinja.

Throughout the tutorial, you’ll build an example app that showcases some of Jinja’s wide range of features. To see what it’ll do, skip ahead to the final section.

You can also find the full source code of the web project by clicking on the link below:

Source Code: Click here to download the source code that you’ll use to explore Jinja’s capabilities.

Take the Quiz: Test your knowledge with our interactive “Primer on Jinja Templating” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

Primer on Jinja Templating

In this quiz, you'll test your understanding of Jinja templating. Jinja is a powerful tool for building rich templates in Python web applications, and it can also be used to create text files with programmatic content.

This tutorial is for you if you want to learn more about the Jinja template language or if you’re getting started with Flask.

Get Started With Jinja

Jinja is not only a city in the Eastern Region of Uganda and a Japanese temple, but also a template engine. You commonly use template engines for web templates that receive dynamic content from the back end and render it as a static page in the front end.

But you can use Jinja without a web framework running in the background. That’s exactly what you’ll do in this section. Specifically, you’ll install Jinja and build your first templates.

Install Jinja

Before exploring any new package, it’s a good idea to create and activate a virtual environment. That way, you’re installing any project dependencies in your project’s virtual environment instead of system-wide.

Select your operating system below and use your platform-specific command to set up a virtual environment:

Windows PowerShell:

PS> python -m venv venv
PS> .\venv\Scripts\activate
(venv) PS>

Shell:

$ python -m venv venv
$ source venv/bin/activate
(venv) $

With the above commands, you create and activate a virtual environment named venv by using Python’s built-in venv module. The parentheses (()) surrounding venv in front of the prompt indicate that you’ve successfully activated the virtual environment.

After you’ve created and activated your virtual environment, it’s time to install Jinja with pip:

(venv) $ python -m pip install Jinja2

Don’t forget the 2 at the end of the package name. Otherwise, you’ll install an old version that isn’t compatible with Python 3.

It’s worth noting that although the current major version is actually greater than 2, the package that you’ll install is nevertheless called Jinja2. You can verify that you’ve installed a modern version of Jinja by running pip list:

(venv) $ python -m pip list
Package    Version
---------- -------
Jinja2     3.x
...

To make things even more confusing, after installing Jinja with an uppercase J, you have to import it with a lowercase j in Python. Try it out by opening the interactive Python interpreter and running the following commands:

Read the full article at »


Real Python: Quiz: Primer on Jinja Templating

Wed, 2024-08-21 08:00

In this quiz, you’ll test your understanding of Jinja templating. Jinja is a powerful tool for building rich templates in Python web applications, and it can also be used to create text files with programmatic content.


PyCharm: How to Build Chatbots With LangChain

Wed, 2024-08-21 06:06

This is a guest post from Dido Grigorov, a deep learning engineer and Python programmer with 17 years of experience in the field.

Chatbots have evolved far beyond simple question-and-answer tools. With the power of large language models (LLMs), they can understand the context of conversations and generate human-like responses, making them invaluable for customer support applications and other types of virtual assistance. 

LangChain, an open-source framework, streamlines the process of building these conversational chatbots by providing tools for seamless model integration, context management, and prompt engineering.

In this blog post, we’ll explore how LangChain works and how chatbots interact with LLMs. We’ll also guide you step by step through building a context-aware chatbot that delivers accurate, relevant responses using LangChain and GPT-3.

What are chatbots in the realm of LLMs?

Chatbots in the field of LLMs are software that simulates human-like conversations with users through text or voice interfaces. These chatbots draw on the capabilities of LLMs, which are neural networks trained on huge amounts of text data. That training allows them to produce human-like responses to a wide range of input prompts.

A key feature is that LLM-based chatbots can take a conversation’s context into account when generating a response. This means they can stay coherent across several exchanges and can process complex queries to produce outputs that are in line with the users’ intentions. Additionally, these chatbots can assess the emotional tone of a user’s input and adjust their responses to match the user’s sentiment.

Chatbots are highly adaptable and can be personalized. They can learn from how users interact with them and improve their responses by adjusting them to individual preferences and needs.

What is LangChain?

LangChain is an open-source framework for creating apps that use large language models (LLMs). It comes with tools and abstractions to better personalize the information produced by these models while maintaining accuracy and relevance.

One common term you will see when reading about LLMs is “prompt chains”. A prompt chain is a sequence of prompts or instructions that guides an AI model through a multi-step process to generate more accurate, detailed, or refined outputs. This method can be employed for various tasks, such as writing, problem-solving, or generating code.

Developers can create new prompt chains using LangChain, which is one of the framework’s greatest strengths. They can even modify existing prompt templates without retraining the model when using new datasets.
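
To make the idea of a prompt chain concrete, here is a minimal sketch in plain Python. It does not use LangChain: `fake_llm`, `run_prompt_chain`, and the template texts are hypothetical stand-ins, and the "model" simply wraps the prompt it receives so that the flow can be run offline.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: wraps the prompt it received."""
    return f"<answer to: {prompt}>"


def run_prompt_chain(templates: List[str], user_input: str,
                     llm: Callable[[str], str] = fake_llm) -> str:
    """Run each prompt template in order, feeding the previous output into the next."""
    current = user_input
    for template in templates:
        prompt = template.format(input=current)
        current = llm(prompt)
    return current


# Two illustrative steps: summarize first, then draft a reply from the summary.
templates = [
    "Summarize this request: {input}",
    "Draft a reply based on this summary: {input}",
]
result = run_prompt_chain(templates, "My order arrived damaged.")
print(result)
```

Changing the chain only means editing the list of templates; the model itself stays untouched, which is the point the paragraph above makes about not retraining.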

How does LangChain work?

LangChain is a framework designed to simplify the development of applications that use language models. It offers a suite of tools that help developers efficiently build and manage applications that involve natural language processing (NLP) and large language models. By defining the steps needed to achieve the desired outcome (a chatbot, task automation, a virtual assistant, customer support, and more), developers can use LangChain to adapt language models flexibly to specific business contexts.

Here’s a high-level overview of how LangChain works.

Model integration

LangChain supports various language models, including those from OpenAI, Hugging Face, Cohere, Anyscale, Azure Models, Databricks, Ollama, Llama, GPT4All, Spacy, Pinecone, AWS Bedrock, MistralAI, among others. Developers can easily switch between different models or use multiple models in one application. They can also build custom model integrations to take advantage of capabilities tailored to their specific applications.


The core concept of LangChain is chains, which bring together different AI components for context-aware responses. A chain represents a set of automated actions between a user prompt and the final model output. There are two types of chains provided by LangChain:

  • Sequential chains: These chains enable the output of a model or function to be used as an input for another one. This is particularly helpful in making multi-step processes that depend on each other.
  • Parallel chains: These allow multiple tasks to run simultaneously, with their outputs merged at the end. This makes them well suited to tasks that can be divided into completely independent subtasks.
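
The difference between the two chain types can be sketched in plain Python. This is an illustration of the concept only, not LangChain's API: `sequential_chain`, `parallel_chain`, and the string functions used as steps are hypothetical stand-ins for model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def sequential_chain(steps: List[Callable[[str], str]], text: str) -> str:
    """Pass the output of each step to the next one."""
    for step in steps:
        text = step(text)
    return text


def parallel_chain(tasks: Dict[str, Callable[[str], str]], text: str) -> Dict[str, str]:
    """Run independent tasks on the same input at once and merge their outputs."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(task, text) for name, task in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


# Simple string functions stand in for model calls here.
cleaned = sequential_chain([str.strip, str.lower], "  Hello LangChain  ")
merged = parallel_chain(
    {"upper": str.upper, "length": lambda s: str(len(s))},
    cleaned,
)
print(cleaned)
print(merged)
```

In the sequential case each step depends on the previous result; in the parallel case the tasks share only the input, so they can run in any order before their outputs are merged.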

LangChain facilitates the storage and retrieval of information across interactions. This is essential wherever context must persist, such as with chatbots or interactive agents. Two types of memory are provided:

  • Short-term memory – Helps keep track of recent sessions.
  • Long-term memory – Allows retention of information from previous sessions enhancing system recall capability on past chats and user preferences.
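
As a rough illustration of the two kinds of memory, here is a sketch in plain Python. The class names and the JSON file used for persistence are assumptions made for this example; LangChain's own memory classes are organized differently.

```python
import json
import tempfile
from collections import deque
from pathlib import Path


class ShortTermMemory:
    """Keeps only the most recent exchanges of the current session."""

    def __init__(self, max_turns: int = 3):
        self.turns = deque(maxlen=max_turns)

    def add(self, user: str, bot: str) -> None:
        self.turns.append((user, bot))

    def as_text(self) -> str:
        return "\n".join(f"User: {u}\nBot: {b}" for u, b in self.turns)


class LongTermMemory:
    """Persists facts across sessions in a JSON file (hypothetical storage choice)."""

    def __init__(self, path: Path):
        self.path = path
        self.facts = json.loads(path.read_text()) if path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value
        self.path.write_text(json.dumps(self.facts))


short = ShortTermMemory(max_turns=2)
for i in range(3):
    short.add(f"question {i}", f"answer {i}")
print(short.as_text())  # the oldest turn has been dropped

with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "memory.json"
    LongTermMemory(store).remember("preferred_language", "Python")
    # A new instance, as in a later session, still sees the stored fact.
    recalled = LongTermMemory(store).facts
print(recalled)
```
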
Tools and utilities

LangChain provides many tools, but the most used ones are Prompt Engineering, Data Loaders and Evaluators.  When it comes to Prompt Engineering, LangChain contains utilities to develop good prompts, which are very important in getting the best responses from language models.

If you want to load files such as CSV, PDF, or other formats, Data Loaders help you load and pre-process different types of data so that they can be used in model interactions.

Evaluation is an essential part of working with machine learning models and large language models. That’s why LangChain provides Evaluators – tools used for testing language models and chains so that generated results meet the required criteria, which might include:

Datasets criteria:

  • Manually curated examples: Start with high-quality, diverse inputs.
  • Historical logs: Use real user data and feedback.
  • Synthetic data: Generate examples based on initial data.

Types of evaluations:

  • Human: Manual scoring and feedback.
  • Heuristic: Rule-based functions, both reference-free and reference-based.
  • LLM-as-judge: LLMs score outputs based on encoded criteria.
  • Pairwise: Compare two outputs to pick the better one.

Application evaluations:

  • Unit tests: Quick, heuristic-based checks.
  • Regression testing: Measure performance changes over time.
  • Back-testing: Re-run production data on new versions.
  • Online evaluation: Evaluate in real-time, often for guardrails and classifications.
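
To show what a heuristic evaluation might look like in practice, here is a small sketch in plain Python. It is not LangChain's Evaluators API: the function names, the two-example dataset, and the lookup-table "model" are all made up for illustration.

```python
from typing import Callable, List, Tuple


def non_empty(output: str) -> bool:
    """Reference-free heuristic: the answer must contain some text."""
    return bool(output.strip())


def exact_match(output: str, reference: str) -> bool:
    """Reference-based heuristic: compare against a curated expected answer."""
    return output.strip().lower() == reference.strip().lower()


def evaluate(model: Callable[[str], str], dataset: List[Tuple[str, str]]) -> float:
    """Return the fraction of examples that pass both heuristics."""
    passed = 0
    for question, reference in dataset:
        output = model(question)
        if non_empty(output) and exact_match(output, reference):
            passed += 1
    return passed / len(dataset)


# Made-up manually curated examples and a toy "model" backed by a lookup table.
dataset = [("capital of France?", "Paris"), ("2 + 2?", "4")]
answers = {"capital of France?": "paris", "2 + 2?": "5"}
score = evaluate(lambda q: answers.get(q, ""), dataset)
print(score)
```

Running the same `evaluate` call against a new model version and comparing the scores is the basic shape of the regression testing mentioned above.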

LangChain agents are essentially autonomous entities that leverage LLMs to interact with users, perform tasks, and make decisions based on natural language inputs.

Action-driven agents use language models to decide on optimal actions for predefined tasks. Interactive applications such as chatbots, on the other hand, make use of agents that also take user input and stored memory into account when responding to queries.

How do chatbots work with LLMs?

LLMs underlying chatbots use Natural Language Understanding (NLU) and Natural Language Generation (NLG), which are made possible through pre-training of models on vast textual data.

Natural Language Understanding (NLU)
  • Context awareness: LLMs can understand the subtlety and allusions in a conversation, and they can keep track of the conversation from one turn to the next. This makes it possible for the chatbots to generate logical and contextually appropriate responses to the clients.
  • Intent recognition: These models can understand the user’s intent from their queries, whether the language is very specific or quite general. They can discern what the user wants to achieve and determine the best way to help them reach that goal.
  • Sentiment analysis: Chatbots can determine the emotion of the user through the tone of language used and adapt to the user’s emotional state, which increases the engagement of the user.
Natural Language Generation (NLG)
  • Response generation: When LLMs are asked questions, the responses they provide are generally correct in terms of both grammar and context. This is because the models are trained on vast amounts of natural language text, so their responses mimic human communication.
  • Creativity and flexibility: Apart from simple answers, LLM-based chatbots can tell a story, create a poem, or provide a detailed description of a specific technical issue and, therefore, can be considered to be very flexible in terms of the provided material.
Personalization and adaptability
  • Learning from interactions: Chatbots can personalize the interaction because they are able to learn from users’ behavior and choices. This ongoing learning makes the chatbot more effective and precise in answering questions.
  • Adaptation to different domains: The LLMs can be tuned to particular areas or specialties that allow the chatbots to perform as subject matter experts in customer relations, technical support, or the healthcare domain.

LLMs are capable of understanding and generating text in multiple languages, making them suitable for applications in diverse linguistic contexts.

Building your own chatbot with LangChain in five steps

This project aims to build a chatbot that leverages GPT-3 to search for answers within documents. First, we scrape content from online articles, split them into small chunks, compute their embeddings, and store them in Deep Lake. Then, we use a user query to retrieve the most relevant chunks from Deep Lake, which are incorporated into a prompt for generating the final answer with the LLM.

It’s important to note that using LLMs carries a risk of generating hallucinations or false information. While this may be unacceptable for many customer support scenarios, the chatbot can still be valuable for assisting operators in drafting answers that they can verify before sending to users.

Next, we’ll explore how to manage conversations with GPT-3 and provide examples to demonstrate the effectiveness of this workflow.

Step 1: Project creation, prerequisites, and required library installation

First, create your PyCharm project for the chatbot. Open PyCharm and click “New Project”. Then give your project a name.

Once the project is set up, log in to the OpenAI API Platform website (or sign up first if you don’t have an account) and generate your `OPENAI_API_KEY`. To do that, go to the “API Keys” section in the left navigation menu and click the “+Create new secret key” button. Don’t forget to copy your key.

After that get your `ACTIVELOOP_TOKEN` by signing up on the Activeloop website. Once logged in, just click on the button “Create API Token” and you’ll be navigated to the token creation page. Copy this token as well.

Once you have both the token and the key, open your configuration settings in PyCharm, by clicking on the 3 dots button next to the run and debug buttons, and choose “Edit”. You should see the following window:

Now locate the field “Environment variables” and find the icon on the right side of the field. Then click there – you’ll see the following window:

Now click the + button to start adding your environment variables, and be careful with their names. They should be the same as mentioned above: `OPENAI_API_KEY` and `ACTIVELOOP_TOKEN`. When ready, click OK in the first window and then “Apply” and “OK” in the second one.

That’s a big advantage of PyCharm, and I love it: it passes the environment variables to our code automatically on every run, with no extra code needed to load them, allowing us to think more about the creative part of the code.
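
If you want to confirm that the run configuration really passes both variables to your script, a quick check with the standard library is enough. The helper names below are hypothetical; the sketch only reports which variables are missing and never prints the secret values themselves.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def check_credentials(names=("OPENAI_API_KEY", "ACTIVELOOP_TOKEN")) -> list:
    """Return the names that are missing, without exposing any secret values."""
    return [name for name in names if not os.environ.get(name)]


missing = check_credentials()
print("Missing variables:", missing or "none")
```

Running this once at the top of the project makes a misspelled variable name show up immediately rather than as an authentication error later on.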

Note: ActiveLoop is a technology company that focuses on developing data infrastructure and tools for machine learning and artificial intelligence. The company aims to streamline the process of managing, storing, and processing large-scale datasets, particularly for deep learning and other AI applications.

DeepLake is ActiveLoop’s flagship product. It provides efficient data storage, management, and access capabilities, optimized for large-scale datasets often used in AI.

Install the required libraries

We’ll use the `SeleniumURLLoader` class from LangChain, which relies on the `unstructured` and `selenium` Python libraries. Install these using pip.  It is recommended to install the latest version, although the code has been specifically tested with version 0.7.7. 

To do that use the following command in your PyCharm terminal:

pip install unstructured selenium

Now we need to install langchain, deeplake and openai. To do that just use this command in your terminal (same window you used for Selenium) and wait a bit until everything is successfully installed:

pip install langchain==0.0.208 deeplake openai==0.27.8 psutil tiktoken

To make sure all libraries are properly installed, just add the following lines needed for our chatbot app and click on the Run button:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI
from langchain.document_loaders import SeleniumURLLoader
from langchain import PromptTemplate

Another way to install your libraries is through the settings of PyCharm. Open them and go to the section Project -> Project Interpreter. Then locate the + button, search for your package and hit the button “Install Package”. Once ready, close it, and on the next window click “Apply” and then “OK”.

Step 2: Splitting content into chunks and computing their embeddings

As previously mentioned, our chatbot will “communicate” with content coming out of online articles, that’s why I picked as my source of data and selected 8 articles to start. All of them are organized into a Python list and assigned to a variable called “articles”.

articles = ['', '', '', '', '']

We load the documents from the provided URLs and split them into chunks using the `CharacterTextSplitter` with a chunk size of 1000 and no overlap:

# Use Selenium to load the documents
loader = SeleniumURLLoader(urls=articles)
docs_not_splitted = loader.load()

# Split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_not_splitted)

If everything works well, running the code so far should give you output like the following:

[Document(page_content="techcrunch\n\ntechcrunch\n\nWe, TechCrunch, are part of the Yahoo family of brandsThe sites and apps that we own and operate, including Yahoo and AOL, and our digital advertising service, Yahoo Advertising.Yahoo family of brands.\n\n    When you use our sites and apps, we use \n\nCookiesCookies (including similar technologies such as web storage) allow the operators of websites and apps to store and read information from your device. Learn more in our cookie policy.cookies to:\n\nprovide our sites and apps to you\n\nauthenticate users, apply security measures, and prevent spam and abuse, and\n\nmeasure your use of our sites and apps\n\n    If you click '", metadata={'source': ……………]

Next, we generate the embeddings using OpenAIEmbeddings and save them in a DeepLake vector store hosted in the cloud. Ideally, in a production environment, we could upload an entire website or course lesson to a DeepLake dataset, enabling searches across thousands or even millions of documents. 

By leveraging a serverless Deep Lake dataset in the cloud, applications from various locations can seamlessly access a centralized dataset without the necessity of setting up a vector store on a dedicated machine.

Why do we need embeddings and documents in chunks?

When building chatbots with LangChain, embeddings and document chunking are essential for several reasons that relate to the efficiency, accuracy, and performance of the chatbot.

Embeddings are vector representations of text (words, sentences, paragraphs, or documents) that capture semantic meaning. They encapsulate the context and meaning of words in a numerical form. This allows the chatbot to understand and generate responses that are contextually appropriate by capturing nuances, synonyms, and relationships between words.

Thanks to the embeddings, the chatbot can also quickly identify and retrieve the most relevant responses or information from a knowledge base, because they allow matching user queries with the most semantically relevant chunks of information, even if the wording differs.
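
The matching step can be illustrated with a tiny example. The sketch below uses made-up three-dimensional vectors instead of real embeddings (which have hundreds or thousands of dimensions) and ranks chunks by cosine similarity; `cosine_similarity` and `top_k` are hypothetical helpers, not part of LangChain or Deep Lake.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], chunk_vecs: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the names of the k chunks whose vectors are closest to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up vectors: the first two chunks point in a similar direction to the query.
chunks = {
    "disk usage with df": [0.9, 0.1, 0.0],
    "check free space": [0.8, 0.3, 0.1],
    "baking bread": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.2, 0.0]
best = top_k(query_vec, chunks, k=2)
print(best)
```

Note that "check free space" ranks highly even though it shares no words with "disk usage", which is the kind of wording-independent matching described above.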

Chunking, on the other hand, involves dividing large documents into smaller, manageable pieces, or chunks. Smaller chunks are faster to process and analyze than large, monolithic documents, which results in quicker response times from the chatbot.

Document chunking also helps with the relevance of the output, because the answer to a user’s question is often found in only one specific part of a document. Chunking allows the system to pinpoint and retrieve just the relevant sections, so the chatbot can provide more precise and accurate answers.
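
Here is a simplified sketch of what chunk size and overlap mean. It cuts text at fixed character positions, which is an assumption made to keep the example short; LangChain's `CharacterTextSplitter` splits on a separator first, so its results will differ in detail. The function name is hypothetical.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Cut text into pieces of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


no_overlap = split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=0)
with_overlap = split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(no_overlap)
print(with_overlap)
```

With no overlap the pieces are disjoint; with an overlap of one character, each chunk starts with the last character of the previous one.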

Now let’s get back to our application and let’s update the following code by including your Activeloop organization ID. Keep in mind that, by default, your organization ID is the same as your username.

# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = "didogrigorov"
my_activeloop_dataset_name = "jetbrains_article_dataset"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

# Create the embedding function (reads OPENAI_API_KEY from the environment)
embeddings = OpenAIEmbeddings()
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# Add documents to our Deep Lake dataset
db.add_documents(docs)

Another great PyCharm feature I love is that TODO notes can be added directly in Python comments. Once you type TODO in capital letters, all such notes are collected in a dedicated section of PyCharm where you can see them all:

# TODO: use your organization id here. (by default, org id is your username)

You can click on them and PyCharm shows you directly where they are in your code. I find it very convenient and use it all the time:

If everything works normally, executing the code so far should produce the following output:

To find the most similar chunks to a given query, we can utilize the similarity_search method provided by the Deep Lake vector store:

# Check the top relevant documents to a specific query
query = "how to check disk usage in linux?"
docs = db.similarity_search(query)
print(docs[0].page_content)

Step 3: Let’s build the prompt for GPT-3

We will design a prompt template that integrates role-prompting, pertinent Knowledge Base data, and the user’s inquiry. This template establishes the chatbot’s persona as an outstanding customer support agent. It accepts two input variables: chunks_formatted, containing the pre-formatted excerpts from articles, and query, representing the customer’s question. The goal is to produce a precise response solely based on the given chunks, avoiding any fabricated or incorrect information.
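
As a rough sketch of such a template, the example below uses plain Python string formatting as a stand-in for LangChain's `PromptTemplate`. The wording is adapted from the Step 5 template later in this post with the chat history part left out, so treat it as an assumption rather than the exact prompt used here; `build_prompt` and the sample chunk are made up for illustration.

```python
# Template text adapted from the Step 5 template; an assumption, not the original Step 3 code.
TEMPLATE = """You are an exceptional customer support chatbot that gently answers questions.

You know the following context information.

{chunks_formatted}

Answer the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {query}

Answer:"""


def build_prompt(chunks_formatted: str, query: str) -> str:
    """Fill the two input variables of the template."""
    return TEMPLATE.format(chunks_formatted=chunks_formatted, query=query)


# A made-up chunk stands in for the excerpts retrieved from the vector store.
example = build_prompt(
    chunks_formatted="Use `df -h` to see disk usage.",
    query="How to check disk usage in linux?",
)
print(example)
```

With LangChain, the same text would be wrapped in a `PromptTemplate` whose `input_variables` are `chunks_formatted` and `query`, which then provides the `prompt.format(...)` call used in Step 4.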

Step 4: Building the chatbot functionality

To generate a response, we begin by retrieving the top-k (e.g., top-3) chunks that are most similar to the user’s query. These chunks are then formatted into a prompt, which is sent to the GPT-3 model with a temperature setting of 0.

# User question
query = "How to check disk usage in linux?"

# Retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# Format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

# Generate the answer
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
answer = llm(prompt_formatted)
print(answer)

If everything works fine, your output should be:

To upload a PDF to ChatGPT, first log into the website and click the paperclip icon next to the text input field. Then, select the PDF from your local hard drive, Google Drive, or Microsoft OneDrive. Once attached, type your query or question into the prompt field and click the upload button. Give the system time to analyze the PDF and provide you with a response.

Step 5: Build conversational history

# Imports needed in addition to those from Step 1
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

# Create conversational memory
memory = ConversationBufferMemory(memory_key="chat_history", input_key="input")

# Define a prompt template that includes memory
template = """You are an exceptional customer support chatbot that gently answers questions.

{chat_history}

You know the following context information.

{chunks_formatted}

Answer the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {input}

Answer:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "chunks_formatted", "input"],
    template=template,
)

# Initialize the OpenAI model
llm = OpenAI(openai_api_key="YOUR API KEY", model="gpt-3.5-turbo-instruct", temperature=0)

# Create the LLMChain with memory
chain = LLMChain(
    llm=llm,
    prompt=prompt,
    memory=memory
)

# User query
query = "What was the 5th point about on the question how to remove spotify account?"

# Retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# Format the chunks for the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)

# Prepare the input for the chain
input_data = {
    "input": query,
    "chunks_formatted": chunks_formatted,
    "chat_history": memory.buffer
}

# Simulate a conversation
response = chain.predict(**input_data)
print(response)

Let’s walk through the code in a more conversational manner.

To start with, we set up a conversational memory using `ConversationBufferMemory`. This allows our chatbot to remember the ongoing chat history, using `input_key="input"` to manage the incoming user inputs.

Next, we design a prompt template. This template is like a script for the chatbot, including sections for chat history, the chunks of information we’ve gathered, and the current user question (input). This structure helps the chatbot know exactly what context it has and what question it needs to answer.

Then, we move on to initializing our language model chain, or `LLMChain`. Think of this as assembling the components: we take our prompt template, the language model, and the memory we set up earlier, and combine them into a single workflow.

When it’s time to handle a user query, we prepare the input. This involves creating a dictionary that includes the user’s question (`input`) and the relevant information chunks (`chunks_formatted`). This setup ensures that the chatbot has all the details it needs to craft a well-informed response.

Finally, we generate a response. We call the `chain.predict` method, passing in our prepared input data. The method processes this input through the workflow we’ve built, and out comes the chatbot’s answer, which we then display.
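The order of operations behind a call like this can be sketched in a few lines of plain Python. This is a simplified model of the workflow, not LangChain's implementation: `make_chain` and `fake_llm` are hypothetical, the shortened template is an assumption, and the fake model only returns a numbered placeholder answer.

```python
def make_chain(llm, template):
    history = []  # plays the role of the conversational memory

    def predict(**inputs):
        # 1. Load the stored history and merge it with the caller's inputs.
        chat_history = "\n".join(history)
        prompt = template.format(chat_history=chat_history, **inputs)
        # 2. Send the rendered prompt to the model.
        answer = llm(prompt)
        # 3. Save the exchange so the next call can see it.
        history.append(f"Human: {inputs['input']}")
        history.append(f"AI: {answer}")
        return answer

    return predict


seen_prompts = []


def fake_llm(prompt):
    # Stand-in for a real model: records the prompt, returns a placeholder.
    seen_prompts.append(prompt)
    return f"answer {len(seen_prompts)}"


predict = make_chain(
    fake_llm,
    "{chat_history}\nContext: {chunks_formatted}\nQuestion: {input}\nAnswer:",
)
first = predict(input="How do I close my account?", chunks_formatted="Open the account page.")
second = predict(input="Is that permanent?", chunks_formatted="Closing is permanent.")

# The second prompt now carries the first exchange as chat history.
print(seen_prompts[1])
```

Recording the prompts in `seen_prompts` makes it easy to check that the second call really sees the first question and answer.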

This approach allows our chatbot to maintain a smooth, informed conversation, remembering past interactions and providing relevant answers based on the context.

Another favorite PyCharm trick that helped me a lot while building this functionality is to hover over a method, hold the “CTRL” key, and click on it, which jumps straight to the method’s declaration.

In conclusion

GPT-3 excels at creating conversational chatbots capable of answering specific questions based on contextual information provided in the prompt. However, ensuring the model generates answers solely based on this context can be challenging, as it often tends to hallucinate (i.e., generate new, potentially false information). The impact of such false information varies depending on the use case.

In summary, we developed a context-aware question-answering system using LangChain, following the provided code and strategies. The process included splitting documents into chunks, computing their embeddings, implementing a retriever to find similar chunks, crafting a prompt for GPT-3, and using the GPT-3 model for text generation. This approach showcases the potential of leveraging GPT-3 to create powerful and contextually accurate chatbots while also emphasizing the importance of being vigilant about the risk of generating false information.

About the author Dido Grigorov

Dido is a seasoned Deep Learning Engineer and Python programmer with an impressive 17 years of experience in the field. He is currently pursuing advanced studies at the prestigious Stanford University, where he is enrolled in a cutting-edge AI program, led by renowned experts such as Andrew Ng, Christopher Manning, Fei-Fei Li and Chelsea Finn, providing Dido with unparalleled insights and mentorship.

Dido’s passion for Artificial Intelligence is evident in his dedication to both work and experimentation. Over the years, he has developed a deep expertise in designing, implementing, and optimizing machine learning models. His proficiency in Python has enabled him to tackle complex problems and contribute to innovative AI solutions across various domains.


Armin Ronacher: Rye and uv: August is Harvest Season for Python Packaging

Tue, 2024-08-20 20:00

It has been a few months since I last wrote about Rye here. You might remember that in February I passed over stewardship of my Rye packaging tool to Astral. The folks over there have been super busy building a lot of amazing tooling for Python packaging in the last few months. If you have been using Rye in the last few months you will have noticed that the underlying resolver and installer uv got a lot better and faster.

As of the most recent release, uv also gained a lot of functionality that previously required Rye such as manipulating pyproject.toml files, workspace support, local package references and script installation. It now also can manage Python installations for you so it's getting much closer.

If you are using Rye today, consider this blog post a reminder that you should probably start having a closer look at uv and give feedback to the Astral folks.

I gave a talk just recently in Prague at EuroPython about my current view of Python packaging and the lessons I learned when creating Rye. One of the things I mentioned there is that the goal of a packaging tool has to be that it will dominate the space. The tool that absolutely everybody uses has to be the best tool: it's the thing any new person to Python gets to see when they start their programming journey. After that talk a lot of people walked up to me and had a lot of questions about that in particular.

Python in the last two years has become an incredibly hot and popular platform for many new developers. That has in part been fueled by all the investments and interest that went into AI and ML. I really want everybody who gets to learn and experience Python not to remember it as an old language with bad tooling, but as an amazing language with a stellar developer experience. Unfortunately that's not the case today because of so much choice, so many tools that are not quite compatible, and the inconsistency everywhere. I have seen people walk down the path of one tool, just to re-emerge moving their entire stack to conda and back because they hit some wall.

Domination is a goal because it means that most investment will go into one stack. I can only reiterate my wish and desire that Rye (and with it a lot of other tools in the space) should cease to exist once the dominating tool has been established. For me uv is poised to be that tool. It's not quite there yet today for all cases, but it will be in no time, and now is the moment to step up as a community and start to rally around it. That doesn't mean that this tool will be the tool forever. Things come and go and maybe there is a future for some other tool.

But today I'm looking forward to the moment when there will be a final release of Rye that has no remaining functionality other than to largely alias to uv, that retires Rye specific functionality and migrates you over to uv.

However I only have the power to retire one tool, and that won't be enough. Today we are using so many other package managing solutions for Python and we should be advertising fewer. I understand how much time and effort went into many of those, and everybody's contributions are absolutely appreciated. Software like Rye and uv was built on the advancements of the ecosystem underneath it. These tools leverage years and years of work that went into migrating the Python ecosystems from files to eggs and finally wheels. From not having a metadata standard to having one. From coupled to decoupled build systems. Much of what makes Rye so enjoyable is owed to individuals who worked towards making redistributable and downloadable Python binaries a possibility. There was a lot of work that was put into building out an amazing ecosystem of Rust crates and Python libraries needed to make these tools work. All of that brought us to the point where we are today.

But it is my belief that we need to take the next step and be willing to say as a community that some tools are no longer recommended. Maybe not today, but that moment will come quicker than we think. I remember a time when many of us who maintained Python libraries pointed new developers to using and easy_install in our onboarding guides. Years later we removed the mentions of from our guides to replace them with pip. Some of us have pointed developers at pip-tools, at poetry or PDM. Many projects today even show 5 different installation guides for that wild variety of available tools because they no longer feel like they can recommend one.

If you maintain an important Python project I would ask you to give uv a try and ask yourself if you would consider pointing people towards it. I think that this is our best shot in the community at finding ourselves in a much better position than we have ever been.

Have a look at the blog post that Charlie from Astral wrote about what uv can do today. It's a true accomplishment worth celebrating and enjoying.

Postscriptum: there is an elephant in the room which is that Astral is a VC funded company. What does that mean for the future of these tools? Here is my take on this: for the community having someone pour money into it can create some challenges. For the PSF and the core Python project this is something that should be considered. However having seen the code and what uv is doing, even in the worst possible future this is a very forkable and maintainable thing. I believe that even in case Astral shuts down or were to do something incredibly dodgy licensing wise, the community would be better off than before uv existed.


Trey Hunner: 10-Week Hands-On Python Course

Tue, 2024-08-20 17:20

Ever wished you could take an Intro to Python training with me, but you don’t work for a company with a generous training budget? I’m running a Python-learning program just for this situation.

Python High Five is a 10-week Python jumpstart program that starts this September.

Set aside the time to learn ⌚

One of the biggest problems for folks starting to learn Python is setting aside the time. And even if you do manage to set aside the time, you’ll often hit a roadblock where you feel confused.

Python High Five is a way to keep a daily learning habit and to get help when you find yourself stuck.

This program is based around daily practice. Monday through Friday you’ll pick 30 minutes from your schedule, at any time that works for you. During those 30 minutes, you’ll watch a 5-minute video, work on the day’s exercise, and reflect on your progress.

The most effective learning is hands-on 🖐️

Python High Five is all about learning through writing Python code. Each week we’ll dive deeper into Python, building upon what we’ve learned so far.

When you find yourself stuck you can get help through an asynchronous group chat and weekly office hour sessions. In addition to our weekly office hours together, I’ll check the chat each day, respond to questions, and provide guidance.

Proven learning techniques behind the scenes 📝

The daily check-ins allow for daily accountability. The group chat also provides both a community of peers to rely on, and guidance from an experienced Python trainer (me).

We’ll also be using proven learning techniques behind the scenes:

  • Retrieval practice: you don’t learn by putting information into your head, but by trying to take it out; for Python learning, that means writing code.
  • Spaced repetition: cramming is less effective than learning spaced out over time, which is why we’ll spend 30 minutes each weekday instead of spending a few hours every week.
  • Interleaving: each day’s exercise isn’t predictably themed because a bit of unpredictability can really improve learning outcomes.
  • Elaboration: your daily check-in isn’t just about reflection: it’s also a helpful learning tool!

Plus, we’ll be working through curriculum I’ve been developing and iterating on for many years. I have taught these topics in many different settings to folks from many different backgrounds.

Form a daily learning habit 🔁

Any 10-week program will be just the start of a Python learning habit. You’ll need to keep up your Python after Python High Five ends, either by promptly applying your skills to a new project or diving deeper into Python with continued daily practice.

That’s why I’m offering an 80% discount for High Five attendees on one year of Python Morsels, which is my skill-building service designed to help deepen your Python skills every week. You can see more details on that here.

Ready to start your Python journey? ⛰️

Are you ready to start your Python journey with a solid foundation?

Read more about Python High Five and decide whether this is for you.

Keep in mind that while the program begins on September 9, enrollment closes on August 31. So check the FAQs and if you have additional questions, be sure to email me soon!

