Planet Python

Planet Python - http://planetpython.org/
Updated: 16 hours 37 min ago

Matt Layman: Post-launch Punchlist - Building SaaS with Python and Django #186

Wed, 2024-03-20 20:00
In this episode, we had a bunch of issues to resolve post-launch. I set the code that causes trials to expire, made updates to who receives prompt emails, and added some polish to the sign up process and interface to make it clear what will happen in the flow. After those modifications, we worked through a set of smaller changes like setting up Dependabot and adding a missing database index.
Categories: FLOSS Project Planets

Python⇒Speed: The wrong way to speed up your code with Numba

Wed, 2024-03-20 20:00

If your NumPy-based code is too slow, you can sometimes use Numba to speed it up. Numba is a compiled language that uses the same syntax as Python, and it compiles at runtime, so it’s very easy to write. And because it re-implements a large part of the NumPy APIs, it can also easily be used with existing NumPy-based code.

However, Numba’s NumPy support can be a trap: it can lead you to missing huge optimization opportunities by sticking to NumPy-style code. So in this article we’ll show an example of:

  • The wrong way to use Numba, writing NumPy-style full array transforms.
  • The right way to use Numba, namely for loops.
Read more...

EuroPython: EuroPython 2024: Community Voting is now live! Go Vote!

Wed, 2024-03-20 18:00

Hey hey,

With 110 days remaining until the big day, the EuroPython programme team is working full steam ahead to put together a power-packed schedule. And what *YOU* want to see at the conference is our guiding light in the process.

With that, we are excited to announce the EuroPython 2024 Community Voting: https://ep2024.europython.eu/voting 🎉

All past EuroPython attendees from 2015 to 2024 and prospective speakers from this year are eligible to vote.

You can help us spread the word by forwarding this email to your fellow EuroPython friends.

The more votes we have, the better informed decisions the programme team can make!

Head over to https://ep2024.europython.eu/voting to make your voice heard!

Thank you for your continued support,
EuroPython 2024 Organisers


Python Software Foundation: Announcing a PyPI Support Specialist

Wed, 2024-03-20 11:08

We launched the Python Package Index (PyPI) in 2003 and for most of its history a robust and dedicated volunteer community kept it running. Eventually, we put a bit of PSF staff time into the maintenance of the Index, and last year with support from AWS we hired Mike Fiedler to work full-time on PyPI’s urgent security needs.

PyPI has grown enormously in the last 20+ years, and in recent years it has reached a truly massive scale with growth only continuing upward. In 2022 alone, PyPI saw 57% growth, and as of this writing, there are over half a million packages on PyPI. The impact PyPI has these days is pretty breathtaking. Running a free public service of that size has come with challenges, too. As PyPI has grown, the work of communicating with users and solving account issues has grown in tandem and outstripped our current capacity of volunteers plus one tenth of a staff person. We also know that some community members have noticed and expressed frustration with the time frame of tasks that don't have sufficient staffing.

Much of this work is sensitive and complex such that it needs to be performed by a PSF staff person. It involves personal information and verification processes to make sure we’re giving access and names to the correct entities. Work like this needs to be done by a person who is here day after day to carry out multi-step verification procedures and is accountable to the PSF. 

We are very happy to share the news that we are hiring a person to help us manage the increased capacity and allow us to keep pace with PyPI’s seemingly unstoppable growth. This is an associate role that is 100% remote. Please take a look at this posting for a PyPI Support Specialist and share it with your networks.


Real Python: Build a Python Turtle Game: Space Invaders Clone

Wed, 2024-03-20 10:00

In this tutorial, you’ll use Python’s turtle module to build a Space Invaders clone. The game Space Invaders doesn’t need any introduction. The original game was released in 1978 and is one of the most recognized video games of all time. It undeniably defined its own video game genre. In this tutorial, you’ll create a basic clone of this game.

The turtle module you’ll use to build the game is part of Python’s standard library, and it enables you to draw and move sprites on the screen. The turtle module is not a game-development package, but building a turtle game with it will help you understand how video games are built.

In this tutorial, you’ll learn how to:

  • Design and build a classic video game
  • Use the turtle module to create animated sprites
  • Add user interaction in a graphics-based program
  • Create a game loop to control each frame of the game
  • Use functions to represent key actions in the game

This tutorial is ideal for anyone who is familiar with the core Python topics and wants to use them to build a classic video game from scratch. You don’t need to be familiar with the turtle module to work through this tutorial. You can download the code for each step by clicking on the link below:

Get Your Code: Click here to download the free sample code that shows you how to build a Python turtle game.

In the next section, you can have a look at the version of the game you’ll build as you follow the steps outlined in this tutorial.

Demo: A Python Turtle Space Invaders Game

You’ll build a simplified version of the classic Space Invaders game and control the laser cannon with the keys on your keyboard. You’ll shoot lasers from the cannon by pressing the spacebar, and aliens will appear at regular intervals at the top of the screen and move downwards. Your task is to shoot the aliens before they reach the bottom of the screen. The game ends when one alien reaches the bottom.

This is what your turtle game will look like when you complete this tutorial:

Here you can see the main game play for this game, as the laser cannon moves back and forth and shoots the falling aliens. The game also displays the elapsed time and the number of aliens shot down on the screen.

Project Overview

In this project, you’ll start by creating the screen that will contain the game. In each step, you’ll create game components such as the laser cannon, lasers, and aliens, and you’ll add the features required to make a functioning game.

To create this turtle game, you’ll work through the following steps:

  1. Create the game screen and the laser cannon
  2. Move the cannon left and right using keys
  3. Shoot lasers with the spacebar
  4. Create aliens and move them towards the bottom of the screen
  5. Determine when a laser hits an alien
  6. End the game when an alien reaches the bottom
  7. Add a timer and a score
  8. Improve the cannon’s movement to make the game smoother
  9. Set the game’s frame rate

You’ll start with a blank screen, and then see the game come to life one feature at a time as you work through each step in this tutorial.
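
The core of steps 4 to 6 can be sketched without any graphics. The snippet below is not the tutorial's code. It is a minimal illustration that assumes lasers and aliens are plain (x, y) tuples, that y grows upwards as in turtle's coordinate system, and that the names is_hit, step, and HIT_RADIUS are hypothetical:

```python
import math

HIT_RADIUS = 20  # hypothetical collision distance, in pixels


def is_hit(laser_pos, alien_pos, radius=HIT_RADIUS):
    """Return True when the laser is closer than `radius` to the alien."""
    dx = laser_pos[0] - alien_pos[0]
    dy = laser_pos[1] - alien_pos[1]
    return math.hypot(dx, dy) < radius


def step(lasers, aliens, floor_y, laser_speed=10, alien_speed=2):
    """Advance the game by one frame.

    Returns (lasers, aliens, hits, game_over) after moving every sprite,
    removing each alien that was hit together with the laser that hit it.
    """
    lasers = [(x, y + laser_speed) for x, y in lasers]   # lasers travel up
    aliens = [(x, y - alien_speed) for x, y in aliens]   # aliens travel down
    hits = 0
    surviving_aliens = []
    for alien in aliens:
        struck = next((laser for laser in lasers if is_hit(laser, alien)), None)
        if struck is not None:
            lasers.remove(struck)  # a laser is used up by its first hit
            hits += 1
        else:
            surviving_aliens.append(alien)
    # The game ends as soon as one surviving alien reaches the bottom.
    game_over = any(y <= floor_y for _, y in surviving_aliens)
    return lasers, surviving_aliens, hits, game_over
```

A game loop would call step() once per frame and then redraw the sprites at their new positions.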

Prerequisites

To complete this tutorial, you should be comfortable with the following concepts:

You don’t need to be familiar with Python’s turtle to start this tutorial. However, you can read an overview of the turtle module to find out more about the basics.

If you don’t have all of the prerequisite knowledge before you start, that’s okay! In fact, you might learn more by going ahead and getting started! You can always stop and review the resources linked here if you get stuck.

Step 1: Set Up the Turtle Game With a Screen and a Laser Cannon

You can’t have a game without a screen where all the action happens. So, the first step is to create a blank screen. Then, you can add sprites to represent the items in the game. In this project, you can run your code at any point to see the game in its current state.

You can download the code as it’ll look at the end of this step from the folder named source_code_step_1/ in the link below:

Read the full article at https://realpython.com/build-python-turtle-game-space-invaders-clone/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]


Python Morsels: Every dunder method in Python

Tue, 2024-03-19 17:30

An explanation of all of Python's 100+ dunder methods and 50+ dunder attributes, including a summary of each one.

Table of contents

  1. The 3 essential dunder methods 🔑
  2. Equality and hashability 🟰
  3. Orderability ⚖️
  4. Type conversions and string formatting ⚗️
  5. Context managers 🚪
  6. Containers and collections 🗃️
  7. Callability ☎️
  8. Arithmetic operators ➗
  9. In-place arithmetic operations ♻️
  10. Built-in math functions 🧮
  11. Attribute access 📜
  12. Metaprogramming 🪄
  13. Descriptors 🏷️
  14. Buffers 💾
  15. Asynchronous operations 🤹
  16. Construction and finalizing 🏭
  17. Library-specific dunder methods 🧰
  18. Dunder attributes 📇
  19. Every dunder method: a cheat sheet ⭐

The 3 essential dunder methods 🔑

There are 3 dunder methods that most classes should have: __init__, __repr__, and __eq__.

Operation    Dunder Method Call       Returns
T(a, b=3)    T.__init__(x, a, b=3)    None
repr(x)      x.__repr__()             str
x == y       x.__eq__(y)              Typically bool

The __init__ method is the initializer (not to be confused with the constructor), the __repr__ method customizes an object's string representation, and the __eq__ method customizes what it means for objects to be equal to one another.

The __repr__ method is particularly helpful at the Python REPL and when debugging.
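
As a minimal sketch of these three methods together, consider a small class. The Point class here is a hypothetical example, not taken from the article:

```python
class Point:
    """A 2D point that defines the three dunder methods discussed above."""

    def __init__(self, x, y):
        # Initializer: runs on the already-created object.
        self.x = x
        self.y = y

    def __repr__(self):
        # Shown at the REPL and in debugging output.
        return f"Point({self.x!r}, {self.y!r})"

    def __eq__(self, other):
        # Returning NotImplemented lets Python try the other operand.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
print(repr(p))           # Point(1, 2)
print(p == Point(1, 2))  # True
```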

Equality and hashability 🟰

In addition to the __eq__ …

Read the full article: https://www.pythonmorsels.com/every-dunder-method/

Python Insider: Python 3.10.14, 3.9.19, and 3.8.19 is now available

Tue, 2024-03-19 16:39

Howdy!
Those are the boring security releases that aren’t supposed to bring anything new. But not this time! We do have a bit of news, actually. But first things first: go update your systems!

Python 3.10.14

Get it here: Python Release Python 3.10.14

26 commits since the last release.

Python 3.9.19

Get it here: Python Release Python 3.9.19

26 commits since the last release.

Python 3.8.19

Get it here: Python Release Python 3.8.19

28 commits since the last release.

Security content in this release
  • gh-115399 & gh-115398: bundled libexpat was updated to 2.6.0 to address CVE-2023-52425, and control of the new reparse deferral functionality was exposed with new APIs. Thanks to Sebastian Pipping, the maintainer of libexpat, who worked with us directly on incorporating those fixes!
  • gh-109858: zipfile is now protected from the “quoted-overlap” zipbomb to address CVE-2024-0450. It now raises BadZipFile when attempting to read an entry that overlaps with another entry or central directory
  • gh-91133: tempfile.TemporaryDirectory cleanup no longer dereferences symlinks when working around file system permission errors to address CVE-2023-6597
  • gh-115197: urllib.request no longer resolves the hostname before checking it against the system’s proxy bypass list on macOS and Windows
  • gh-81194: a crash in socket.if_indextoname() with a specific value (UINT_MAX) was fixed. Relatedly, an integer overflow in socket.if_indextoname() on 64-bit non-Windows platforms was fixed
  • gh-113659: .pth files with names starting with a dot or containing the hidden file attribute are now skipped
  • gh-102388: iso2022_jp_3 and iso2022_jp_2004 codecs no longer read out of bounds
  • gh-114572: ssl.SSLContext.cert_store_stats() and ssl.SSLContext.get_ca_certs() now correctly lock access to the certificate store, when the ssl.SSLContext is shared across multiple threads
Stay safe and upgrade!

Upgrading is highly recommended to all users of affected versions.
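
One way to check whether an interpreter predates these releases is to compare its version tuple with the patched version of its series. This is only a sketch: the PATCHED table covers just the three releases announced here, and the needs_upgrade helper is a hypothetical name.

```python
import sys

# Patched releases named in this announcement, keyed by (major, minor).
PATCHED = {
    (3, 8): (3, 8, 19),
    (3, 9): (3, 9, 19),
    (3, 10): (3, 10, 14),
}


def needs_upgrade(version_info):
    """Return True/False for a covered series, or None if it is not covered."""
    series = tuple(version_info[:2])
    patched = PATCHED.get(series)
    if patched is None:
        return None  # this announcement says nothing about the series
    return tuple(version_info[:3]) < patched


print(needs_upgrade(sys.version_info))
```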

Source builds are moving to GitHub Actions

It’s not something you will notice when downloading, but 3.10.14 here is the first release we’ve done where the source artifacts were built on GHA and not on a local computer of one of the release managers. We have the Security Developer in Residence @sethmlarson to thank for that!

It’s a big deal since public builds allow for easier auditing and repeatability. It also helps with the so-called bus factor. In fact, to test this out, this build of 3.10.14 was triggered by me and not Pablo, who would usually release Python 3.10.

The artifacts are later still signed by the respective release manager, ensuring integrity when put on the downloads server.

Python now manages its own CVEs

The security releases you’re looking at are the first after the PSF became a CVE Numbering Authority. That’s also thanks to @sethmlarson. Being our own CNA allows us to ensure that the quality of the vulnerability reports is high and that the severity estimates are accurate. Seth summarized it best in his announcement here.

What this also allows us to do is to combine announcement of CVEs with the release of patched versions of Python. This is in fact the case with two of the CVEs listed above (CVE-2023-6597 and CVE-2024-0450). And since Seth is now traveling, this announcement duty was fulfilled by the PSF’s Director of Infrastructure @EWDurbin. Thanks!

I’m happy to see us successfully testing bus factor resilience on multiple fronts with this round of releases.

Thank you for your support

Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organization contributions to the Python Software Foundation.

Python.org - the official home of the Python Programming Language.


Łukasz Langa @ambv
on behalf of your friendly release team,

Ned Deily @nad
Steve Dower @steve.dower
Pablo Galindo Salgado @pablogsal
Łukasz Langa @ambv
Thomas Wouters @thomas


PyCoder’s Weekly: Issue #621 (March 19, 2024)

Tue, 2024-03-19 15:30

#621 – MARCH 19, 2024

Visualizing Data in Python With Seaborn

In this tutorial, you’ll learn how to use the Python seaborn library to produce statistical data analysis plots to allow you to better visualize your data. You’ll learn how to use both its traditional classic interface and more modern objects interface.
REAL PYTHON

Does Python Have Pointers?

The answer to the question depends on how you’re using the term “pointer”. Read on to better understand the programming terminology and whether Python has pointers.
NED BATCHELDER

GPT Pilot Is an OSS Dev Tool That Builds Apps From Scratch by Talking to You

GPT Pilot is a collection of AI agents that automate developer workflows to try offloading 95% of coding tasks from you to the AI. Right now, it can build apps up to ~3000 lines of code. See examples here →
PYTHAGORA TECHNOLOGIES INC. sponsor

The Python Memory Model

This article introduces you to how Python stores things in memory. Learn about the heap, the stack, and how the interpreter sees Python objects.
PEPIJN BAKKER

Python 3.13.0 Alpha 5 Released

CPYTHON DEV BLOG

New Malware Reporting Tool on PyPI

PYPI

Articles & Tutorials

Build an LLM RAG Chatbot With LangChain

Large language models (LLMs) have taken the world by storm, demonstrating unprecedented capabilities in natural language tasks. In this step-by-step tutorial, you’ll leverage LLMs to build your own retrieval-augmented generation (RAG) chatbot using synthetic data with LangChain and Neo4j.
REAL PYTHON

Use Weird Tests to Capture Tacit Knowledge

Sometimes adding code in one place means configuration elsewhere also needs to be updated. One way of ensuring this is happening properly in a large project is to use unit tests. This post covers a few examples, complete with pytest code.
JUSTIN DUKE

Blocked by Slow Code Reviews? Here’s How to Stop Waiting

Code reviews are great - but they shouldn’t slow down your development. Sourcery can automatically review every one of your PR’s so your team can keep moving fast →
SOURCERY sponsor

Insecurity and Python Pickles

The pickle module allows you to serialize arbitrary Python objects. Deserializing them means executing code, which has potential security issues. Read on to discover what they are and what software may be impacted.
DAROC ALDEN
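
A harmless sketch of why loading a pickle can run code: an object's __reduce__ method tells pickle which callable to invoke when the data is loaded. The Payload class is a hypothetical illustration and is not taken from the linked article.

```python
import pickle


class Payload:
    def __reduce__(self):
        # "To rebuild this object, call eval('1 + 1')."
        # An attacker could name any importable callable here instead.
        return (eval, ("1 + 1",))


data = pickle.dumps(Payload())
result = pickle.loads(data)  # runs eval("1 + 1") while loading

print(result)  # 2, not a Payload instance
```

This is why pickle data from an untrusted source should never be loaded.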

Python Basics Exercises: Installing Packages With pip

In this Python Basics Exercises video course, you’ll practice installing packages with pip. You’ll also practice creating virtual environments, making lists of requirements, and recreating a development environment.
REAL PYTHON course

How to Create a Dashboard in Python From PostgreSQL

Accessing a database in a terminal is not the best solution for everyone. Mljar lets you build a dashboard from scratch using only Python.
MLJAR • Shared by Piotr

Regex Character “$” Doesn’t Mean “End-of-String”

Regular expression syntax is only somewhat uniform across programming languages. Seth ran into a surprise with “$” and Python.
SETH LARSON
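
A short sketch of the behavior in question, using only the standard re module. In Python, without re.MULTILINE, "$" matches at the end of the string and also just before a trailing newline, while "\Z" matches only at the very end:

```python
import re

text = "cat\n"

# "$" also matches just before a single trailing newline.
dollar_match = re.search(r"cat$", text)

# "\Z" matches only at the true end of the string.
strict_match = re.search(r"cat\Z", text)

# fullmatch() requires the whole string to match the pattern.
full_match = re.fullmatch(r"cat", text)

print(dollar_match is not None, strict_match, full_match)  # True None None
```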

Feature Flags Are Ruining Your Codebase

An opinion piece on feature flags and the dangers of letting PMs control them. Includes suggestions on what to do instead.
ANTON ZAIDES

The 2038 Problem

Learn how “The 2038 problem” could impact software, hardware, and more - and what can be done to prepare.
CODE RELIANT
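
The date behind the name can be worked out directly. This sketch assumes the usual framing of the problem, a timestamp stored as a signed 32-bit count of seconds since 1970-01-01 UTC. It is not taken from the linked article.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1   # largest value a signed 32-bit integer can hold
INT32_MIN = -(2**31)    # value the counter wraps to one second later

last_moment = EPOCH + timedelta(seconds=INT32_MAX)
after_wrap = EPOCH + timedelta(seconds=INT32_MIN)

print(last_moment)  # 2038-01-19 03:14:07+00:00
print(after_wrap)   # 1901-12-13 20:45:52+00:00
```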

Add Magic Link Sign-in Using Django

This is a step-by-step guide to adding email sign-in (and verification) to Django using Gmail and others.
PHOTON DESIGNER

Email Testing With Python’s smtpd Module

This post dives deep into Python’s smtpd module and explores how it can be used for local testing.
MUHAMMAD

Projects & Code

zakuchess: Chess Challenge Game

GITHUB.COM/OLIVIERPHI

mountaineer: Web Framework for Python and React

GITHUB.COM/PIERCEFREEMAN

magika: Detect File Content Types With Deep Learning

GITHUB.COM/GOOGLE

hyperdiv: Build Reactive Web UIs in Python

GITHUB.COM/HYPERDIV

UFO: A UI-Focused Agent for Windows OS Interaction

GITHUB.COM/MICROSOFT

Events

Weekly Real Python Office Hours Q&A (Virtual)

March 20, 2024
REALPYTHON.COM

PyData Bristol Meetup

March 21, 2024
MEETUP.COM

PyLadies Dublin

March 21, 2024
PYLADIES.COM

Python Barcamp Karlsruhe

March 23 to March 25, 2024
BARCAMPS.EU

PyDelhi User Group Meetup

March 23, 2024
MEETUP.COM

Happy Pythoning!
This was PyCoder’s Weekly Issue #621.

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]


Real Python: SQLite and SQLAlchemy in Python: Move Your Data Beyond Flat Files

Tue, 2024-03-19 10:00

All programs process data in one form or another, and many need to be able to save and retrieve that data from one invocation to the next. Python, SQLite, and SQLAlchemy give your programs database functionality, allowing you to store data in a single file without the need for a database server.

You can achieve similar results using flat files in any number of formats, including CSV, JSON, XML, and even custom formats. Flat files are often human-readable text files—though they can also be binary data—with a structure that can be parsed by a computer program. You’ll explore using SQL databases and flat files for data storage and manipulation and learn how to decide which approach is right for your program.

In this video course, you’ll learn how to use:

  • Flat files for data storage
  • SQL to improve access to persistent data
  • SQLite for data storage
  • SQLAlchemy to work with data as Python objects
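
To make the contrast concrete, here is a minimal sketch that stores the same rows in a flat file (CSV) and in SQLite, using only the standard library. It is not the course's code: the author table and its columns are hypothetical, both stores live in memory for brevity, and SQLAlchemy is left out because it is a third-party package.

```python
import csv
import io
import sqlite3

rows = [("Ada", "Lovelace"), ("Grace", "Hopper")]

# Flat file: write the rows as CSV, then read everything back and filter in Python.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
buffer.seek(0)
from_csv = [tuple(record) for record in csv.reader(buffer)]
hoppers_csv = [record for record in from_csv if record[1] == "Hopper"]

# SQLite: the database does the filtering through a SQL query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO author VALUES (?, ?)", rows)
hoppers_sql = conn.execute(
    "SELECT first_name, last_name FROM author WHERE last_name = ?",
    ("Hopper",),
).fetchall()
conn.close()

print(hoppers_csv == hoppers_sql)  # True
```

Passing a file path instead of ":memory:" to sqlite3.connect() keeps the data in a single file between runs.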



Robin Wilson: One reason for getting a ‘No HTTP triggers found’ error when using Azure Functions with Python V2 programming model

Tue, 2024-03-19 06:38

Summary: It might be because there is an exception raised when importing your function_app.py – for example, caused by one of your import statements raising an exception, or a parsing error caused by a syntax error.

I was deploying a FastAPI app to Azure Functions recently. Azure Functions is the equivalent of AWS Lambda – it provides a way to run serverless functions.

Since I’d last used Azure Functions, Microsoft have introduced the Azure Functions Python V2 programming model which makes it easier and cleaner to do a number of common tasks, such as hooking up a FastAPI app to run on Functions.

However, it also led to an error that I hadn’t seen before, and that I couldn’t find documented very well online.

Specifically, I was getting an error at the end of my function deployment saying No HTTP triggers found. I was confused by this because I had followed the documented pattern for setting up a FastAPI app. For reference, my function_app.py file looked a bit like this:

    import azure.functions as func
    from complex_fastapi_app import app

    app = func.AsgiFunctionApp(app=app, http_auth_level=func.AuthLevel.ANONYMOUS)

This was exactly as documented. But I kept getting this error – why?

I replaced the import of my complex_fastapi_app with a basic FastAPI app defined in function_app.py, this time copied directly from the documentation:

    import azure.functions as func
    from fastapi import FastAPI, Request, Response

    fast_app = FastAPI()

    @fast_app.get("/return_http_no_body")
    async def return_http_no_body():
        return Response(content="", media_type="text/plain")

    app = func.AsgiFunctionApp(app=fast_app, http_auth_level=func.AuthLevel.ANONYMOUS)

Everything worked fine now and I didn’t get the error.

After a lot of debugging, it turns out that if there is an exception raised when importing your function_app.py file then Functions won’t be able to establish what HTTP triggers you have, and will give this error.

In this case, I was getting an exception raised when I imported my complex_fastapi_app, and that stopped the whole file being processed. Unfortunately I couldn’t find anywhere that this error was actually being reported to me – I must admit that I find Azure logging/error reporting systems very opaque. I assume it would have been reported somewhere – if anyone reading this can point me to the right place then that’d be great!
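
One possible way to surface such an exception before deploying is to import the module locally and print the traceback. This is only a sketch, not something from the Azure documentation: it assumes the module and its dependencies are importable in your local environment, and check_import is a hypothetical helper name.

```python
import importlib
import traceback


def check_import(module_name):
    """Import a module; return None on success or the traceback text on failure."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Covers import errors, syntax errors, and anything raised at import time.
        return traceback.format_exc()
    return None


# Run this from the directory that contains function_app.py.
error = check_import("function_app")
if error is not None:
    print(error)
```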

I’m sure there are many other reasons that this error can occur, but this was one that I hadn’t found documented online – so hopefully this can be useful to someone.


The Python Coding Blog: The Python Coding Book is Out in Paperback and EBook

Tue, 2024-03-19 04:24

The Python Coding Book is out! I published the First Edition in paperback and EBook. It is a revised version of the “Zeroth” Edition, which you’ve been able to read here on this site for a while. Just ask Google for a “python book” and it will recommend this one as one of its top entries!

Read on to see the table of contents, the back-cover blurb introducing the textbook, and testimonials from you, the readers!

Buy Paperback or EBook

The Python Coding Book • A relaxed and friendly programming textbook for beginners

Table of Contents

First Edition, 368 pages

  • 0 Preface • How to Learn to Code
  • Before You Start • Downloading Python and an IDE
  • 1 Getting Started: Your First Project
  • 2 Loops, Lists, and More Fundamentals
  • 3 Power-up Your Coding: Create Your Own Functions
  • 4 Data, Data Types, and Data Structures
  • Monty and The White Room: Understanding Programming
  • 5 Dealing With Errors and Bugs
  • 6 Functions Revisited
  • 7 Object-Oriented Programming
Blurb

Imagine a programming book that feels like a conversation with a friend who’s here to show you the ropes—that’s The Python Coding Book by Stephen Gruppetta. This isn’t a dry textbook. It’s a warm, engaging guide into the world of Python programming, designed with beginners in mind.

With an approach that emphasises clarity and the joy of learning, Stephen guides you through the core concepts of programming, breaking down the barriers that make coding seem inaccessible to many. This book is built on the premise that to truly grasp programming, you need to understand the ‘why’ just as much as the ‘how’. Through engaging explanations, thoughtful analogies, and practical projects, you’re not just learning to code—you’re learning to think and solve problems like a programmer.

Forget about overwhelming details and rapid leaps in complexity. The Python Coding Book introduces concepts at a pace that ensures comprehension, building a solid foundation that instils both knowledge and confidence.

Are you ready for a rewarding journey? The Python Coding Book is more than a book—it’s your first step towards mastering programming with Python. This book is an invitation to not only learn Python but to fall in love with coding.

Testimonials

“It’s the first time I’m understanding what everything does.”

“Your writing is succinct, easy to understand, and process oriented, which I really appreciate. I’m starting to realise that my first experiences with programming weren’t at all representative of my abilities to problem solve or structure my thinking. It has been a great confidence booster for me, and I’m sure other folks are also realising that they were never really the problem. It was the lack of resources or inaccessible information that was the issue. Thanks again for this wonderful resource.”

“The clarity of your writing has helped me understand Python at a deeper level.”

“Thank you for this great resource. I believe this book is the most comprehensive way to understand the material, going beyond the mere memorisation of code snippets or syntax. I recommend it to everyone I talk to who is looking for a Python resource for getting started and who wants to really understand what they are doing at a deeper level.”

Buy Paperback or EBook

The post The Python Coding Book is Out in Paperback and EBook appeared first on The Python Coding Book.


Python Bytes: #375 Pointing at Countries

Tue, 2024-03-19 04:00
Topics covered in this episode:

  • pycountry
  • Does Python have pointers?
  • ingestr
  • Make your terminal nice
  • Extras
  • Joke

Watch on YouTube: https://www.youtube.com/watch?v=fFGXqNXnQR8

About the show

Sponsored by ScoutAPM: pythonbytes.fm/scout

Connect with the hosts

  • Michael: @mkennedy@fosstodon.org
  • Brian: @brianokken@fosstodon.org
  • Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.

Michael #1: pycountry (https://github.com/pycountry/pycountry)

  • A Python library to access ISO country, subdivision, language, currency and script definitions and their translations.
  • pycountry provides the ISO databases for the standards:
      • 639-3 Languages
      • 3166 Codes for representation of names of countries and their subdivisions
      • 3166-1 Countries
      • 3166-3 Deleted countries
      • 3166-2 Subdivisions of countries
      • 4217 Currencies
      • 15924 Scripts

Brian #2: Does Python have pointers? (https://nedbatchelder.com/blog/202403/does_python_have_pointers.html)

  • Ned Batchelder
  • Turns out, this is really the description of “what’s a variable in Python?” that helps to make sense of the “variables as names” model in Python, especially for people coming from languages that use pointers a lot.
  • You can use id() to find out what a variable points to.
  • You just can’t do the reverse and access an object given its id.
  • There’s no “dereference” operator.
  • See also Python Names and Values (https://nedbatchelder.com/text/names1.html), also by Ned
      • Should be required reading/viewing for all Python curriculum.

Michael #3: ingestr (https://bruin-data.github.io/ingestr/)

  • ingestr is a command-line application that allows ingesting or copying data from any source into any destination database.
  • Works on both MongoDB and Postgres and many more (https://bruin-data.github.io/ingestr/supported-sources/overview.html).
  • incremental loading: append, merge or delete+insert

Brian #4: Make your terminal nice (https://davidism.com/starship-and-fish/)

  • David Lord
  • David’s switched to Fish (https://fishshell.com) and Starship (https://starship.rs)
  • I tried switching to Fish several times, and I guess I’m good with zsh.
      • Although I admire the brave comic sans motto: “Finally, a command line shell for the 90s”
  • But I’m finally ready for Starship, and it takes almost no time to set up (https://starship.rs/guide/#%F0%9F%9A%80-installation)
  • Plus it’s fast. (Has it always been Rust?)

Extras

Brian:

  • Doing some groundwork for a SaaS project, using SaaS Pegasus (https://www.saaspegasus.com/?via=brian)
      • I just talked with Cory from Pegasus for an upcoming PythonTest episode
      • I haven’t decided whether to save up SaaS episodes for one big series, or spread them out.
      • But mostly I’m excited to get my project started.

Michael:

  • Excellent video about “cloud exit” (https://youtu.be/a30vFpSaoZg?si=sdsOspRZB6Kg3Tpo)
  • uv - The Next Evolution in Python Packages? (https://talkpython.fm/episodes/show/453/uv-the-next-evolution-in-python-packages)
  • Python 3.13 a5 (https://pythoninsider.blogspot.com/2024/03/python-3130-alpha-5-is-now-available.html)
  • Target’s Open Source Fund (https://tech.target.com/blog/open-source-fund) via Pat Decker

Joke: Anti-social engineer (https://devhumor.com/media/anti-social-engineer)
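
The "variables as names" point from the pointers item can be seen in a few lines. This is a minimal sketch, not code from the episode or from Ned's post:

```python
a = [1, 2, 3]
b = a          # a second name for the same list object
c = list(a)    # a new list with equal contents

same_object = a is b
ids_match = id(a) == id(b)
copy_is_distinct = (c == a) and (c is not a)

b.append(4)    # mutating through one name is visible through the other

print(a)  # [1, 2, 3, 4]
print(c)  # [1, 2, 3]
```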

Hynek Schlawack: You Can Build Portable Binaries of Python Applications

Mon, 2024-03-18 20:00

Contrary to popular belief, it’s possible to ship portable executables of Python applications without sending your users to Python packaging hell.

Categories: FLOSS Project Planets

Data School: Jupyter &amp; IPython terminology explained 💡

Mon, 2024-03-18 14:58

Are you trying to understand the differences between Jupyter Notebook, JupyterLab, IPython, Colab, and other related terms? You're in the right place!

I'll explain by walking through a brief history of the IPython and Jupyter projects:

IPython

IPython was first released in 2006 as an "interactive" version of the Python shell. Whereas the Python shell uses the >>> prompt, you can recognize IPython from its use of In [1] and Out [1] notation to indicate input/output and line numbers:

IPython includes many features not present in the default Python shell, such as object introspection, "magic" commands, system shell access, and more.

IPython Notebook

In 2011, the IPython Notebook was released. It was known as a "computational notebook" because it allowed you to weave together code, plots, and narrative text into a single document:

It was called the IPython Notebook (and not the Python Notebook) because it used IPython as the "kernel", which is the language-specific process that runs the code in a notebook.

Jupyter Notebook

In 2015, the IPython Notebook introduced support for programming languages other than Python.

Also in 2015, IPython split into two projects: IPython (for Python-specific components) and Jupyter (for language-agnostic components).

As part of that split, the IPython Notebook was renamed the Jupyter Notebook. The name "Jupyter" was inspired by the open languages of science: Julia, Python, and R:

To be clear, "Jupyter Notebook" was the name of both the coding environment and the files created by that environment. In other words, you would open "the Jupyter Notebook" to create "a Jupyter notebook".

Jupyter notebook files used the extension ".ipynb", which was the extension (and file format) originally created for IPython notebooks.
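
To make the file format a little more concrete: an ".ipynb" file is a JSON document. Here is a minimal sketch that builds one text cell and one code cell using only the Python standard library. The field names follow version 4 of the notebook format as I understand it, and the helper name and the file name example.ipynb are made up for illustration.

```python
import json


def build_minimal_notebook():
    """Return a dict shaped like a version 4 notebook (illustrative helper)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                # A narrative text cell
                "cell_type": "markdown",
                "metadata": {},
                "source": ["# My first notebook"],
            },
            {
                # A code cell that has not been run yet
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": ["print('hello')"],
            },
        ],
    }


notebook = build_minimal_notebook()
notebook_text = json.dumps(notebook, indent=1)

# Saving this text under a name ending in .ipynb should give a file
# that notebook tools can open (assumed file name):
# with open("example.ipynb", "w", encoding="utf-8") as f:
#     f.write(notebook_text)
```

Because the format is plain JSON, any tool that can read and write these fields can work with notebooks.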

JupyterLab

At this point, the Jupyter Notebook was a lightweight coding environment, with far fewer features than a traditional IDE (integrated development environment).

In 2018, JupyterLab (one word) was released as a more full-featured alternative to the Jupyter Notebook:

Notebooks created within JupyterLab are still called "Jupyter notebooks", they still use the extension ".ipynb", and they're compatible with notebooks created by the Jupyter Notebook.

JupyterLab was originally designed to replace the Jupyter Notebook environment. However, due to the continued popularity of the "classic" Notebook environment, JupyterLab and Jupyter Notebook continue to be developed as separate applications (as of 2024).

Summary
  • The Jupyter Notebook is a lightweight coding environment for creating and editing Jupyter notebooks.
  • JupyterLab is a more full-featured IDE for creating and editing Jupyter notebooks.
  • IPython is the Python kernel for Jupyter Notebook and JupyterLab, and is also a standalone Python shell. IPython is the reason that magic commands and other enhancements are available within Jupyter Notebook and JupyterLab.
  • Jupyter notebooks are computational documents that can contain code, plots, and text. They use the extension ".ipynb" and are compatible with both the Jupyter Notebook and JupyterLab environments.

Here are a few related terms that I didn't mention above:

  • JupyterLab Desktop is a cross-platform desktop application that allows you to create and manage multiple JupyterLab sessions and Python environments.
  • JupyterLite is a JupyterLab distribution that runs entirely in the browser, without you having to launch a Jupyter server from a terminal.
  • Google Colab, Kaggle Code, and Deepnote are a few of the many web-based services that provide a Jupyter-like interface for creating notebooks that are compatible with Jupyter. (More specifically, they can import and export files that use the ".ipynb" format.)

Are there any other Jupyter-related terms you want me to explain? Please let me know in the comments! 👇

Categories: FLOSS Project Planets

Real Python: Model-View-Controller (MVC) in Python Web Apps: Explained With Lego

Mon, 2024-03-18 10:00

If you’re curious about web development, then you’ve likely encountered the abbreviation MVC, which stands for Model-View-Controller. You may know that it’s a common design pattern that’s fundamental to many Python web frameworks and even desktop applications.

But what exactly does it mean? If you’ve had a hard time wrapping your head around the concept, then keep on reading.

In this tutorial, you’ll:

  • Approach understanding the MVC pattern through a Lego-based analogy
  • Learn what models, views, and controllers are conceptually
  • Tie your conceptual understanding back to concrete web development examples
  • Investigate Flask code snippets to drive the point home

Maybe you built things with Lego as a kid, or maybe you’re still a Lego-aficionado today. But even if you’ve never pieced two Lego blocks together, keep on reading because the analogy might still be a good building block for your understanding.

Get Your Code: Click here to download an example Flask app that will help you understand MVC in Python web apps.

Take the Quiz: Test your knowledge with our interactive “Model-View-Controller (MVC) in Python Web Apps: Explained With Lego” quiz. Upon completion you will receive a score so you can track your learning progress over time:

Take the Quiz »

Explaining the Model-View-Controller Pattern With Lego

Imagine that you’re ten years old and sitting on your family room floor. In front of you is a big bucket of Lego, or similar modular building blocks. There are blocks of all different shapes and sizes:

  • 🟦🟦🟦 Some are blue, tall, and long.
  • 🟥 Some are red and cube-shaped.
  • 🟨🟨 Some are yellow, big, and wide.

With all of these different Lego pieces, there’s no telling what you could build!

Just as your mind is filling with the endless possibilities, you hear something coming from the direction of the couch. It’s your older brother, voicing a specific request. He’s saying, “Hey! Build me a spaceship!”

“Alright,” you think, “that could actually be pretty cool.” A spaceship it is!

So you get to work. You start pulling out the Lego blocks that you think you’re going to need. Some big, some small. Different colors for the outside of the spaceship, different colors for the engines.

Now that you have all of your building blocks in place, it’s time to assemble the spaceship. And after a few hours of hard work, you now have in front of you—a spaceship:

          🟦
        🟦🟥🟦
      🟦🟥🟥🟥🟦
    🟦🟥🟥🟥🟥🟥🟦
    🟦🟥🟥🟥🟥🟥🟦
    🟦🟥🟩🟩🟩🟥🟦
    🟦🟥🟩🟦🟩🟥🟦
    🟦🟥🟩🟩🟩🟥🟦
    🟦🟥🟥🟥🟥🟥🟦
    🟦🟥🟥🟥🟥🟥🟦
    🟦🟥🟥🟥🟥🟥🟦
🟦🟥🟥🟥🟥🟥🟥🟥🟥🟥🟦
🟦🟥🟥🟥🟥🟥🟥🟥🟥🟥🟦
🟦🟥🟨🟨🟥🟥🟥🟨🟨🟥🟦
    🟨🟨      🟨🟨

You run to find your brother and show him the finished product. “Wow, nice work!”, he says. Then he quietly adds:

Huh, I just asked for that a few hours ago, I didn’t have to do a thing, and here it is. I wish everything was that easy.

Your Brother

What if I told you that building a web application using the MVC pattern is exactly like building something with Lego blocks?
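
As a rough sketch of where this analogy is heading, you can read the story as three roles: the bucket of blocks is the model, the finished spaceship you show your brother is the view, and you, taking the request and doing the assembling, are the controller. The plain-Python example below illustrates that reading. All the names in it are made up for illustration and aren't taken from the tutorial's Flask app.

```python
# Model: the bucket of building blocks (the data).
BLOCKS = {
    "spaceship": ["blue hull", "red body", "yellow engine"],
    "house": ["red wall", "yellow roof"],
}


def get_blocks(thing):
    """Model: look up which blocks are needed for the requested thing."""
    return BLOCKS.get(thing, [])


def render(thing, blocks):
    """View: present the assembled result to whoever asked for it."""
    if not blocks:
        return f"Sorry, I don't know how to build a {thing}."
    return f"Here is your {thing}, built from: " + ", ".join(blocks)


def handle_request(thing):
    """Controller: receive the request, fetch the blocks, hand them to the view."""
    blocks = get_blocks(thing)
    return render(thing, blocks)


response = handle_request("spaceship")
print(response)
```

In a web framework, the request would arrive over HTTP and the view would typically be an HTML template, but the division of labor would be similar.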

User Sends a Request

Read the full article at https://realpython.com/lego-model-view-controller-python/ »


Categories: FLOSS Project Planets

ListenData: Python for Data Science: Beginner's Guide

Mon, 2024-03-18 08:32

This tutorial helps you learn data science with Python through examples. It is designed for beginners who want to get started with data science in Python. Python is an open source, high-level language that is widely used for general-purpose programming, and it has become highly popular in the data science world. In the PYPL Popularity of Programming Language index, Python leads with a 29 percent share. In the advanced analytics and predictive analytics market, it ranks among the top three programming languages.

Introduction

Python is widely used and very popular for a variety of software engineering tasks such as website development, cloud architecture, and back-end development. It is equally popular in the data science world. In advanced analytics, there have been several debates on R vs. Python. There are some areas, such as the number of libraries for statistical analysis, where R wins over Python, but Python is catching up very fast. With the popularity of big data and data science, Python has become the first programming language of many data scientists.

There are several reasons to learn Python. Some of them are as follows -

  1. Python works well for automating the various steps of a predictive model.
  2. Python has robust libraries for machine learning, natural language processing, deep learning, big data, and artificial intelligence.
  3. Python wins over R when it comes to deploying machine learning models in production.
  4. It can be easily integrated with big data frameworks such as Spark and Hadoop.
  5. Python has great online community support.
Did you know that these sites are built with Python?
  1. YouTube
  2. Instagram
  3. Reddit
  4. Dropbox
  5. Disqus
Python 2 vs. 3

Google yields thousands of articles on this topic. Some bloggers are opposed to Python 2.7 and some are in favor of it. If you filter your search to recent articles only, you will find that Python 2 is no longer supported by the Python Software Foundation. Hence it does not make sense to learn 2.7 if you are starting today. All the major packages support Python 3. Python 3 is cleaner and faster, and it fixed major issues with the Python 2 series. It is a language for the future. Python 3 was first released in 2008, and robust versions of the Python 3 series have been released ever since. You should go for the latest version of Python 3.

How to install Python?

There are two ways to download and install Python:
  1. Download Anaconda. It comes with Python along with popular libraries preinstalled.
  2. Download Python from its official website. You have to install libraries manually.
Recommended: Go for the first option and download Anaconda. It saves a lot of time in learning and coding Python.

Coding Environments

Anaconda comes with two popular IDEs:
  1. Jupyter (IPython) Notebook
  2. Spyder

Spyder

It is like RStudio for Python. It provides a user-friendly environment for writing Python code. If you are a SAS user, you can think of it as SAS Enterprise Guide / SAS Studio. It comes with a syntax editor where you can write programs, and a console where you can check each line of code. Under the 'Variable explorer', you can access the data objects and functions you have created. I highly recommend Spyder!

Spyder - Python Coding Environment

Jupyter (IPython) Notebook

Jupyter is comparable to R Markdown in R. It is useful when you need to present your work to others or create a step-by-step project report, as it can combine code, output, text, and graphics.
To read this article in full, please click here. This post appeared first on ListenData.
Categories: FLOSS Project Planets

ListenData: 15 Free Open Source ChatGPT Alternatives (with Code)

Sun, 2024-03-17 20:50

In this article we will explain how Open Source ChatGPT alternatives work and how you can use them to build your own ChatGPT clone for free. By the end of this article you will have a good understanding of these models and will be able to compare and use them.

Benefits of Open Source ChatGPT Alternatives

There are various benefits of using open source large language models which are alternatives to ChatGPT. Some of them are listed below.

  1. Data Privacy: Many companies want control over their data and don't want any third party to have access to it.
  2. Customization: Developers can train large language models on their own data and apply filtering on certain topics if they wish.
  3. Affordability: Open source GPT models let you train sophisticated large language models without worrying about expensive hardware.
  4. Democratizing AI: Open models leave room for further research that can be used to solve real-world problems.
Llama

Introduction: Llama

Llama stands for Large Language Model Meta AI. It includes a range of model sizes from 7 billion to 65 billion parameters. Meta AI researchers focused on scaling the model's performance by increasing the volume of training data rather than the number of parameters. They claimed that the 13 billion parameter model outperformed the 175 billion parameter GPT-3 model. It uses the transformer architecture and was trained on 1.4 trillion tokens gathered by web scraping and from sources such as Wikipedia, GitHub, Stack Exchange, books from Project Gutenberg, and scientific papers on arXiv.

Python Code: Llama

    # Install the package (run this in a shell)
    pip install llama-cpp-python

    # Load the model and ask a question (Python)
    from llama_cpp import Llama
    llm = Llama(model_path="./models/7B/ggml-model.bin")
    output = llm("Q: Name the planets in the solar system? A: ", max_tokens=128, stop=["Q:", "\n"], echo=True)
    print(output)

In the model path, you need weights for Llama in GGML format, stored in the models folder. You can search for them on the Hugging Face website. See one of them here.

Llama 2

What's New in Llama 2

Here are some of the key differences between Llama 2 and Llama:

  • Training data: Llama 2 is trained on 40% more tokens than Llama, a total of 2 trillion tokens. This gives it a larger knowledge base and allows it to generate more accurate responses.
  • Model size: Llama 2 is available in three sizes: 7 billion, 13 billion, and 70 billion parameters, whereas the maximum size of Llama is 65 billion parameters.
  • Chat optimization: Llama 2-Chat is a specialized version of Llama 2 that is optimized for engaging in two-way conversations. It has been trained on a dataset of human conversations, which allows it to generate more natural and engaging responses.
  • Safety and bias mitigation: Llama 2 has been trained with a focus on safety and bias mitigation. This means that it is less likely to generate toxic or harmful content.
  • Open source: Llama 2 is open source, which means that anyone can use it for research or commercial purposes, whereas Llama can't be used for commercial purposes.
Python Code: Llama 2

Llama 2: 7 Billion Parameters

To run the Llama 2 7B model, refer to the Colab commands in this section. They use a 4-bit quantization technique that reduces the size of the LLM, which can make it easier to deploy and use on systems with limited memory.
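
To see roughly why 4-bit quantization matters, here is a back-of-the-envelope estimate of the memory needed just to hold the weights. It is a sketch under simplifying assumptions: it takes 16-bit weights as the unquantized baseline and ignores activations, the cache used during generation, and the small overhead that quantization formats add. The function name is made up for illustration.

```python
def weight_memory_gb(num_parameters, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    bytes_total = num_parameters * bits_per_weight / 8
    return bytes_total / 1e9


seven_billion = 7_000_000_000

fp16_gb = weight_memory_gb(seven_billion, 16)  # assumed unquantized baseline
int4_gb = weight_memory_gb(seven_billion, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
```

Under these assumptions the weights shrink to a quarter of their 16-bit size, which is what lets a 7B model fit on a modest GPU.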

  Colab: Llama2 7B Model

    %cd /content
    !apt-get -y install -qq aria2
    !git clone -b v1.3 https://github.com/camenduru/text-generation-webui
    %cd /content/text-generation-webui
    !pip install -r requirements.txt
    !pip install -U gradio==3.28.3
    !mkdir /content/text-generation-webui/repositories
    %cd /content/text-generation-webui/repositories
    !git clone -b v1.2 https://github.com/camenduru/GPTQ-for-LLaMa.git
    %cd GPTQ-for-LLaMa
    !python setup_cuda.py install
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/raw/main/config.json -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o config.json
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/raw/main/generation_config.json -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o generation_config.json
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/raw/main/special_tokens_map.json -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o special_tokens_map.json
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/resolve/main/tokenizer.model -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o tokenizer.model
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/raw/main/tokenizer_config.json -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o tokenizer_config.json
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/resolve/main/gptq_model-4bit-128g.safetensors -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o gptq_model-4bit-128g.safetensors
    %cd /content/text-generation-webui
    !python server.py --share --chat --wbits 4 --groupsize 128 --model_type llama

Llama 2: 13 Billion Parameters

To run the Llama 2 13B model, refer to the code below.

  Colab: Llama2 13B Model

    %cd /content
    !apt-get -y install -qq aria2
    !git clone -b v1.8 https://github.com/camenduru/text-generation-webui
    %cd /content/text-generation-webui
    !pip install -r requirements.txt
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/resolve/main/model-00001-of-00003.safetensors -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o model-00001-of-00003.safetensors
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/resolve/main/model-00002-of-00003.safetensors -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o model-00002-of-00003.safetensors
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/resolve/main/model-00003-of-00003.safetensors -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o model-00003-of-00003.safetensors
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/raw/main/model.safetensors.index.json -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o model.safetensors.index.json
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/raw/main/special_tokens_map.json -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o special_tokens_map.json
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/resolve/main/tokenizer.model -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o tokenizer.model
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/raw/main/tokenizer_config.json -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o tokenizer_config.json
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/raw/main/config.json -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o config.json
    !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/raw/main/generation_config.json -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o generation_config.json
    %cd /content/text-generation-webui
    !python server.py --share --chat --load-in-8bit --model /content/text-generation-webui/models/Llama-2-13b-chat-hf

Alpaca

Introduction: Alpaca

A team of researchers from Stanford University developed an open-source language model called Alpaca. It is based on Meta's large-scale language model Llama. The team used OpenAI's GPT API (text-davinci-003) to generate the instruction data used to fine-tune the Llama 7 billion (7B) parameter model. The goal of the team is to make AI available to everyone for free so that academics can do further research without worrying about the expensive hardware needed to run these memory-intensive algorithms. Although these open source models are not available for commercial use, small businesses can still use them to build their own chatbots.

How does Alpaca work

The Stanford team began their research with the smallest of the Llama models, the Llama 7B model, which had been pre-trained on 1 trillion tokens. They started with the 175 human-written instruction-output pairs from the self-instruct seed set. They then used the OpenAI API to ask text-davinci-003 to generate more instructions, using the seed set as examples. This yielded roughly 52,000 sample conversations, which the team used to fine-tune the Llama model using Hugging Face's training framework.

To read this article in full, please click here. This post appeared first on ListenData.
Categories: FLOSS Project Planets

Brett Cannon: State of WASI support for CPython: March 2024

Sun, 2024-03-17 18:43

The biggest update since June 2023 is WASI is now a tier 2 platform for CPython! This means that the main branch of CPython should never be broken more than 24 hours for WASI and that a release will be blocked if WASI support is broken. This only applies to Python 3.13 and later, although I have been trying to keep Python 3.11 and 3.12 working with WASI as well.

To help make this support easier, the devguide has build instructions for WASI. There is also now a WASI step in CI to help make things easier for core developers.

Starting in wasmtime 14, a new command line interface was introduced. All the relevant bits of code that call wasmtime have been updated to use the new CLI in Python 3.11, 3.12, and 3.13/main.

Lastly, 3.13/main and 3.12 now support WASI SDK 21 – which is the official name of the project – and 3.11 is one bug fix away in the test suite from also having support.

At this point I think CPython has caught up to what's available in WASI 0.2 and wasi-libc via WASI SDK. The open issues are mostly feature requests or checking if assumptions related to what's supported still hold.

I'm on parental leave at this point, so future WASI work from me is on hold until I return to work in June. Another side effect of me becoming a parent soon is I stepped down as the sponsor of Emscripten support in CPython. That means CPython 3.13 does not officially support Emscripten and probably starting in 3.14, I will be removing any code that complicates supporting WASI. The Pyodide project already knows about this and they don't expect it to be a major hindrance for them since they are already used to patching CPython source code.

Categories: FLOSS Project Planets

Robin Wilson: How to speed up appending to PostGIS tables with ogr2ogr

Fri, 2024-03-15 13:28

Summary: If appending to a PostGIS table with GDAL/OGR is taking a long time, try setting the PG_USE_COPY config option to YES (eg. adding --config PG_USE_COPY YES to your command line). This should speed it up, but beware that if there are concurrent writes to your table at the same time as OGR is accessing it then there could be issues with unique identifiers.

As with many of my blog posts, I’m writing this in the hope that it will appear in searches when someone else has the same problem that I ran into recently. In the past I’ve found myself Googling problems that I’ve had before and finding a link to my blog with an explanation in a post that I didn’t even remember writing.

Anyway, the problem I’m talking about today is one I ran into when working with a client a few weeks ago.

I was using the ogr2ogr command-line tool (part of the GDAL software suite) to import data from a local Geopackage file into a PostGIS database (ie. a PostgreSQL database with the PostGIS extension).

I had multiple files of data that I wanted to put into one Postgres table. Specifically, I was using the lovely data collated by Alasdair Rae on the resources page of his website. Even more specifically, I was using some of the Local Authority GIS data to get buildings data for various areas of the UK. I downloaded multiple GeoPackage files (for example, for Southampton City Council, Hampshire County Council and Portsmouth City Council) and wanted to import them all to a buildings table.

I originally tested this with a Postgres server running on my local machine, and ran the following ogr2ogr commands:

    ogr2ogr --debug ON \
      -f PostgreSQL PG:"host=localhost user=postgres password=blah dbname=test_db" \
      buildings1.gpkg -nln buildings

    ogr2ogr -append -update --debug ON \
      -f PostgreSQL PG:"host=localhost user=postgres password=blah dbname=test_db" \
      buildings2.gpkg -nln buildings

Here I’m using the -f switch and the arguments following it to tell ogr2ogr to export to PostgreSQL and how to connect to the server, giving it the input file of buildings1.gpkg and using the -nln parameter to tell it what layer name (ie. table name) to use as the output. In the second command I do exactly the same with buildings2.gpkg but also add -append and -update to tell it to append to the existing table rather than overwriting it.

This all worked fine. Great!

A few days later I tried the same thing with a Postgres server running on Azure (using Azure Database for PostgreSQL). The first command ran fine, but the second command seemed to hang.

I was expecting that it would be a bit slower when connecting to a remote database, but I left it running for 10 minutes and it still hadn’t finished. I then tried importing the second file to a new table and it completed quickly – therefore suggesting it was some sort of problem with appending the data.

I worked round this for the time being (using the ogrmerge.py script to merge my buildings1.gpkg and buildings2.gpkg into one file and then importing that file), but resolved to get to the bottom of it when I had time.

Recently, I had that time, and posted on the GDAL mailing list about this. The maintainer of GDAL got back to me to tell me about something I’d missed in the documentation. This was that when importing to a brand new table, the Postgres COPY mode is used, but when appending to an existing table individual INSERT statements are used instead, which can be a lot slower.

Let’s look into this in a bit more detail. The PostgreSQL COPY command is a fast way of importing data into Postgres which involves copying a whole file of data into Postgres in one go, rather than dealing with each row of data individually. This can be significantly faster than iterating through each row of the data and running a separate INSERT statement for each row.

So, ogr2ogr hadn’t hung, it was just running extremely slowly, as inserting my buildings layer involved running an INSERT statement separately for each row, and there were hundreds of thousands of rows. Because the server was hosted remotely on Azure, this involved sending the INSERT command from my computer to the server, waiting for the server to process it, and then the server sending back a result to my computer – a full round-trip for each row of the table.
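
To put rough numbers on this, here is a small sketch that estimates how long per-row INSERT statements spend just waiting on the network. The row count and both latencies are figures I've made up for illustration (I didn't measure them), and the function name is hypothetical.

```python
def insert_wait_seconds(num_rows, round_trip_ms):
    """Time spent waiting on the network if every row is its own INSERT."""
    return num_rows * round_trip_ms / 1000


rows = 300_000     # assumed: "hundreds of thousands" of rows
local_ms = 0.1     # assumed round trip to a server on the same machine
remote_ms = 30.0   # assumed round trip to a remote cloud server

local_seconds = insert_wait_seconds(rows, local_ms)
remote_seconds = insert_wait_seconds(rows, remote_ms)

print(f"local:  {local_seconds:.0f} seconds")
print(f"remote: {remote_seconds / 60:.0f} minutes")
```

With these assumed figures, the same append that takes about half a minute locally takes about two and a half hours remotely, which from the outside looks a lot like a hang.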

So, I was told, the simple way to speed this up was to use a configuration setting to turn COPY mode on when appending to tables. This can be done by adding --config PG_USE_COPY YES to the ogr2ogr command. This did the job, and the append commands now completed nice and quickly. If you’re using GDAL/OGR from within a programming language, then have a look at the docs for the GDAL bindings for your language – there should be a way to set GDAL configuration options in your code.
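
If you run ogr2ogr from a script, a minimal sketch of building the append command with the config option might look like the following. It only uses the flags already shown in this post, and the helper name build_append_command is made up for illustration. The connection details are the same placeholder values as in my earlier commands.

```python
import shlex


def build_append_command(pg_connection, source_file, table_name, use_copy=True):
    """Build an ogr2ogr argument list for appending a file to an existing table."""
    command = ["ogr2ogr", "-append", "-update"]
    if use_copy:
        # Ask the PostgreSQL driver to use COPY instead of per-row INSERTs.
        command += ["--config", "PG_USE_COPY", "YES"]
    command += ["-f", "PostgreSQL", f"PG:{pg_connection}", source_file, "-nln", table_name]
    return command


command = build_append_command(
    "host=localhost user=postgres password=blah dbname=test_db",
    "buildings2.gpkg",
    "buildings",
)

# Show the command as it would be typed in a shell.
print(shlex.join(command))

# To actually run it you could pass the list to subprocess.run(command, check=True).
```

If you use the GDAL Python bindings directly instead, I believe the equivalent is calling gdal.SetConfigOption("PG_USE_COPY", "YES") before the append, but check the docs for your version.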

The final part of this was to understand why the COPY method isn’t used all the time, as it’s so much quicker. The maintainer explained that this is because of potential issues with other connections to the database updating the table at the same time as GDAL is accessing it. It is a fairly safe assumption that if you’re creating a brand new table then no-one else will be accessing it yet, but you can’t assume the same for an existing table. The COPY mode can’t deal with making sure unique identifiers are unique when other connections may be accessing the data, whereas individual INSERT statements can cope with this. Therefore it’s safer to default to INSERT statements when there is any risk of data corruption.

As a nice follow-up for this, and on the maintainer’s advice, I submitted a PR to the GDAL docs, which adds a new section explaining this and giving guidance on setting the config option. I’ve copied that section below:

When data is appended to an existing table (for example, using the -append option in ogr2ogr) the driver will, by default, emit an INSERT statement for each row of data to be added. This may be significantly slower than the COPY-based approach taken when creating a new table, but ensures consistency of unique identifiers if multiple connections are accessing the table simultaneously.

If only one connection is accessing the table when data is appended, the COPY-based approach can be chosen by setting the config option PG_USE_COPY to YES, which may significantly speed up the operation.

Categories: FLOSS Project Planets
