Feeds

Test and Code: 206: TDD in Context

Planet Python - Wed, 2023-08-23 19:36

TDD (Test Driven Development) started from Test First Programming, and has been around at least since the '90s.

However, software tools and available CI systems have changed quite a bit since then.
Maybe it's time to re-examine the assumptions, practices, processes, and principles of TDD.
 
In the context of my own software engineering career, at least, modifications to TDD, or at least to the version of TDD as it's frequently taught, have been necessary.

This is the start of a series focused on examining TDD and related lightweight practices and processes.

Links from the show:

  • From XP
      • Test First: http://www.extremeprogramming.org/rules/testfirst.html
      • Unit Tests: http://www.extremeprogramming.org/rules/unittests.html
      • Acceptance Tests: http://www.extremeprogramming.org/rules/functionaltests.html
  • Test-Driven Development (Wikipedia): https://en.wikipedia.org/wiki/Test-driven_development

Thank you Coverage Cat for sponsoring this episode

  • Coverage Cat is the best way to buy your umbrella, car, home, and renters insurance.
  • Get your free, optimized insurance quote today at coveragecat.com
Categories: FLOSS Project Planets

PyBites: Harnessing Downtime: The Power of Disconnecting

Planet Python - Wed, 2023-08-23 13:22

In this episode of the Pybites podcast, we dive into the power of stepping back from the daily grind, whether that's coding or career-focused work.

Watch here:

Or listen here:

Drawing insights from Julian's month-long trip to Canada, we discuss how disconnecting can provide clarity and inspiration for personal growth, and how it can inform decisions in our professional journeys.

And of course there are the books we're reading.

Join us for a journey of reflection, discovery, and a touch of humor.

Chapters:
00:00 Intro
00:46 Welcome back (re-introducing) Julian!
01:58 Wins of the week
03:10 Julian’s break
06:20 Lessons learned from the break
08:00 Trip reflections
11:34 Reflect on your career
13:22 Hint what’s to come
14:40 Holidays no deadlines
15:07 Favorite part of the trip
17:15 Pybites on the road
18:05 Books / what we’re reading
20:50 Stoicism
21:30 Wrap up + thanks
22:22 Outro music 

Books:
– Deep Work
– Siddhartha
– The Carbon Almanac

Related article:
– The Importance of Disconnecting as a Developer

Categories: FLOSS Project Planets

Stack Abuse: Fixing "NameError: name 'df'/'pd' is not defined" in Python

Planet Python - Wed, 2023-08-23 12:39
Introduction

When using Pandas, a Python library for data manipulation and analysis, you might have encountered an error like "NameError: name 'df'/'pd' is not defined". In this Byte, we'll show why these errors occur and how you can avoid them.

Understanding this 'df' NameError

The df NameError usually occurs when you try to use a DataFrame object df before it has been defined. This is a common mistake when working with Pandas (or any Python script, really), which uses the DataFrame object to store data in two-dimensional, size-mutable, potentially heterogeneous tabular form.

print(df)

NameError: name 'df' is not defined

The above error is thrown because df has not been defined before it's accessed.

Declaring Variables Before Accessing

To avoid the NameError, you need to make sure that your DataFrame df is declared before it's accessed. This can be done by using the Pandas function pd.DataFrame() to create a DataFrame.

import pandas as pd

data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}
df = pd.DataFrame(data)
print(df)

   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2

The above code will work perfectly because df has been defined before it's accessed.

Common Reasons for the 'df' NameError

There are several common situations that may cause the df error. As we just saw, one of these is attempting to use df before it's been declared. Another is when you mistakenly think a library or module has been imported, but it hasn't.

df = pd.DataFrame(data)
print(df)

NameError: name 'pd' is not defined

In the code above, the pandas module has not been imported, hence the NameError.

Scope-Related Issues with Variables

Another common trigger for the error is scope-related issues. If a DataFrame df is defined within a function, it will not be recognized outside that function. This is because df is local to the function and is not a global variable.

def create_df():
    df = pd.DataFrame(data)
    return df

print(df)

NameError: name 'df' is not defined

In this code, df is defined within the create_df() function and can't be accessed outside of it.
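
One minimal fix, as a sketch, is to call the function and bind its return value to a name in the calling scope:

import pandas as pd

data = {'apples': [3, 2, 0, 1], 'oranges': [0, 3, 7, 2]}

def create_df():
    # This df is local to the function...
    df = pd.DataFrame(data)
    return df

# ...so bind the returned DataFrame to a name out here before using it.
df = create_df()
print(df)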

Avoiding Nested Scope Import of Pandas

In Python, the scope of a variable refers to the context in which it's "visible". The two most common types of scope are global (the code block from which it's accessible) and local (the function or method in which it's defined). When you import pandas as pd within a function (local scope), and then try to use it outside that function (global scope), you'll likely encounter the NameError.

Here's an example:

def my_function():
    import pandas as pd
    # some code here

my_function()
print(pd)

Running this code will give you a NameError: name 'pd' is not defined because the pandas module was imported in the local scope of the function and isn't accessible in the global scope.

To avoid this, always import pandas at the beginning of your script, outside any functions or methods, so it's available throughout your code.
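
For example, a fixed version of the snippet above simply hoists the import to module level:

import pandas as pd  # module-level import, visible everywhere below

def my_function():
    # pd is accessible here because the import is global
    return pd.DataFrame()

my_function()
print(pd)  # no NameError now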

Don't Import Pandas in try/except Blocks

We often see Python developers importing modules within try/except blocks to handle potential import errors. However, this can lead to unexpected NameErrors if not done correctly.

Consider the following code:

try:
    import pandas as pd
except ImportError:
    print("pandas module not installed")

print(pd)

If Pandas isn't installed, the last print statement will raise a NameError: name 'pd' is not defined, since pd was never defined. To avoid this, ensure that you're only referencing the module within the try block, or ensure it's installed before running the script. In this case, the except block should either exit the script or provide another fallback, as sketched below.
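
A minimal sketch of the "exit the script" variant looks like this:

import sys

try:
    import pandas as pd
except ImportError:
    print("pandas module not installed")
    sys.exit(1)  # stop here, so pd is never referenced while undefined

print(pd)  # only reached if the import succeeded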

The 'pd' NameError

The NameError: name 'pd' is not defined in Python happens when you try to use pandas (aliased as pd) before importing it. When you use the alias pd to call pandas functions without importing Pandas as pd, Python doesn't recognize pd and raises a NameError.

Here's an example:

df = pd.DataFrame()

Running this code without importing pandas as pd will result in a NameError: name 'pd' is not defined.

Importing Pandas Before Usage

To resolve the NameError: name 'pd' is not defined, you need to import Pandas before using it. The standard convention is to import pandas at the beginning of your script and alias it as pd for easier use.

Here's how to do it:

import pandas as pd

df = pd.DataFrame()

This code will run without raising a NameError because pandas is imported before it's used.

Misspelling Issues with Pandas Module

While Python is case-sensitive, typos or incorrect capitalization can lead to a NameError. For instance, if you import Pandas as pd but later refer to it as PD or Pd, Python will raise a NameError: name 'PD' is not defined or NameError: name 'Pd' is not defined.

import pandas as pd

df = PD.DataFrame()  # This will raise a NameError

To avoid this, always ensure that you're consistent with the case when referring to pandas or any other Python modules.

Avoid Nested Scope Import of Pandas

Often, Python developers attempt to import modules within a function or a class, leading to a nested scope import. This can cause issues, particularly with Pandas, as the module might not be available in the global scope. Let's take a look at an example:

def some_function():
    import pandas as pd
    df = pd.DataFrame()

some_function()
print(df)

This code will throw a NameError because df is not defined in the global scope. The DataFrame df is only available within the function some_function.

Note: To avoid such issues, always import your modules at the top of your script, making them available throughout the entire scope of your program.

Using Correct Pandas Import Statement

Pandas is a popular Python library for data manipulation and analysis. It's conventionally imported with the alias pd. If you're seeing a NameError for pd, it's likely that you've either forgotten to import Pandas, or have imported it incorrectly. Here's how you should do it:

import pandas as pd

Once Pandas is imported with the alias pd, you can use it to create a DataFrame, like so:

df = pd.DataFrame()

Note: Always ensure that Pandas is imported correctly at the beginning of your script. If Pandas is not installed, you can install it from your console using pip: $ pip install pandas.

Conclusion

In Python, a NameError typically indicates that a variable or module has been used before it has been defined. This can occur with Pandas (commonly aliased as pd) and with DataFrames (often named df). To avoid these errors, always ensure that your modules are imported at the top of your script, using the correct syntax. Also, make sure that variables are declared before they're accessed.

Categories: FLOSS Project Planets

Jo Shields: Retirement

Planet Debian - Wed, 2023-08-23 11:52

Apparently it’s nearly four years since I last posted to my blog. Which is, to a degree, the point here. My time, and priorities, have changed over the years. And this led me to the decision that my available time and priorities in 2023 aren’t compatible with being a Debian or Ubuntu developer, and realistically, haven’t been for years. As of earlier this month, I quit as a Debian Developer and Ubuntu MOTU.

I think a lot of my blogging energy got absorbed by social media over the last decade, but with the collapse of Twitter and Reddit due to mismanagement, I’m trying to allocate more time for blog-based things instead. I may write up some of the things I’ve achieved at work (.NET 8 is now snapped for release Soon). I might even blog about work-adjacent controversial topics, like my changed feelings about the entire concept of distribution packages. But there’s time for that later. Maybe.

I’ll keep tagging vaguely FOSS related topics with the Debian and Ubuntu tags, which cause them to be aggregated in the Planet Debian/Ubuntu feeds (RSS, remember that from the before times?!) until an admin on those sites gets annoyed at the off-topic posting of an emeritus dev and deletes them.

But that’s where we are. Rather than ignore my distro obligations, I’ve admitted that I just don’t have the energy any more. Let someone less perpetually exhausted than me take over. And if they don’t, maybe that’s OK too.

Categories: FLOSS Project Planets

Five Jars: Implementing Automated Testing with Codeception

Planet Drupal - Wed, 2023-08-23 10:24
We're excited to share our experience introducing automated testing to our legacy site, where it has helped improve functionality and user experience.
Categories: FLOSS Project Planets

Real Python: Click and Python: Build Extensible and Composable CLI Apps

Planet Python - Wed, 2023-08-23 10:00

You can use the Click library to quickly provide your Python automation and tooling scripts with an extensible, composable, and user-friendly command-line interface (CLI). Whether you’re a developer, data scientist, DevOps engineer, or someone who often uses Python to automate repetitive tasks, you’ll very much appreciate Click and its unique features.

In the Python ecosystem, you’ll find multiple libraries for creating CLIs, including argparse from the standard library, Typer, and a few others. However, Click offers a robust, mature, intuitive, and feature-rich solution.

In this tutorial, you’ll learn how to:

  • Create command-line interfaces with Click and Python
  • Add arguments, options, and subcommands to your CLI apps
  • Enhance the usage and help pages of your CLI apps with Click
  • Prepare a Click CLI app for installation, use, and distribution

To get the most out of this tutorial, you should have a good understanding of Python programming, including topics such as using decorators. It’ll also be helpful if you’re familiar with using your current operating system’s command line or terminal.

Get Your Code: Click here to download the sample code that you’ll use to build your CLI app with Click and Python.

Creating Command-Line Interfaces With Click and Python

The Click library enables you to quickly create robust, feature-rich, and extensible command-line interfaces (CLIs) for your scripts and tools. This library can significantly speed up your development process because it allows you to focus on the application’s logic and leave CLI creation and management to the library itself.

Click is a great alternative to the argparse module, which is the default CLI framework in the Python standard library. Next up, you’ll learn what sets it apart.

Why Use Click for CLI Development

Compared with argparse, Click provides a more flexible and intuitive framework for creating CLI apps that are highly extensible. It allows you to gradually compose your apps without restrictions and with a minimal amount of code. This code will be readable even when your CLI grows and becomes more complex.

Click’s application programming interface (API) is highly intuitive and consistent. The API takes advantage of Python decorators, allowing you to add arguments, options, and subcommands to your CLIs quickly.

Functions are fundamental in Click-based CLIs. You have to write functions that you can then wrap with the appropriate decorators to create arguments, commands, and so on.
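
As a minimal illustration of that decorator-based style (the greet command and its parameters are invented for this example):

import click

@click.command()
@click.argument("name")
@click.option("--shout", is_flag=True, help="Print the greeting in uppercase.")
def greet(name, shout):
    """Greet NAME on the command line."""
    message = f"Hello, {name}!"
    if shout:
        message = message.upper()
    click.echo(message)

if __name__ == "__main__":
    greet()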

Click has several desirable features that you can take advantage of. For example, Click apps:

  • Can be lazily composable without restrictions
  • Follow the Unix command-line conventions
  • Support loading values from environment variables
  • Support custom prompts for input values
  • Handle paths and files out of the box
  • Allow arbitrary nesting of commands, also known as subcommands (see the sketch right after this list)
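
To illustrate that last point, here's a minimal sketch of nested commands built on click.group(). The db, init, and migrate names are invented for this example:

import click

@click.group()
def db():
    """Manage the database."""

@db.command()
def init():
    """Initialize the database."""
    click.echo("Initializing...")

@db.command()
def migrate():
    """Run pending migrations."""
    click.echo("Migrating...")

if __name__ == "__main__":
    db()

With this in a db.py file, python db.py init and python db.py migrate dispatch to the corresponding subcommands, and python db.py --help lists them both.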

You’ll find that Click has many other cool features. For example, Click keeps information about all of your arguments, options, and commands. This way, it can generate usage and help pages for the CLI, which improves the user experience.

When it comes to processing user input, Click has a strong understanding of data types. Because of this feature, the library generates consistent error messages when the user provides the wrong type of input.
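
For example, declaring an option with type=int is enough for Click to validate input for you. This snippet is a sketch with invented names, and the exact error wording may vary between Click versions:

import click

@click.command()
@click.option("--count", type=int, default=1, help="Number of repetitions.")
def repeat(count):
    click.echo(f"count = {count}")

if __name__ == "__main__":
    repeat()

Calling this script with --count abc fails with a message along the lines of Error: Invalid value for '--count': 'abc' is not a valid integer.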

Now that you have a general understanding of Click’s most relevant features, it’s time to get your hands dirty and write your first Click app.

How to Install and Set Up Click: Your First CLI App

Unlike argparse, Click doesn’t come in the Python standard library. This means that you need to install Click as a dependency of your CLI project to use the library. You can install Click from PyPI using pip. First, you should create a Python virtual environment to work on. You can do all of that with the following platform-specific commands:

Windows PowerShell:

PS> python -m venv venv
PS> venv\Scripts\activate
(venv) PS> python -m pip install click

Linux + macOS:

$ python -m venv venv
$ source venv/bin/activate
(venv) $ python -m pip install click

With the first two commands, you create and activate a Python virtual environment called venv in your working directory. Once the environment is active, you install Click using pip.

Great! You’ve installed Click in a fresh virtual environment. Now go ahead and fire up your favorite code editor. Create a new hello.py file and add the following content to it:

# hello.py

import click

@click.command("hello")
@click.version_option("0.1.0", prog_name="hello")
def hello():
    click.echo("Hello, World!")

if __name__ == "__main__":
    hello()

Read the full article at https://realpython.com/python-click/ »


Categories: FLOSS Project Planets

Stack Abuse: Solving "NameError: name 'random' is not defined" in Python

Planet Python - Wed, 2023-08-23 09:30
Introduction

In Python, one of the most common errors that beginners and even some seasoned programmers encounter is the NameError: name 'random' is not defined. This error often pops up when trying to use the random module without properly importing it.

In this Byte, we will understand this error and learn how to correctly import and use the random module in Python.

Understanding the Error

Before we get into fixing the problem, let's first understand what this error means. The NameError: name 'random' is not defined error is raised when you try to use the random module, or a function from it, without first importing it into your script. This is because Python doesn't automatically load all modules at startup due to performance reasons. Here's an example of this error:

print(random.randint(1, 10))

Output:

NameError: name 'random' is not defined

As you can see, attempting to use random.randint() without first importing the random module results in the NameError.

Importing the random Module

To use the random module or any other module in Python, you need to import it first. The import statement in Python is used to load a module into your script. Here's how you can import it:

import random

print(random.randint(1, 10))

Output:

7

Now, the script works fine because we've imported the random module before using its randint function.

Note: Beginners, remember that the import statement should be placed at the beginning of your script, before any function that uses the module is called!

Proper Scoping for Modules

Another thing that can trip up programmers is module scoping. Understanding the scope is important, especially when you're not importing modules at the top of your source file. When you import a module, it's only available in the scope where you imported it. So if you import a module inside a function, it won't be available outside that function since that is out of scope.

Here's an example:

def generate_random_number():
    import random
    return random.randint(1, 10)

print(generate_random_number())
print(random.randint(1, 10))  # This will raise an error

Output:

5
NameError: name 'random' is not defined

As you can see, the random module is not available outside the generate_random_number function. To make a module available to your entire script, import it at the top level of your script, outside any function or class.
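
A fixed version of that example, as a minimal sketch, moves the import to the top level:

import random  # top-level import, visible to the whole script

def generate_random_number():
    return random.randint(1, 10)

print(generate_random_number())
print(random.randint(1, 10))  # works now, random is in scope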

Avoid Importing in try/except Blocks

In Python, it's a common practice to use try/except blocks to handle exceptions. However, importing modules within these blocks can cause unexpected errors. A common mistake is to put the import statement inside the try block, after other code, which can lead to a NameError if an error is raised before the import runs.

Here's a code snippet that demonstrates the problem:

try:
    # Some code...
    import random
    num = random.randint(1, 10)
except Exception:
    print("Oh no! An error...")

num = random.randint(1, 10)  # This could raise an error

In this code, if an exception occurs before the import random line, the import statement will be skipped, and any subsequent code that uses the random module will fail with the NameError: name 'random' is not defined error.

Note: It's best to avoid importing modules in try/except blocks. Instead, always import all necessary modules at the beginning of your script. Importing in these blocks should be reserved for special cases.

By moving the import statement outside the try block, you ensure that the module is always available in your script, even if the code within the try block raises an exception.
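
A corrected sketch of the snippet above:

import random  # imported unconditionally, before anything that can fail

try:
    # Some code that might raise...
    num = random.randint(1, 10)
except Exception:
    print("Oh no! An error...")

num = random.randint(1, 10)  # random is guaranteed to be defined here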

Importing Specific Functions from the random Module

Instead of importing the entire random module, you can import only the specific functions you need. This is done using the from ... import ... statement.

from random import randint, choice

Now, you can directly use randint and choice without prefixing them with random.

num = randint(1, 10)
letter = choice('abc')

Just make sure to call them by these bare names, and not as random.randint(), for example; after a from ... import ..., the name random itself is still undefined and would raise the NameError.

Fixing "NameError: name 'randint' is not defined"

If you encounter the NameError: name 'randint' is not defined, it's likely that you're trying to use the randint function without importing it from the random module.

num = randint(1, 10)

To fix this error, you should import the randint function from the random module.

from random import randint

num = randint(1, 10)

Resolving "'random' has no attribute 'X'" Error

The error AttributeError: 'module' object has no attribute 'X' occurs when you're trying to access a function or attribute that doesn't exist in the module. This could be due to a typo in the function name or the function might not exist in the module.

import random

num = random.randit(1, 10)

In the above code, randit is a typo and it should be randint. Correcting the typo will fix the error.

import random

num = random.randint(1, 10)

Fixing "'random' has no attribute 'choice'" Error

Another common error you may encounter when working with the random module is the AttributeError: 'module' object has no attribute 'choice'. This can happen when you try to use the choice function from the random module, but Python can't find it.

Here's an example of when you might see this error:

import random

print(random.Choice([1, 2, 3, 4, 5]))

Output:

AttributeError: module 'random' has no attribute 'Choice'

The problem here is that Python is case-sensitive, which means Choice and choice are considered different. In the random module, the correct function is choice, not Choice.

To fix this error, you just need to make sure you're using the correct case when calling the choice function:

import random

print(random.choice([1, 2, 3, 4, 5]))

Output:

3

With this change, the choice function works as expected, and you won't get the error anymore.

Conclusion

In this Byte, we've covered quite a few possible errors around importing modules, specifically the random module. In particular, we looked at the error NameError: name 'random' is not defined and how to resolve it.

We've also looked at some related errors that occur when working with the random module, such as AttributeError: 'module' object has no attribute 'choice' and how to fix them.

Categories: FLOSS Project Planets

Mike Driscoll: Python in Excel Announcement!

Planet Python - Wed, 2023-08-23 09:11

Microsoft announced that Python is now a part of Excel! Guido van Rossum mentioned on Twitter that he helped the Excel team add Python to the popular spreadsheet application.

You can get Python in Excel if you have the ability to access Public Preview versions of Office.

This new feature allows you to write Python directly in a cell. For some reason, the Python code runs in the Microsoft Cloud rather than locally, so an internet connection is required for this to work. When the cloud computation finishes, it returns the results, including your plots or other visualizations.

Python in Excel is using the Anaconda distribution of Python rather than the regular Python from Python.org. What that means is that you have many popular scientific packages included automatically, such as pandas, Matplotlib, scikit-learn, etc.

You can learn more by checking out Microsoft’s announcement or visiting Anaconda’s page on the topic.

The post Python in Excel Announcement! appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

Stack Abuse: Fix "Could not install packages due to an OSError: [WinError 2]" Error

Planet Python - Wed, 2023-08-23 08:39
Introduction

Python, an open-source language used by many developers, sometimes presents us with error messages that can be difficult to decipher. One such error message is "Could not install packages due to an OSError: [WinError 2] The system cannot find the file specified."

In this Byte, we'll take a closer look at this error message, understand its causes, and show several solutions to help you fix it and move on.

Understanding the Error

The "OSError: [WinError 2]" usually happens when Python can't locate a file or directory that it needs in order to execute a task. This could be due to a variety of reasons, like wrong file paths, insufficient permissions, or multiple Python versions causing conflicts.

Note: While the WinError 2 is specific to Windows operating systems, similar errors can occur on other operating systems as well, but they may look slightly different. So understanding the following solutions may also help you fix similar errors on other systems.

Solution 1: Package Installation with --user Option

One common solution to the error is to install the package in the user's local directory using the --user option. This bypasses any system-wide permissions and installs the package in a location that the user has full control over.

$ pip install --user <package-name>

Replace <package-name> with the name of the package you want to install. If the installation works without issue, you should see an output similar to the following:

$ pip install --user requests
Collecting requests
  Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
...
Installing collected packages: requests
Successfully installed requests-2.25.1

Solution 2: Running CMD as Administrator

Sometimes, the issue can be resolved by simply running the Command Prompt as an Administrator. This gives the Command Prompt elevated permissions and may allow Python to find and access the necessary files or directories.

To run CMD as an Administrator, just right-click on the Command Prompt icon and select "Run as administrator". Then try installing the package again.

Solution 3: Dealing with Multiple Python Versions

If you have multiple versions of Python installed on your system, it can lead to conflicts and result in the "OSError: [WinError 2]" error. In these cases, you should specify which Python version you want to use.

Note: You can check the current Python version by running the command python --version.

If you want to use a specific Python version, you can do so by using the py launcher followed by the version number. For example, to use Python 3.8, you'd want to use the following command:

$ py -3.8 -m pip install <package-name>

Of course, replace <package-name> with the name of the package you want to install.

Solution 4: Modifying User Access Permissions

Sometimes, this error can occur due to insufficient user permissions. This is especially common when you're trying to install packages in a directory where your user account does not have write access.

To solve this, just modify the permissions of the Python directory to allow your user account to install packages. Here is how you can do this:

  1. Right-click on the Python directory and select "Properties".
  2. Go to the "Security" tab and click on "Edit".
  3. Select your user account and check the "Full control" box under "Permissions for Users".
  4. Click "Apply" and then "OK" to save changes.

Warning: Be careful when modifying user access permissions. Giving full control to a user can potentially expose your system to security risks!

Solution 5: Creating a Virtual Environment

Another way to solve the "OSError: [WinError 2]" is by creating a virtual environment. A virtual environment is a self-contained directory that has its own Python installation and packages. Virtual environments are isolated from one another, so changes to one env will not affect another.

To create a virtual environment, you can use the venv module that comes with Python 3. Here is how you can do this:

$ python3 -m venv myenv

This will create a new directory called 'myenv' in your current directory. To activate the virtual environment, you can use the following command (on Windows, where this error occurs, run myenv\Scripts\activate instead):

$ source myenv/bin/activate

Now, you should be able to install packages without encountering the "OSError".

Solution 6: Configuring venv to Include System Site Packages

If you're still encountering the error after creating a virtual environment, you can try configuring venv to include system site packages. This means that the packages installed in your system Python will also be available in the virtual environment.

Here is how you can do this:

$ python3 -m venv myenv --system-site-packages

This solution is particularly useful when you're dealing with multiple Python versions. By including system site packages, you can ensure that your virtual environment has access to all the packages installed in your system Python.

Conclusion

In this Byte, we looked at several solutions to the OSError: [WinError 2] that occurs when trying to install Python packages. We discussed how to change user access permissions, create a virtual environment, and configure venv to include system site packages.

Remember, it's important to understand the root cause of the error to choose the most appropriate solution. Always be cautious when modifying system settings and consider creating a virtual environment to avoid issues.

Categories: FLOSS Project Planets

Marcos Dione: disk-usage-while-importing-an-osm-rendering-database

Planet Python - Wed, 2023-08-23 08:10

Preface: I wanted to include the updating part in this post, but it got too long already, so I'll split it in two. I will update this post when the second comes out.

TL;DR version: import seems to use a peak of 15x space compared to the .osm.pbf source file, and a final 10x. If you're tight on space, try --number-processes 1, but it will be way slower. You can also save some 4x size if you have 4x RAM (by not using --flat-nodes).

I decided to try and maintain a daily OpenStreetMap rendering database for Europe. The plan is to, at some point, have a rendering server at home, instead of depending on occasional renderings. For this I will use my old laptop, which has a 1TB SSD and only 8GiB of RAM, which I plan to upgrade to 16 at some point. The machine does a few other things (webmail, backups, cloud with only two users, and other stuff), but the usage is really low, so I think it'll be fine.

This week I cleaned up disk space and I prepared the main import. Due to the space constraints I wanted to know how the disk usage evolves during the import. In particular, I wanted to see if there could be a index creation order which could be beneficial for people with limited space for these operations. Let's check the times first[2]:

$ osm2pgsql --verbose --database gis --cache 0 --number-processes 4 --slim --flat-nodes $(pwd)/nodes.cache --hstore --multi-geometry \
      --style $osm_carto/openstreetmap-carto.style --tag-transform-script $osm_carto/openstreetmap-carto.lua europe-latest.osm.pbf
2023-08-18 15:46:20  osm2pgsql version 1.8.0
2023-08-18 15:46:20  [0] Database version: 15.3 (Debian 15.3-0+deb12u1)
2023-08-18 15:46:20  [0] PostGIS version: 3.3
2023-08-18 15:46:20  [0] Setting up table 'planet_osm_nodes'
2023-08-18 15:46:20  [0] Setting up table 'planet_osm_ways'
2023-08-18 15:46:20  [0] Setting up table 'planet_osm_rels'
2023-08-18 15:46:20  [0] Setting up table 'planet_osm_point'
2023-08-18 15:46:20  [0] Setting up table 'planet_osm_line'
2023-08-18 15:46:20  [0] Setting up table 'planet_osm_polygon'
2023-08-18 15:46:20  [0] Setting up table 'planet_osm_roads'
2023-08-19 20:49:52  [0] Reading input files done in 104612s (29h 3m 32s).
2023-08-19 20:49:52  [0]   Processed 3212638859 nodes in 3676s (1h 1m 16s) - 874k/s
2023-08-19 20:49:52  [0]   Processed 390010251 ways in 70030s (19h 27m 10s) - 6k/s
2023-08-19 20:49:52  [0]   Processed 6848902 relations in 30906s (8h 35m 6s) - 222/s
2023-08-19 20:49:52  [0] Overall memory usage: peak=85815MByte current=85654MByte

Now I wonder why I have --cache 0. Unluckily I didn't leave any comments, so I'll have to check my commits (I automated all this with a script :). Unluckily again, this slipped in in a commit about something else, so any references are lost :( Here are the table sizes[1]:

        Name        |  Type | Access method |      Size
--------------------+-------+---------------+----------------
 planet_osm_ways    | table | heap          | 95_899_467_776
 planet_osm_polygon | table | heap          | 92_916_039_680 *
 planet_osm_line    | table | heap          | 43_485_192_192 *
 planet_osm_point   | table | heap          | 16_451_903_488 *
 planet_osm_roads   | table | heap          |  6_196_699_136 *
 planet_osm_rels    | table | heap          |  3_488_505_856
 spatial_ref_sys    | table | heap          |      7_102_464
 geography_columns  | view  |               |              0
 geometry_columns   | view  |               |              0

Those marked with * are used for rendering, the rest are kept for update purposes. Let's see the disk usage:

The graph shows the rate at which disk is used during the import of data. We can only see some space being freed around 1/10th in, and later at around 2/3rds, but nothing major.

Then osm2pgsql does a lot of stuff, including clustering, indexing and analyzing:

2023-08-19 20:49:52  [1] Clustering table 'planet_osm_polygon' by geometry...
2023-08-19 20:49:52  [2] Clustering table 'planet_osm_line' by geometry...
2023-08-19 20:49:52  [3] Clustering table 'planet_osm_roads' by geometry...
2023-08-19 20:49:52  [4] Clustering table 'planet_osm_point' by geometry...
2023-08-19 20:49:52  [1] Using native order for clustering table 'planet_osm_polygon'
2023-08-19 20:49:52  [2] Using native order for clustering table 'planet_osm_line'
2023-08-19 20:49:52  [3] Using native order for clustering table 'planet_osm_roads'
2023-08-19 20:49:52  [4] Using native order for clustering table 'planet_osm_point'
2023-08-19 22:35:50  [3] Creating geometry index on table 'planet_osm_roads'...
2023-08-19 22:50:47  [3] Creating osm_id index on table 'planet_osm_roads'...
2023-08-19 22:55:52  [3] Analyzing table 'planet_osm_roads'...
2023-08-19 22:57:47  [3] Done task [Analyzing table 'planet_osm_roads'] in 7674389ms.
2023-08-19 22:57:47  [3] Starting task...
2023-08-19 22:57:47  [3] Done task in 1ms.
2023-08-19 22:57:47  [3] Starting task...
2023-08-19 22:57:47  [0] Done postprocessing on table 'planet_osm_nodes' in 0s
2023-08-19 22:57:47  [3] Building index on table 'planet_osm_ways'
2023-08-19 23:32:06  [4] Creating geometry index on table 'planet_osm_point'...
2023-08-20 00:13:30  [4] Creating osm_id index on table 'planet_osm_point'...
2023-08-20 00:20:35  [4] Analyzing table 'planet_osm_point'...
2023-08-20 00:20:40  [4] Done task in 12647156ms.
2023-08-20 00:20:40  [4] Starting task...
2023-08-20 00:20:40  [4] Building index on table 'planet_osm_rels'
2023-08-20 02:03:11  [4] Done task in 6151838ms.
2023-08-20 03:17:24  [2] Creating geometry index on table 'planet_osm_line'...
2023-08-20 03:54:40  [2] Creating osm_id index on table 'planet_osm_line'...
2023-08-20 04:02:57  [2] Analyzing table 'planet_osm_line'...
2023-08-20 04:03:01  [2] Done task in 25988218ms.
2023-08-20 05:26:21  [1] Creating geometry index on table 'planet_osm_polygon'...
2023-08-20 06:17:31  [1] Creating osm_id index on table 'planet_osm_polygon'...
2023-08-20 06:30:46  [1] Analyzing table 'planet_osm_polygon'...
2023-08-20 06:30:47  [1] Done task in 34854542ms.
2023-08-20 10:48:18  [3] Done task in 42630605ms.
2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_ways' in 42630s (11h 50m 30s)
2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_rels' in 6151s (1h 42m 31s)
2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_point' done in 12647s (3h 30m 47s).
2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_line' done in 25988s (7h 13m 8s).
2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_polygon' done in 34854s (9h 40m 54s).
2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_roads' done in 7674s (2h 7m 54s).
2023-08-20 10:48:18  [0] Overall memory usage: peak=85815MByte current=727MByte
2023-08-20 10:48:18  [0] osm2pgsql took 154917s (43h 1m 57s) overall.

I tried to make sense of this part. We have 4 workers 1-4 plus one main thread 0. On the 19th, ~20:50, all four workers start working on polygon, line, roads and point respectively. 2h07m54s later worker 3 finishes clustering roads, which is the time reported at the end of the run. But it immediately starts creating indexes for roads, which take ~15m and ~5m respectively. It then starts analyzing roads, which I guess is the task that finishes at 22:57 (~2m runtime)?

Then 2 anonymous tasks, one finishes in 1ms, and the second lingers...? And immediately starts indexing ways. Meanwhile, nodes, which wasn't reported as being processed by any worker, also finishes. Maybe it's the main loop which does it? And if so, why did it finish only now, after only 0s? All this happens on the same second, 10:48:18.

Still on the 19th, at ~23:32, worker 4 starts creating indexes for point. This is ~3h30m after it started clustering it, which is also what is reported at the end. Again, 2 indexes and one analysis for this table, then an anonymous task... which I guess finishes immediately? Because on the same second it creates an index for it, which looks like a pattern (W3 did the same, remember?). It finishes in ~1h40m, so I guess W3's "Done task" at the end is the index it was creating since the 19th?

Given all that, I added some extra annotations that I think are the right ones to make sense of all that. I hope I can use some of my plenty spare time to fix it, see this issue:

2023-08-19 20:49:52  [1] Clustering table 'planet_osm_polygon' by geometry...
2023-08-19 20:49:52  [2] Clustering table 'planet_osm_line' by geometry...
2023-08-19 20:49:52  [3] Clustering table 'planet_osm_roads' by geometry...
2023-08-19 20:49:52  [4] Clustering table 'planet_osm_point' by geometry...
2023-08-19 20:49:52  [1] Using native order for clustering table 'planet_osm_polygon'
2023-08-19 20:49:52  [2] Using native order for clustering table 'planet_osm_line'
2023-08-19 20:49:52  [3] Using native order for clustering table 'planet_osm_roads'
2023-08-19 20:49:52  [4] Using native order for clustering table 'planet_osm_point'
2023-08-19 22:35:50  [3] Creating geometry index on table 'planet_osm_roads'...
2023-08-19 22:50:47  [3] Creating osm_id index on table 'planet_osm_roads'...
2023-08-19 22:55:52  [3] Analyzing table 'planet_osm_roads'...
2023-08-19 22:57:47  [3] Done task [Analyzing table 'planet_osm_roads'] in 7674389ms.
2023-08-19 22:57:47  [3] Starting task [which one?]...
2023-08-19 22:57:47  [3] Done task in 1ms.
2023-08-19 22:57:47  [3] Starting task [which one?]...
2023-08-19 22:57:47  [0] Done postprocessing on table 'planet_osm_nodes' in 0s
2023-08-19 22:57:47  [3] Building index on table 'planet_osm_ways'
2023-08-19 23:32:06  [4] Creating geometry index on table 'planet_osm_point'...
2023-08-20 00:13:30  [4] Creating osm_id index on table 'planet_osm_point'...
2023-08-20 00:20:35  [4] Analyzing table 'planet_osm_point'...
2023-08-20 00:20:40  [4] Done task [Analyzing table 'planet_osm_point'] in 12647156ms.
2023-08-20 00:20:40  [4] Starting task...
2023-08-20 00:20:40  [4] Building index on table 'planet_osm_rels'
2023-08-20 02:03:11  [4] Done task [Building index on table 'planet_osm_rels'] in 6151838ms.
2023-08-20 03:17:24  [2] Creating geometry index on table 'planet_osm_line'...
2023-08-20 03:54:40  [2] Creating osm_id index on table 'planet_osm_line'...
2023-08-20 04:02:57  [2] Analyzing table 'planet_osm_line'...
2023-08-20 04:03:01  [2] Done task [Analyzing table 'planet_osm_line'] in 25988218ms.
2023-08-20 05:26:21  [1] Creating geometry index on table 'planet_osm_polygon'...
2023-08-20 06:17:31  [1] Creating osm_id index on table 'planet_osm_polygon'...
2023-08-20 06:30:46  [1] Analyzing table 'planet_osm_polygon'...
2023-08-20 06:30:47  [1] Done task [Analyzing table 'planet_osm_polygon'] in 34854542ms.
2023-08-20 10:48:18  [3] Done task [Building index on table 'planet_osm_ways'] in 42630605ms.
2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_ways' in 42630s (11h 50m 30s)
2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_rels' in 6151s (1h 42m 31s)
2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_point' done in 12647s (3h 30m 47s).
2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_line' done in 25988s (7h 13m 8s).
2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_polygon' done in 34854s (9h 40m 54s).
2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_roads' done in 7674s (2h 7m 54s).
2023-08-20 10:48:18  [0] Overall memory usage: peak=85815MByte current=727MByte
2023-08-20 10:48:18  [0] osm2pgsql took 154917s (43h 1m 57s) overall.

Here's the graph for that section:

If you look closely, you can barely see the indexes being created, but the features that stand out most are the peaks of space being freed. These correspond to points where some task has just finished and another one begins. Correlating back with those logs, and taking just the 4 bigger ones, we get (in chronological order):

  • Clustering the roads table frees 10GiB.
  • Clustering the point table frees 24GiB.
  • Clustering the line table frees 68GiB.
  • Clustering the polygon table frees 153GiB.

Which makes a lot of sense. But there's nothing for us here if we want to reorder stuff because they're all started in parallel, and we could only reorder them if we set --number-processes 1. The space being freed more or less corresponds (at least in terms of orders of magnitude) to the sizes of the final tables we see above. These are also the main rendering tables. The rest is space freed when finishing creating indexes, but the amounts are so small that I'm not going to focus on them. Also, notice that most of the space is freed after the last of those four events because the original data is bigger and so it takes longer to process.

As a side note, I generated the above graphs using Prometheus, Grafana and its annotations. One of the good things about Grafana is that it has an API that allows you to do a lot of stuff (but surprisingly, not list Dashboards, although I guess I could use the search API for that). I tried to do it with Python and requests but for some reason it didn't work[3]:

#! /usr/bin/env python3

import requests
import datetime

time = datetime.datetime.strptime('2023-08-18 15:46:20', '%Y-%m-%d %H:%M:%S')

# also tried dashboardId=1
data = dict(dashboardUID='fO9lMi3Zz', panelId=26, time=time.timestamp(),
            text='Setting up tables')

requests.put('http://diablo:3000/api/annotations', json=data,
             auth=('admin', 'XXXXXXXXXXXXXXX'))
# <Response [404]>

I left finding out why for another time because I managed to do the annotations with curl:

$ python3 -c 'import datetime, sys; time = datetime.datetime.strptime(sys.argv[1], "%Y-%m-%d %H:%M:%S"); print(int(time.timestamp()) * 1000)' '2023-08-20 10:48:18'
1692521298000
$ curl --verbose --request POST --user 'admin:aing+ai3eiv7Aexu5Shi' http://diablo:3000/api/annotations --header 'Content-Type: application/json' \
      --data '{ "dashboardId": 1, "panelId": 26, "time": 1692521298000, "text": "Done task [Building index on table planet_osm_ways]" }'

(Yes. All annotations. By hand. And double-checked the next day, because I was doing them until so late I made a few mistakes.)

Here are the indexes this step creates and their sizes:

               Name               |        Table       | Access method |      Size
----------------------------------+--------------------+---------------+----------------
 planet_osm_ways_nodes_bucket_idx | planet_osm_ways    | gin           | 11_817_369_600
 planet_osm_polygon_way_idx       | planet_osm_polygon | gist          | 11_807_260_672
 planet_osm_ways_pkey             | planet_osm_ways    | btree         |  8_760_164_352
 planet_osm_polygon_osm_id_idx    | planet_osm_polygon | btree         |  6_186_663_936
 planet_osm_point_way_idx         | planet_osm_point   | gist          |  4_542_480_384
 planet_osm_line_way_idx          | planet_osm_line    | gist          |  4_391_354_368
 planet_osm_line_osm_id_idx       | planet_osm_line    | btree         |  2_460_491_776
 planet_osm_point_osm_id_idx      | planet_osm_point   | btree         |  2_460_352_512
 planet_osm_rels_parts_idx        | planet_osm_rels    | gin           |  2_093_768_704
 planet_osm_roads_way_idx         | planet_osm_roads   | gist          |    291_995_648
 planet_osm_rels_pkey             | planet_osm_rels    | btree         |    153_853_952
 planet_osm_roads_osm_id_idx      | planet_osm_roads   | btree         |    148_979_712
 spatial_ref_sys_pkey             | spatial_ref_sys    | btree         |        212_992

Even when the sizes are quite big (51GiB total), it's below the peak extra space (124GiB), so we can also ignore this.

Then it's time to create some indexes with an SQL file provided by osm-carto. psql does not print timestamps for the lines, so I used my trusty pefan script to add them[2]:

$ time psql --dbname gis --echo-all --file ../../osm-carto/indexes.sql | pefan -t
2023-08-22T21:08:59.516386: -- These are indexes for rendering performance with OpenStreetMap Carto.
2023-08-22T21:08:59.516772: -- This file is generated with scripts/indexes.py
2023-08-22T21:08:59.516803: CREATE INDEX planet_osm_line_ferry ON planet_osm_line USING GIST (way) WHERE route = 'ferry' AND osm_id > 0;
2023-08-22T21:10:17.226963: CREATE INDEX
2023-08-22T21:10:17.227708: CREATE INDEX planet_osm_line_label ON planet_osm_line USING GIST (way) WHERE name IS NOT NULL OR ref IS NOT NULL;
2023-08-22T21:13:20.074513: CREATE INDEX
2023-08-22T21:13:20.074620: CREATE INDEX planet_osm_line_river ON planet_osm_line USING GIST (way) WHERE waterway = 'river';
2023-08-22T21:14:41.430259: CREATE INDEX
2023-08-22T21:14:41.430431: CREATE INDEX planet_osm_line_waterway ON planet_osm_line USING GIST (way) WHERE waterway IN ('river', 'canal', 'stream', 'drain', 'ditch');
2023-08-22T21:16:22.528526: CREATE INDEX
2023-08-22T21:16:22.528618: CREATE INDEX planet_osm_point_place ON planet_osm_point USING GIST (way) WHERE place IS NOT NULL AND name IS NOT NULL;
2023-08-22T21:17:05.195416: CREATE INDEX
2023-08-22T21:17:05.195502: CREATE INDEX planet_osm_polygon_admin ON planet_osm_polygon USING GIST (ST_PointOnSurface(way)) WHERE name IS NOT NULL AND boundary = 'administrative' AND admin_level IN ('0', '1', '2', '3', '4');
2023-08-22T21:20:00.114673: CREATE INDEX
2023-08-22T21:20:00.114759: CREATE INDEX planet_osm_polygon_military ON planet_osm_polygon USING GIST (way) WHERE (landuse = 'military' OR military = 'danger_area') AND building IS NULL;
2023-08-22T21:22:53.872835: CREATE INDEX
2023-08-22T21:22:53.872917: CREATE INDEX planet_osm_polygon_name ON planet_osm_polygon USING GIST (ST_PointOnSurface(way)) WHERE name IS NOT NULL;
2023-08-22T21:26:36.166407: CREATE INDEX
2023-08-22T21:26:36.166498: CREATE INDEX planet_osm_polygon_name_z6 ON planet_osm_polygon USING GIST (ST_PointOnSurface(way)) WHERE name IS NOT NULL AND way_area > 5980000;
2023-08-22T21:30:00.829190: CREATE INDEX
2023-08-22T21:30:00.829320: CREATE INDEX planet_osm_polygon_nobuilding ON planet_osm_polygon USING GIST (way) WHERE building IS NULL;
2023-08-22T21:35:40.274071: CREATE INDEX
2023-08-22T21:35:40.274149: CREATE INDEX planet_osm_polygon_water ON planet_osm_polygon USING GIST (way) WHERE waterway IN ('dock', 'riverbank', 'canal') OR landuse IN ('reservoir', 'basin') OR "natural" IN ('water', 'glacier');
2023-08-22T21:38:54.905074: CREATE INDEX
2023-08-22T21:38:54.905162: CREATE INDEX planet_osm_polygon_way_area_z10 ON planet_osm_polygon USING GIST (way) WHERE way_area > 23300;
2023-08-22T21:43:20.125524: CREATE INDEX
2023-08-22T21:43:20.125602: CREATE INDEX planet_osm_polygon_way_area_z6 ON planet_osm_polygon USING GIST (way) WHERE way_area > 5980000;
2023-08-22T21:47:05.219135: CREATE INDEX
2023-08-22T21:47:05.219707: CREATE INDEX planet_osm_roads_admin ON planet_osm_roads USING GIST (way) WHERE boundary = 'administrative';
2023-08-22T21:47:27.862548: CREATE INDEX
2023-08-22T21:47:27.862655: CREATE INDEX planet_osm_roads_admin_low ON planet_osm_roads USING GIST (way) WHERE boundary = 'administrative' AND admin_level IN ('0', '1', '2', '3', '4');
2023-08-22T21:47:30.879559: CREATE INDEX
2023-08-22T21:47:30.879767: CREATE INDEX planet_osm_roads_roads_ref ON planet_osm_roads USING GIST (way) WHERE highway IS NOT NULL AND ref IS NOT NULL;
2023-08-22T21:47:41.250887: CREATE INDEX

real    38m41,802s
user    0m0,098s
sys     0m0,015s

The generated indexes and sizes:

              Name               |        Table       | Access method |     Size
---------------------------------+--------------------+---------------+---------------
 planet_osm_polygon_nobuilding   | planet_osm_polygon | gist          | 2_314_887_168
 planet_osm_line_label           | planet_osm_line    | gist          | 1_143_644_160
 planet_osm_polygon_way_area_z10 | planet_osm_polygon | gist          |   738_336_768
 planet_osm_line_waterway        | planet_osm_line    | gist          |   396_263_424
 planet_osm_polygon_name         | planet_osm_polygon | gist          |   259_416_064
 planet_osm_polygon_water        | planet_osm_polygon | gist          |   188_227_584
 planet_osm_roads_roads_ref      | planet_osm_roads   | gist          |   147_103_744
 planet_osm_point_place          | planet_osm_point   | gist          |   138_854_400
 planet_osm_roads_admin          | planet_osm_roads   | gist          |    47_947_776
 planet_osm_polygon_way_area_z6  | planet_osm_polygon | gist          |    24_559_616
 planet_osm_line_river           | planet_osm_line    | gist          |    17_408_000
 planet_osm_polygon_name_z6      | planet_osm_polygon | gist          |    13_336_576
 planet_osm_roads_admin_low      | planet_osm_roads   | gist          |     2_424_832
 planet_osm_polygon_military     | planet_osm_polygon | gist          |       925_696
 planet_osm_line_ferry           | planet_osm_line    | gist          |       425_984
 planet_osm_polygon_admin        | planet_osm_polygon | gist          |        32_768

The sizes are small enough to ignore.

The last step is to import external data[2]:

mdione@diablo:~/src/projects/osm/data/osm$ time ../../osm-carto/scripts/get-external-data.py --config ../../osm-carto/external-data.yml \
      --data . --database gis --port 5433 --username $USER --verbose 2>&1 | pefan.py -t '%Y-%m-%d %H:%M:%S'
2023-08-22 23:04:16: INFO:root:Checking table simplified_water_polygons
2023-08-22 23:04:19: INFO:root:  Importing into database
2023-08-22 23:04:20: INFO:root:  Import complete
2023-08-22 23:04:21: INFO:root:Checking table water_polygons
2023-08-22 23:06:16: INFO:root:  Importing into database
2023-08-22 23:07:12: INFO:root:  Import complete
2023-08-22 23:07:49: INFO:root:Checking table icesheet_polygons
2023-08-22 23:07:55: INFO:root:  Importing into database
2023-08-22 23:07:59: INFO:root:  Import complete
2023-08-22 23:08:01: INFO:root:Checking table icesheet_outlines
2023-08-22 23:08:06: INFO:root:  Importing into database
2023-08-22 23:08:09: INFO:root:  Import complete
2023-08-22 23:08:11: INFO:root:Checking table ne_110m_admin_0_boundary_lines_land
2023-08-22 23:08:12: INFO:root:  Importing into database
2023-08-22 23:08:13: INFO:root:  Import complete

real    3m57,162s
user    0m36,408s
sys     0m15,387s

Notice how this time I changed pefan's -t option to have more consistent timestamps. The generated tables and indexes are:

                Name                 | Persistence | Access method |      Size
-------------------------------------+-------------+---------------+---------------
 water_polygons                      | permanent   | heap          | 1_201_078_272
 icesheet_outlines                   | permanent   | heap          |    83_951_616
 icesheet_polygons                   | permanent   | heap          |    74_571_776
 simplified_water_polygons           | permanent   | heap          |    34_701_312
 ne_110m_admin_0_boundary_lines_land | permanent   | heap          |       139_264
 external_data                       | permanent   | heap          |        16_384

                    Name                     |                Table                | Access method |    Size
---------------------------------------------+-------------------------------------+---------------+-----------
 icesheet_outlines_way_idx                   | icesheet_outlines                   | gist          | 3_325_952
 icesheet_polygons_way_idx                   | icesheet_polygons                   | gist          |   679_936
 simplified_water_polygons_way_idx           | simplified_water_polygons           | gist          |   630_784
 water_polygons_way_idx                      | water_polygons                      | gist          |   630_784
 ne_110m_admin_0_boundary_lines_land_way_idx | ne_110m_admin_0_boundary_lines_land | gist          |    24_576
 external_data_pkey                          | external_data                       | btree         |    16_384

Again, too small to care.

Let's summarize what we have.

We started with a 28GiB Europe export, which generated 298GiB of data and indexes, with a peak of 422GiB of usage during the clustering phase. This 'data and indexes' includes an 83GiB flat nodes file, which I need to use because of the puny size of my rendering rig. This means a 10x explosion in data size, but also a 15x peak usage (ballpark figures). As the peak happens in a part of the process that can't be reordered, any other reordering in data import or index generation would provide no advantage. Maybe only if you used --number-processes 1; and given how the clustering of tables is assigned to workers, I think it's already in the right order, except maybe swapping the last two, but they're also the smaller two; polygon was already the peak.

[1] psql uses pg_catalog.pg_size_pretty() to print tables and indexes sizes, and it's hard coded on the query that it runs. Thanks to the people at #postgresql:libera.chat I found out that most of the slash commands (for instance, both \d+ and \di+ I used above) are implemented with SQL queries. You can find out which queries by using \set ECHO_HIDDEN 1 and running them again. I sorted them by decreasing size, and I used Python's long int prettifying method :) I also removed uninteresting columns.

[2] I removed lines from this output that I think don't add new interesting info, and I split it in sections so it's easier to read and marks important milestones for the graphs later.

[3] I usually use a highlighter for these snippets, but alas, it's broken right now. I've been kicking down the road a migration out of ikiwiki to probably nikola, maybe it's time.

openstreetmap osm2pgsql grafana python requests curl pefan ikiwiki

Categories: FLOSS Project Planets

The Drop Times: Revisiting the Drupal Modules of Yesteryears

Planet Drupal - Wed, 2023-08-23 06:30
Revisit the past with a 16-year-old Drupal forum discussion. As we rediscover these hidden gems from the past, it's a reminder that the digital landscape is ever-changing. What was innovative in 2007 may not be cutting-edge in 2023, but the principles of adaptability and innovation remain constant.
Categories: FLOSS Project Planets

Eli Bendersky: My favorite prime number generator

Planet Python - Wed, 2023-08-23 06:01

Many years ago I re-posted a Stack Overflow answer with Python code for a terse prime sieve function that generates a potentially infinite sequence of prime numbers ("potentially" because it will run out of memory eventually). Since then, I've used this code many times - mostly because it's short and clear. In this post I will explain how this code works, where it comes from (I didn't come up with it), and some potential optimizations. If you want a teaser, here it is:

def gen_primes():
    """Generate an infinite sequence of prime numbers."""
    D = {}
    q = 2
    while True:
        if q not in D:
            D[q * q] = [q]
            yield q
        else:
            for p in D[q]:
                D.setdefault(p + q, []).append(p)
            del D[q]
        q += 1

The sieve of Eratosthenes

To understand what this code does, we should first start with the basic Sieve of Eratosthenes; if you're familiar with it, feel free to skip this section.

The Sieve of Eratosthenes is a well-known algorithm from ancient Greek times for finding all the primes below a certain number reasonably efficiently using a tabular representation. There's an animation in the Wikipedia article that explains it pretty well; in short:

Starting with the first prime (2) it marks all its multiples until the requested limit. It then takes the next unmarked number, assumes it's a prime (because it is not a multiple of a smaller prime), and marks its multiples, and so on until all the multiples below the limit are marked. The remaining unmarked numbers are primes.

Here's a well-commented, basic Python implementation:

import math

def gen_primes_upto(n):
    """Generates a sequence of primes < n.

    Uses the full sieve of Eratosthenes with O(n) memory.
    """
    if n == 2:
        return

    # Initialize table; True means "prime", initially assuming all numbers
    # are prime.
    table = [True] * n
    sqrtn = int(math.ceil(math.sqrt(n)))

    # Starting with 2, for each True (prime) number I in the table, mark all
    # its multiples as composite (starting with I*I, since earlier multiples
    # should have already been marked as multiples of smaller primes).
    # At the end of this process, the remaining True items in the table are
    # primes, and the False items are composites.
    for i in range(2, sqrtn):
        if table[i]:
            for j in range(i * i, n, i):
                table[j] = False

    # Yield all the primes in the table.
    yield 2
    for i in range(3, n, 2):
        if table[i]:
            yield i

When we want a list of all the primes below some known limit, gen_primes_upto is great, and performs fairly well. There are two issues with it, though:

  1. We have to know what the limit is ahead of time; this isn't always possible or convenient.
  2. Its memory usage is high - O(n); this can be significantly optimized, however; see the bonus section at the end of the post for details.
The infinite prime generator

Back to the infinite prime generator that's in the focus of this post. Here is its code again, now with some comments:

def gen_primes():
    """Generate an infinite sequence of prime numbers."""
    # Maps composites to primes witnessing their compositeness.
    D = {}

    # The running integer that's checked for primeness.
    q = 2

    while True:
        if q not in D:
            # q is a new prime.
            # Yield it and mark its first multiple that isn't
            # already marked in previous iterations.
            D[q * q] = [q]
            yield q
        else:
            # q is composite. D[q] holds some of the primes that
            # divide it. Since we've reached q, we no longer
            # need it in the map, but we'll mark the next
            # multiples of its witnesses to prepare for larger
            # numbers.
            for p in D[q]:
                D.setdefault(p + q, []).append(p)
            del D[q]

        q += 1

The key to the algorithm is the map D. It holds all the primes encountered so far, but not as keys! Rather, they are stored as values, with the keys being the next composite number they divide. This lets the program avoid having to divide each number it encounters by all the primes known so far - it can simply look in the map. A number that's not in the map is a new prime, and the way the map updates is not unlike the sieve of Eratosthenes - when a composite is removed, we add the next composite multiple of the same prime(s). This is guaranteed to cover all the composite numbers, while prime numbers should never be keys in D.

I highly recommend instrumenting this function with some printouts and running through a sample invocation - it makes it easy to understand how the algorithm makes progress.
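If you just want to peek at its output, here's a minimal sketch using itertools.islice to take a finite prefix of the infinite generator:

import itertools

# Take the first 10 primes from the infinite generator.
print(list(itertools.islice(gen_primes(), 10)))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]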

Compared to the full sieve gen_primes_upto, this function doesn't require us to know the limit ahead of time - it will keep producing prime numbers ad infinitum (but will run out of memory eventually). As for memory usage, the D map has all the primes in it somewhere, but each one appears only once. So its size is O(π(n)), where π(n) is the prime-counting function, the number of primes smaller than or equal to n. This can be approximated by O(n/ln(n)) [1].
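As a rough illustration of that approximation, here's a sketch (assuming gen_primes_upto from above is in scope; the limit is an arbitrary choice):

import math

n = 100_000
num_primes = sum(1 for _ in gen_primes_upto(n))
# The actual prime count vs. the n/ln(n) approximation.
print(num_primes, round(n / math.log(n)))  # 9592 vs. 8686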

I don't remember where I first saw this approach mentioned, but all the breadcrumbs lead to this ActiveState Recipe by David Eppstein from way back in 2002.

Optimizing the generator

I really like gen_primes; it's short, easy to understand and gives me as many primes as I need without forcing me to know what limit to use, and its memory usage is much more reasonable than the full-blown sieve of Eratosthenes. It is, however, also quite slow, over 5x slower than gen_primes_upto.

The aforementioned ActiveState Recipe thread has several optimization ideas; here's a version that incorporates ideas from Alex Martelli, Tim Hochberg and Wolfgang Beneicke:

import itertools

def gen_primes_opt():
    yield 2
    D = {}
    for q in itertools.count(3, step=2):
        p = D.pop(q, None)
        if not p:
            D[q * q] = q
            yield q
        else:
            x = q + p + p  # get odd multiples
            while x in D:
                x += p + p
            D[x] = p

The optimizations are:

  1. Instead of holding a list as the value of D, just have a single number. In cases where we need more than one witness to a composite, find the next multiple of the witness and assign that instead (this is the while x in D inner loop in the else clause). This is a bit like using linear probing in a hash table instead of having a list per bucket.
  2. Skip even numbers by starting with 2 and then proceeding from 3 in steps of 2.
  3. The loop assigning the next multiple of witnesses may land on even numbers (when q and p are both odd). So instead jump to q + p + p directly, which is guaranteed to be odd.

With these in place, the function is more than 3x faster than before, and is now only within 40% or so of gen_primes_upto, while remaining short and reasonably clear.
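To reproduce a comparison like this yourself, here's a minimal benchmark sketch (timings will vary by machine and Python version; the count is an arbitrary choice):

import itertools
import time

def time_first_n(gen_func, count=100_000):
    # Consume the first `count` primes and measure elapsed wall time.
    start = time.perf_counter()
    for _ in itertools.islice(gen_func(), count):
        pass
    return time.perf_counter() - start

print(f"gen_primes:     {time_first_n(gen_primes):.2f}s")
print(f"gen_primes_opt: {time_first_n(gen_primes_opt):.2f}s")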

There are even fancier algorithms that use interesting mathematical tricks to do less work. Here's an approach by Will Ness and Tim Peters (yes, that Tim Peters) that's reportedly faster. It uses the wheels idea from this paper by Sorenson. Some additional details on this approach are available here. This algorithm is both faster and consumes less memory; on the other hand, it's no longer short and simple.

To be honest, it always feels a bit odd to me to painfully optimize Python code, when switching languages provides vastly bigger benefits. For example, I threw together the same algorithms using Go and its experimental iterator support; it's 3x faster than the Python version, with very little effort (even though the new Go iterators and yield functions are still in the proposal stage and aren't optimized). I can't try to rewrite it in C++ or Rust for now, due to the lack of generator support; the yield statement is what makes this code so nice and elegant, and alternative idioms are much less convenient.

Bonus: segmented sieve of Eratosthenes

The Wikipedia article on the sieve of Eratosthenes mentions a segmented approach, which is also described in the Sorenson paper in section 5.

The main insight is that we only need the primes up to √n to be able to sieve a table all the way to n. This results in a sieve that uses only O(√n) memory. Here's a commented Python implementation:

def gen_primes_upto_segmented(n):
    """Generates a sequence of primes < n.

    Uses the segmented sieve of Eratosthenes algorithm with O(√n) memory.
    """
    # Simplify boundary cases by hard-coding some small primes.
    if n < 11:
        for p in [2, 3, 5, 7]:
            if p < n:
                yield p
        return

    # We break the range [0..n) into segments of size √n.
    segsize = int(math.ceil(math.sqrt(n)))

    # Find the primes in the first segment by calling the basic sieve on that
    # segment (its memory usage will be O(√n)). We'll use these primes to
    # sieve all subsequent segments.
    baseprimes = list(gen_primes_upto(segsize))
    for bp in baseprimes:
        yield bp

    for segstart in range(segsize, n, segsize):
        # Create a new table of size √n for each segment; the old table
        # is thrown away, so the total memory use here is √n.
        # seg[i] represents the number segstart+i.
        seg = [True] * segsize

        for bp in baseprimes:
            # The first multiple of bp in this segment can be calculated using
            # modulo.
            first_multiple = (
                segstart if segstart % bp == 0 else segstart + bp - segstart % bp
            )
            # Mark all multiples of bp in the segment as composite.
            for q in range(first_multiple, segstart + segsize, bp):
                seg[q % len(seg)] = False

        # Sieving is done; yield all the primes in the segment (iterating only
        # over the odd ones).
        start = 1 if segstart % 2 == 0 else 0
        for i in range(start, len(seg), 2):
            if seg[i]:
                if segstart + i >= n:
                    break
                yield segstart + i

Code

The full code for this post - along with tests and benchmarks - is available on GitHub.
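For a quick local sanity check, a sketch like the following (assuming both sieves defined above are in scope) confirms the segmented and full versions agree:

# The segmented sieve should produce exactly the same primes as the
# full sieve, for any limit.
for n in [2, 11, 100, 1_000, 10_000]:
    assert list(gen_primes_upto_segmented(n)) == list(gen_primes_upto(n))
print("all checks passed")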

[1] While this is a strong improvement over O(n) (e.g. for a billion primes, memory usage here is only 5% of the full sieve version), it still depends on the size of the input. In the unlikely event that you need to generate truly gigantic primes starting from 2, even the square-root-space solutions become infeasible. In this case, the whole approach should be changed; instead, one would just generate random huge numbers and use probabilistic primality testing to check their primality. This is what real libraries like Go's crypto/rand.Prime do.
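To make that concrete, here's a rough sketch of the "random candidates + probabilistic test" approach (a hand-rolled Miller-Rabin test with random bases; the bit size and round count are arbitrary choices, and real code should use a vetted library instead):

import random

def is_probable_prime(n, rounds=40):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # found a witness of compositeness
    return True

def random_prime(bits=1024):
    """Generate a random probable prime with the given bit length."""
    while True:
        # Set the top bit (so the number really has `bits` bits) and the
        # low bit (so it's odd).
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate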
Categories: FLOSS Project Planets

LN Webworks: Drupal 10 is LIVE: Upgrade to The Latest Version with LN Webworks Today!

Planet Drupal - Wed, 2023-08-23 02:54

The digital world is a swift river, and staying current is of utmost importance. The advent of Drupal 10 marks a substantial leap towards improved performance, security, and user experience. As a pioneer in 360 Drupal services, LN Webworks is proud to offer seamless Drupal upgrade services.

 Let’s navigate the intricacies of Drupal 10's core attributes, the advantages of embracing this upgrade, and how LN Webworks stands as your steadfast partner for a successful migration.

Categories: FLOSS Project Planets

The Drop Times: Exploring Twin Cities DrupalCamp 2023: A Hub of Drupal Innovation and Collaboration

Planet Drupal - Wed, 2023-08-23 01:32
Discover the Twin Cities DrupalCamp 2023, a two-day event showcasing an array of engaging sessions and peer discussions. This article delves into the highlights of the camp, its featured speakers, and the diverse spectrum of sponsors that contribute to its success.
Categories: FLOSS Project Planets

Salsa Digital: The European Union’s Cyber Resilience Act and how it affects open source

Planet Drupal - Wed, 2023-08-23 01:04
What is the Cyber Resilience Act?

The Cyber Resilience Act is proposed legislation that will make it mandatory for hardware and software producers to:

  1. Ensure that their products meet minimum security standards before they're released
  2. Keep their products' security up-to-date after release (e.g. monitoring security threats and releasing security updates)

The Act also aims to make it easy for users to compare the security features of different products. The end goal: to make digital products more secure, to protect citizens and businesses.

View the draft Act in multiple languages

What effects will it have on software development?

A factsheet about the Act outlines the manufacturer's obligations.
Categories: FLOSS Project Planets

My work in KDE for August 2023

Planet KDE - Tue, 2023-08-22 20:00

I’m posting this a little bit earlier this month, because I’ll be busy until next week and won’t have a chance to finish anything. I have a lot of new features to show, and important documentation work to showcase!

Plasma #

More KCMs have their footer actions moved to the header, such as Application Style:

So long, footer actions!

Kirigami #

The Kirigami Add-ons AboutPage should be nicer to use for non-KDE projects, and stops them from opening up the QML file itself in your default text editor when you click on certain links.

Components in AboutPage now have support for viewing their licenses and webpage!

Now you can read GPL to your heart’s content!

Kirigami’s FlexColumn now respects the spacing parameter, allowing applications that use it to use Kirigami units.

MobileForm/FormCard’s SpinBox and TextField delegates now use the disabled color in their labels when needed.

MobileForm/FormCard’s separator now hides when navigating it via keyboard.

Kirigami Add-ons is now REUSE compliant.

Tokodon #

I added support for focal points in media attachments! This means that media won’t be centered carelessly, but will be cropped to the author’s wishes. This feature will appear in 23.12.

This artwork is now focused as it should be!

You can set the focal points in Tokodon as well! I really needed this feature because I post art on Mastodon, and had to use Mastodon Web because that was the only place I could set focal points.

I added a new floating button on the timeline to scroll back to the beginning. This feature will appear in 23.12.

The new scroll back button.

You can now share to the Fediverse via Tokodon! Right now it only supports URLs, but this can be expanded in the future. This feature will appear in 23.12.

When it comes to media in posts, it should be much easier to add them. Now you can drop or paste them into the composer! This feature will appear in 23.12.

Account and hashtag links in profile descriptions should now open inside of Tokodon. This feature will appear in 23.12.

Do you use Pleroma or Akkoma? If you found the experience a bit lacking in Tokodon, it should now be a whole lot better. The language selector should no longer disappear, and the correct maximum character count is fetched. The local visibility option and custom poll limits are supported now too! Everything but the local visibility option will appear in 23.08.

There’s now an option to disable Tokodon asking for the admin scope when logging in. This is useful for admins who don’t feel safe with giving Tokodon permissions, or if your server (like Pixelfed) does not correctly handle it when logging in. This will appear in 23.12.

The new moderation toggle, visible on the login page.

For the cherry on top, the post margins should now look much, much better. This will show up in 23.12.

Bug Squashing #

Sometimes I wonder if it’s worth describing each bug I fix, but I think it’s important to make it known that we’re fixing our old features too. Most of these apply to 23.08 as well:

Work In Progress #

Here’s a list of features I’m working on, but are not quite finished yet. The first is Cross-account actions! This means you can interact with a post from another account, without the hassle of searching for it again and losing your place on the timeline.

Ever wanted to still receive notifications, even though Tokodon is closed? Say hello to Push Notifications (powered by KUnifiedPush)! There are also going to be more granular notification settings, and the ability to turn off all notifications for an account.

It’s not really interesting to see a notification, so say hello to Tokodon in the Push Notifications KCM!

You can check out the work-in-progress MR here but it’s still a little finicky to set up.

NeoChat #

When you paste an image that contains both a URL and a bitmap, NeoChat will no longer put the URL in the textbox as well.

The space drawer context menu should behave and look a little bit nicer now, with icons!

The space context menu.

The avatar used in notifications should be more resilient, including transparent and weirdly shaped avatars.

The attachment chooser dialog should now be functional again, the one that asks if you want to “Paste from Clipboard” or “Choose Local File”.

The room and user completion menu should no longer show invalid entries, such as rooms that have no canonical aliases (so you couldn’t link them anyway!)

The “Explore Rooms” page now looks nicer, and also functions better. It should be clearer when it’s still loading or when there are actually no rooms.

The nicer looking Explore Rooms page.

Image attachments are now resized, improving memory usage and smoothing. In my testing, a 4K image (that didn’t have a thumbnail) now takes up ~30 MB less memory.

More about images: NeoChat will soon have support for sending blurhashes, which improves the loading experience for clients that support it. Blurhash generation should be faster as well.

And of course, logging out shouldn’t crash NeoChat anymore.

Dr. Konqi #

I made the save report dialog a little clearer, and it now defaults to your home folder.

When on a mobile device, the buttons on the main page are now laid out vertically.

Dr. Konqi when it’s vertically constrained.

Discover #

Fixed a bug in the Flatpak backend where Discover could crash… when trying to log an error.

Documentation #

There’s an obvious documentation problem in many KDE repositories: they have no README! This month, I took some time to write out a basic template (with the help of Thiago, thanks!) and pushed it to some projects that desperately needed one.

If you haven’t heard me soapbox about this before, READMEs are a small but vital lifeline for a project. To some people (including myself), if a project doesn’t have a detailed README it feels more “dead” even if the commit history is active. Some repositories are closely related (e.g. Breeze and Breeze Icons) but have no visible link between them on Invent - and let’s be honest, its search functionality sucks. To make matters more complex, some of our projects may be viewed elsewhere, like from the official GitHub mirror!

Here are some important points I try to include in a README:

  1. What is the purpose of this project?
  2. If this is an end-user-facing application, what does it look like? (Note: not everyone knows what a “Gwenview” is, but they might be able to recognize it from a product screenshot.)
  3. If this is a complex, multi-layered, multi-component library - what is the folder called “autotests” or “kcms”? What does it contain? For some projects, this information should be separated into a CONTRIBUTING document if it’s too long.
  4. If I want to contribute, how do I do it? And link to community.kde.org if you can, instead of trying to maintain manual instructions. There can be exceptions.

For new developers, it’s important to strike down these blockers because it may not be totally obvious to some people what a “Plasma Framework” is. Or that you’re not actually supposed to open a GitHub PR or a GitLab issue. It’s a document you really only need to write once, update sometimes and it passively helps everyone.

And here are the KDE projects I’ve gotten to so far:

I encourage anyone reading this to look at the README for the projects you maintain (or even contribute to), and see if there’s room for any improvement!

Somewhat related, some repositories were missing Bugzilla links on Invent and someone told me how to fix them:

Thanks to everyone who helped review and push these along!

Outside of KDE #

For Akkoma users, I contributed limited support for the Mastodon Preferences API! This is useful for Tokodon and other clients.

I fixed a qmlformat bug that added an extra space after target bindings. I hinted at what else we need to work on before qmlformat can be adopted in KDE here.

Categories: FLOSS Project Planets

PyCoder’s Weekly: Issue #591 (Aug. 22, 2023)

Planet Python - Tue, 2023-08-22 15:30

#591 – AUGUST 22, 2023
View in Browser »

Python Polars: A Lightning-Fast DataFrame Library

Welcome to the world of Polars, a powerful DataFrame library for Python! In this showcase tutorial, you’ll get a hands-on introduction to Polars’ core features and see why this library is catching so much buzz.
REAL PYTHON

Introducing Immortal Objects for Python

This article explains immortal objects (PEP 683) which are excluded from garbage collection. This causes performance and shared memory improvements for large architectures.
ENGINEERING AT META

Support Your Next Python Project With a .XYZ Domain

The .xyz domain extension was built for tech-forward applications. Python developers can showcase their skills or support their latest project using a .xyz domain name. New .xyz domains are on sale for just about $2 at Porkbun now. Get your .xyz domain →
PORKBUN sponsor

End-to-End Testing With Python and Playwright

This post shows you how to get started with Playwright, add end-to-end tests to an existing project, and automate running it using GitHub Actions.
MARKOS GOGOULOS • Shared by Laura Stephens

Introducing Python in Excel

Microsoft has announced that they’re embedding Python into Excel through a partnership with Anaconda. Read on for details.
STEFAN KINNESTRAND

Articles & Tutorials Using the NumPy Random Number Generator

In this tutorial, you’ll take a look at the powerful random number capabilities of the NumPy random number generator. You’ll learn how to work with both individual numbers and NumPy arrays, as well as how to sample from a statistical distribution.
REAL PYTHON

Not-So-Casual Performance Optimization in Python

Nathaniel did a small project where he implemented the sum of inverse squares in multiple programming languages. The Python version was rather slow. This article talks about alternate ways of writing the Python code for better performance.
NATHANIEL THOMAS

Companies like GitLab, Snowflake, and Slack Scan Their Code for Vulnerabilities Using Semgrep

Scan your code and dependencies for security vulnerabilities for free with Semgrep - the trusted OSS tool used by top companies like GitLab, Snowflake, and Slack. No security expertise needed, simply add your project and let Semgrep do the work in just minutes →
SEMGREP sponsor

Solving a Simple Puzzle Using SymPy

This short blog post shows you how to formulate a series of equations and solve them using SymPy for a small geometric brain teaser. There is also an associated Hacker News Discussion.
STEFAN PETREA

Process Images Using the Pillow Library and Python

In this video course, you’ll learn how to use the Python Pillow library to deal with images and perform image processing. You’ll also explore using NumPy for further processing, including to create animations.
REAL PYTHON course

Python: Just Write SQL

This article shows you how to use SQL directly from Python, serializing to a dataclass instead of using an ORM. It has an associated Hacker News Discussion.
JOÃO FERREIRA

Create Your Own Diff-Tool Using Python

This article teaches you how to create your own diff-tool using pure Python. In addition to covering how to diff content, it also incorporates argparse to manage the command line options.
FLORIAN DAHLITZ

What Learning APL Taught Me About Python

Sometimes learning a new language provides perspective in the ones you already know. Rodrigo picked up APL, and this article discusses what that taught him about Python.
RODRIGO GIRÃO SERRÃO

GitHub Now Scans Public Issues for PyPI Secrets

This PyPI blog post talks about the integration between them and GitHub to help ensure accidental exposure of PyPI secrets is quickly dealt with.
THE PYTHON PACKAGE INDEX

Reference Counting Internals of Python

Explore CPython’s memory management through a deep dive into reference counting. Learn how it functions, its implementation, and its limitations.
ABHINAV UPADYAY

Learn How to Deploy Scientific AI Models to Edge Environments, Using OpenVINO Model Server

🤔 Can cell therapy and AI be used together? Learn how to efficiently build and deploy scientific AI models using open-source technologies with Beckman Coulter Life Sciences at our upcoming DevCon OpenVINO webinar. #DevCon2023
INTEL CORPORATION sponsor

Avoiding Silent Failures: Best Practices for Error Handling

The Zen of Python famously states, “Errors should never pass silently.” But just what does that mean, and what should you do instead?
BOB BELDERBOS

Projects & Code functime: Time-Series ML and Embeddings at Scale

GITHUB.COM/DESCENDANT-AI

learndb-py: Learn Database Internals by Implementing One

GITHUB.COM/SPANDANB

outlines: Generative Model Programming

GITHUB.COM/NORMAL-COMPUTING

trafilatura: Python & Command-Line Tool to Get Web Text

GITHUB.COM/ADBAR

PythonMonkey: JavaScript Engine Embedded in Python

GITHUB.COM/DISTRIBUTIVE-NETWORK

Events Weekly Real Python Office Hours Q&A (Virtual)

August 23, 2023
REALPYTHON.COM

PyCon Latam 2023

August 24 to August 27, 2023
PYLATAM.ORG

PyDelhi User Group Meetup

August 26, 2023
MEETUP.COM

PythOnRio Meetup

August 26, 2023
PYTHON.ORG.BR

Singula Python Meetup Online

August 29, 2023
EVENTBRITE.COM

PyConTW 2023

September 2 to September 4, 2023
PYCON.ORG

Happy Pythoning!
This was PyCoder’s Weekly Issue #591.
View in Browser »

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

Categories: FLOSS Project Planets

COSCUP Unveiled

Open Source Initiative - Tue, 2023-08-22 14:20

(Thanks to Paloma Oliveira for this contribution!)

Reflecting on how to improve our open communities

Navigating uncharted waters often leads to intriguing discoveries. Imagine immersing yourself in a realm that commemorates a quarter-century of Open Source accomplishment. Invited by Open Source Initiative (OSI) to reflect upon the 25 years of Open Source at COSCUP, a conference in Taiwan that focuses on coders, users and promoters of Open Source, I threw myself into these waters by proposing a review of history that is not unique around the globe, taking my perspective from South America and Europe to Asia, where I had never before ventured. 

You can read a full transcript of my talk here and check my critical take on the topic. After all, to review is to be able to identify where we failed and to be able to proceed from there.

More than offering something, I return with baggage full of new perspectives that made me renew my vision about the importance of Open Source in global and local contexts. COSCUP is a distinguished conference, drawing together Open Source enthusiasts mostly from around Asia: Japan, Malaysia, Indonesia, Singapore, Hong Kong and many others were heavily present. In this piece, we’ll embark on a thoughtful exploration of COSCUP’s defining characteristics, offering a nuanced perspective that distinguishes it in the bustling landscape of technology events.

So, what makes COSCUP a great conference?

From and to the communities

Spread across two days, the conference adopts a unique structure, with sessions categorized under different tracks, each managed by dedicated communities. This approach empowers participants to navigate subjects aligned with their interests, fostering connections with kindred spirits. The emphasis on community-led curation breathes fresh air into the conventional conference model. You can find the topics/communities here: https://coscup.org/2023/en/topics.

Image from the author from the hallway signage informing of rooms and content curated by communities 

A melting pot of global and local voices

Navigating through COSCUP’s conference offerings went beyond language preferences in English or Chinese. In reality, it was a journey through a tapestry of diverse voices, akin to a symphony of polyphonies. This allowed for an intriguing blend of both global and local perspectives. 

While English has emerged as the dominant language in the technology landscape, serving as a common thread for communication, relying solely on English excludes those without fluency. This limitation bears various consequences; fluently expressing and understanding nuances in a language beyond one’s mother tongue is a privilege. Creating spaces for regional languages broadens participation and welcomes those who are still learning or aiming to navigate the intricate world of Open Source. This inclusion empowers individuals to express their thoughts across a broad spectrum, fostering the exploration of local solutions.

An illustration of the need for such inclusivity can be found in conversations with individuals like Naru from the SODA Foundation who asked us to consider the challenge of non-alphabetic writing systems. Naru highlighted the case of LibreOffice, which has a scarcity of developers fluent in logographic languages. This linguistic gap causes code disruptions, as changes from Europe and America often disregard alternative writing systems. How can this issue be tackled without understanding the unique requirements of such languages? This showcases the necessity for more developers who are versed in these languages to contribute actively and have a say in decisions. Hence, it becomes evident that influential conferences like COSCUP should maintain programs that encompass a broad spectrum, catering to both global connections and local preservation of diverse cultures.

In the conference schedule you can find communities from Hong Kong, a special Japan track and several talks about local dialect preservation, such as the talk: “How can we utilize Wikidata to protect Puyuma, an endangered language?”

Shining a spotlight on open design

Organized by Women Techmakers Taiwan and curated by catcatcatcat, this track directed attention to the intersection of development and design, a facet that often remains overlooked in the Open Source landscape. 

Unlike traditional tech conferences, where technical aspects often take precedence, the curated workshops and talks placed the spotlight on design’s pivotal role in enhancing usability. This spotlight reflects a broader understanding that technology should seamlessly align with users’ needs. The renewed focus on open design casts light on a pivotal aspect that influences the adoption and longevity of Open Source solutions.

While I’ve observed a growing trend of incorporating this topic into conferences like FOSS-Backstage and AllThingsOpen, it often remains on the periphery. However, at COSCUP, the dedicated room hosted a series of workshops and talks that delved beyond the technology driving creations. The emphasis extended to the synergy between developers and designers, with a paramount focus on the intrinsic purpose of technology – to serve users.

Historically, Open Source has leaned heavily towards lauding the technical aspects of its creations, an inclination that spawns a cascade of challenges. From an inclusion standpoint, this often hampers opportunities for contributions from diverse perspectives, particularly when these technologies directly influence various demographics.

Image taken from Eriol Fox & Abhishek Sharma’s workshop “Becoming better contributors to social impact”

From a sustainability perspective, technologies devoid of usability contribute to the generation of excessive waste. Although digital, the hidden mound of discarded components remains invisible. If we could visualize it, the space consumed by discarded hardware, the energy expended by servers, electrical consumption, data usage, and more would likely span vast expanses. Surprisingly, cloud storage – in existence for over a decade – has become more polluting than the aviation industry. Amidst the digital revolution’s accelerated production of software and the cost-effective proliferation of hardware and software, minimal thought has been spared for the unsustainability of this excessive production. Moreover, the repercussions of this surplus on the physical world remain woefully unaddressed.

From both a software and product perspective, technology devoid of usability and tangible user value fails to find traction within communities or markets. The pursuit of acceleration often overlooks a pivotal question: Why and for whom are we creating this technology? While development timelines might differ from research periods, harmony between these phases ensures the birth of superior and more sustainable creations.

In essence, the COSCUP conference didn’t just highlight open design’s significance, it underscored the imperative need to integrate user-centric perspectives into Open Source innovation. This paradigm shift extends beyond code, advocating for a holistic approach that recognizes the interplay of technology, design and its real-world implications.

Prioritizing well-being: nurturing mental health and balance

For a while now, both Europe and America have been awash with articles and talks addressing mental health issues, burnout and the impostor syndrome. A growing chorus stresses the urgency of spotlighting these challenges, emphasizing individual care and self-preservation. 

Conferences can often become grueling endeavors. The short timeframes that cram copious amounts of information, combined with the jet lag and the effort of navigating languages that aren’t always native, transform conference participation into a marathon. While undeniably exciting, it’s essential to recognize that conferences also constitute a form of work, especially in the context of Open Source, which largely resides within the professional sphere.

Conferences seldom provide havens for respite, such as quiet rooms, though other great conferences like PyCon PyData Berlin and KubeCon do offer the space. This initiative marked a commendable effort towards acknowledging attendees’ well-being. However, COSCUP took this commitment a step further. By constraining conference hours to 8:50 AM through 5:00 PM, the organizers ensured that attendees’ time, mirroring regular working hours, remained within manageable limits. This pragmatic approach mitigated the risk of exhaustion, a common side effect of conferences.

In addition, conversations with Zoei, who boasts a background in psychology and is a key contributor to the well-being initiatives at COSCUP, provided valuable insights. She emphasized the transition from rhetoric to action. This commitment was tangibly manifested in the Healing Market, offering a range of amenities – from massage room to meditation sessions and even wine yoga – all designed to offer attendees much-needed moments of solace during the conference days.

Image from the author from the hallway signage informing about Healing Market offerings: yoga, meditation, board games and a parents’ workshop

Notably, COSCUP extended its support to attendees who are parents, a demographic often left underserved in such environments. By dedicating specialized rooms, sessions and workshops to parents and children, COSCUP fostered an environment where developers and enthusiasts with children could participate fully without compromising on their family responsibilities.

Image from the author showing door signage for the parent-child workshop

Image from the author attending Wine Yoga session

In conclusion, COSCUP’s stance on well-being transcended the theoretical to embrace the practical, acknowledging the multifaceted nature of conference participation. The meticulous considerations for attendees’ mental and physical well-being reflect the conference’s commitment to holistic care, setting an example for other events to prioritize the welfare of their participants.

Beyond the conference halls: embracing cultural diversity

COSCUP invited participants to explore the rich tapestry of its host city beyond the conference walls. As a first-time traveler to Asia, I embarked on this journey with a mix of anticipation and trepidation. The value of in-person conferences became evident as I immersed myself in different cultures. Tangible experiences – from unfamiliar scents to novel flavors – offer a depth of engagement that virtual interactions can’t replicate. COSCUP’s encouragement to step beyond the digital realm aligns perfectly with the yearning for immersive experiences. The International Exchanges Cross Community Gathering, the Taipei City Tour, and several other community-led gatherings offered opportunities for meetings outside the conference walls, allowing participants to strengthen their interpersonal relationships.

Image from the author with other attendants from all around Asia during the international gathering night

Image from the author with COSCUP organizers at the end of the Taipei City Tour, which included a walking tour through the Old Town and a Chinese medicine experience in the Daily Health store

Image from the author with other COSCUP participants making medicinal tea at the Daily Health Chinese Medicine shop experience

Why attend conferences?

While digital interactions possess the potential for depth, the allure of in-person conferences holds a distinct magic. This allure magnifies when we immerse ourselves in diverse cultures. Even when we share common themes, the prism of reception and cultural context transforms how we comprehend and receive information. Sensory dimensions such as scents, tastes, textures and even ambient temperature intricately shape our attention and interpretation. The symphony of these sensations underpins why we travel; it’s an experience beyond the distraction-prone realm of simultaneous online engagement.

I seized the chance to integrate myself into the country’s fabric and cross the east coast of the island by bike. I gathered some useful information about it which you can read here.

The essence of conferences truly thrives in the hushed conversations, spontaneous exchanges, and the symphony of interaction beyond the spotlight. Sensory immersion plays a pivotal role—varied sights, sounds, scents and tastes provide a holistic understanding of the conference’s backdrop and its cultural nuances. These elements, often absent in virtual participation, infuse layers of depth into the learning process. The impact of international conference travel transcends the confines of the conference hall, offering a multifaceted experience that enriches both professional and personal growth. It serves as a catalyst for forging meaningful connections, fostering a broader comprehension of global perspectives, and embracing the transformative potency of diverse cultural viewpoints.

Conclusion

Beyond the conference sessions, COSCUP’s true essence lies in the connections forged, dialogues exchanged, and camaraderie nurtured within its corridors. It’s a collective journey that fuels personal evolution and transformation. The intricate tapestry of community engagement, well-being initiatives, and cultural immersion makes COSCUP an event that leaves an indelible mark.

As we contemplate the multifaceted nature of COSCUP, let’s acknowledge its distinctive blend of global perspectives, user-centric design and well-being advocacy. COSCUP transcends being just a tech event; it’s a platform that fosters connections, celebrates diversity, and sparks meaningful conversations that cross geographical boundaries. This is the true spirit of COSCUP – a narrative woven with threads of innovation, inclusivity and cross-cultural understanding.

The post COSCUP Unveiled appeared first on Voices of Open Source.

Categories: FLOSS Research

Freelock Blog: Rate Limiting an aggressive bot in Nginx

Planet Drupal - Tue, 2023-08-22 13:58

High load isn't necessarily an emergency, but it may be a heads-up before a site noticeably slows down. Sometimes there are weird spikes that just go away, but sometimes this is an indication of a Denial of Service.

Rate Limiting

Nginx has rate limiting that can be used to handle cases where a slow URL is targeted. Today one of our sites had high load alerts. Here's how I handled it:
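As a general illustration of the mechanism (a hedged sketch only; the zone name, rate, and location path below are placeholders, not the actual values from this incident):

# In the http {} block: track clients by IP and allow roughly one
# request per second, using a 10 MB shared-memory zone.
limit_req_zone $binary_remote_addr zone=slowpages:10m rate=1r/s;

server {
    # Apply the limit on the slow URL being hammered.
    location /search {
        # Allow short bursts of up to 5 requests; reject the rest.
        limit_req zone=slowpages burst=5 nodelay;
        limit_req_status 429;
        # ... proxy_pass / fastcgi_pass to the application goes here ...
    }
}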

Categories: FLOSS Project Planets

KDE: A Day in the Life of the KDE Snapcrafter Part 2

Planet KDE - Tue, 2023-08-22 13:55
KDE Mascot

Much to my dismay, I figured out that my blog has been disabled on the Ubuntu planet since May. If you are curious about what I have been up to, please go to the handy links -> and read up! This post is a continuation of last week's: https://www.scarlettgatelymoore.dev/kde-a-day-in-the-life-of-the-kde-snapcrafter/

IMPORTANT: I am still looking for a super awesome team lead for a super amazing project involving KDE and Snaps. Time is running out and, well, the KDE world will be a better place if this project goes through! I would like to clarify: this is a paid position! A current KDE developer would be ideal, as it is a small team, so your time will be split between managing and coding alike. If you or anyone you know might be interested, please contact me ASAP!

Snaps: I am wrapping up the 23.04.3 KDE applications release! Head on over to https://snapcraft.io/search?q=KDE and enjoy! We are now up to 180 snaps! PIM snaps will be slowly rolling in as they go through manual reviews for D-Bus.

Snapcraft: minor fix in qmake plugin found by ruff.

Launchpad: I almost have approval for per-application repository snapcraft files, but I have to prove it will work to our benefit and not cause loads of polling etc. So I have been testing various methods of achieving such a task, and so far I have come up with Launchpad's ability to watch and download release tarballs into a project. I will then need to script getting the tarball and pushing it to a bzr branch, from which I can create a proper snap recipe. Unfortunately, my proper snap recipe fails! Hopefully a very helpful cjwatson will chime in, or if anyone wants to take a gander please chime in here: https://bugs.launchpad.net/launchpad/+bug/2031307

As reality sets in that my project may not happen if I don't find anyone, I need help surviving until I find work or funding to continue my snap work (still much to do!). If you or anyone else you know enjoys our snaps, please consider a donation - anything helps! Please share! Thank you for your consideration!

Categories: FLOSS Project Planets
