FLOSS Project Planets

Talking Drupal: Talking Drupal #331 - Migrating Paragraphs for The National Zoo

Planet Drupal - Mon, 2022-01-24 14:00

Today we are talking about Migrating Paragraphs for the National Zoo with Mohammed El-Khatib.

TalkingDrupal.com/331

Topics
  • Nic - Family flew home
  • Abby - Little free library – Hades game
  • Mohammed - Migrating D9 to Tailwind CSS and Alpine – Travel plans fell through
  • John - Listening to TD with kids
  • National Zoo
    • Favorite animal
  • How the National Zoo uses Drupal
  • Why the zoo needed to migrate paragraphs
  • Mapping migration strategy
  • Tools
    • Migrate Plus
    • Migrate Tools
    • Migrate Upgrade
  • Nested Paragraphs
  • Translation
  • Any strategies to migrate
  • Resources for help
  • Tips and Tricks
  • What is next for National Zoo
  • Anything to add?
Resources

Guests

Mo El-Khatib - mmelkhatib

Hosts

Nic Laflin - www.nLighteneddevelopment.com @nicxvan
John Picozzi - www.epam.com @johnpicozzi
Abby Bowman - www.linkedin.com/in/arbowman @abowmanr

MOTW
  • DraggableViews - makes rows of a view "draggable", which means that they can be rearranged by drag'n'drop.
Categories: FLOSS Project Planets

TEN7: Adding Bundle Subclasses to Drupal Core 9.3.0 (Part 1)

Planet Drupal - Mon, 2022-01-24 11:47
The technical details of why and how TEN7 contributed bundle subclasses to Drupal Core 9.3.0.
Categories: FLOSS Project Planets

PyCharm: Together, We Supported Python!

Planet Python - Mon, 2022-01-24 11:03

Last November, PyCharm joined forces with the Python Software Foundation (PSF) for their end-of-the-year fundraiser. From November 9 to December 1, all proceeds from every new PyCharm Professional Edition license purchased with the discount code ‘SUPPORTPYTHON21’ went to the PSF’s general fund.

The Python Software Foundation is the main organization behind the Python programming language. As a non-profit organization, the PSF depends on sponsorships and donations to support its work. You can always donate yourself through their website.

JetBrains and the PSF would like to thank all of you who took part in this campaign. Together, we raised $25,000 in less than a month! Contributions like those from PyCharm users help the PSF maintain a healthy balance and continue to support the Python community and its various outreach and diversity programs.

We hope that you enjoy using Python in 2022 and that PyCharm will be a powerful and reliable partner for your journey!

The PyCharm Team

Categories: FLOSS Project Planets

ItsMyCode: Adding new column to existing DataFrame in Pandas

Planet Python - Mon, 2022-01-24 10:49

In this article, we will look at different ways of adding a new column to an existing DataFrame in Pandas.

Let us create a simple DataFrame to use as a reference throughout this article while demonstrating how to add new columns to a Pandas DataFrame.

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

Output

           team  points  runrate  wins
0         India      10      0.5     5
1  South Africa       8      1.4     4
2   New Zealand       3      2.0     2
3       England       5     -0.6     2

Now that we have created a DataFrame let’s assume that we need to add a new column called “lost”, which holds the count of total matches each team has lost.

Method 1: Declare and assign a new list as a column

The simplest way is to create a new list and assign the list to the new DataFrame column. Let us see how we can achieve this with an example.
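One thing to keep in mind with this approach: the assigned list must contain exactly one value per row, or pandas refuses the assignment. A minimal sketch, using a trimmed-down version of the DataFrame above:

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'wins': [5, 4, 2, 2]})

# assigning a list whose length does not match the number of rows
# raises a ValueError instead of padding with NaN
try:
    df["lost"] = [2, 1, 3]  # only 3 values for 4 rows
    padded = True
except ValueError as exc:
    padded = False
    print("ValueError:", exc)
```

The failed assignment leaves the DataFrame untouched, so no partial "lost" column is created.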

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

# declare a new list and add the values into the list
match_lost = [2, 1, 3, 4]

# assign the list to the new DataFrame column
df["lost"] = match_lost

# print the new DataFrame
print(df)

Output

           team  points  runrate  wins  lost
0         India      10      0.5     5     2
1  South Africa       8      1.4     4     1
2   New Zealand       3      2.0     2     3
3       England       5     -0.6     2     4

Method 2: Using the DataFrame.insert() method

The disadvantage of the above approach is that we cannot insert the column at a chosen position; by default, the new column is appended at the end, making it the last column.

We can overcome the issue using the pandas.DataFrame.insert() method. This method is useful when you need to insert a new column in a specific position or index.

In the below example, let us insert the new column "lost" before the "wins" column. We can achieve this by inserting the new column at index 3 (after "team", "points", and "runrate").

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

# insert the new column at the specific position
df.insert(3, "lost", [2, 1, 3, 4], True)

# print the new DataFrame
print(df)

Output

           team  points  runrate  lost  wins
0         India      10      0.5     2     5
1  South Africa       8      1.4     1     4
2   New Zealand       3      2.0     3     2
3       England       5     -0.6     4     2

Method 3: Using the DataFrame.assign() method

The pandas.DataFrame.assign() method is useful when we need to create multiple new columns in a DataFrame.

This method returns a new object with all the original columns in addition to the new ones. Any existing columns that are re-assigned will be overwritten.
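The overwrite behavior, and the fact that the original DataFrame is left untouched, are easy to demonstrate; a small sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'wins': [5, 4, 2, 2]})

# re-assigning an existing column name ('wins') replaces its values
# in the returned copy; 'lost' is created as a new column
df2 = df.assign(wins=[6, 4, 2, 3], lost=[2, 1, 3, 4])

print(df2['wins'].tolist())  # the new values
print(df['wins'].tolist())   # the original DataFrame is unchanged
```

Because assign() returns a copy, you must keep the returned object (here df2) to see the changes.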

In the below example, we are adding multiple columns to Pandas DataFrame.

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

# append multiple columns to the Pandas DataFrame
df2 = df.assign(lost=[2, 1, 3, 4], matches_remaining=[2, 3, 1, 1])

# print the new DataFrame
print(df2)

Output

           team  points  runrate  wins  lost  matches_remaining
0         India      10      0.5     5     2                  2
1  South Africa       8      1.4     4     1                  3
2   New Zealand       3      2.0     2     3                  1
3       England       5     -0.6     2     4                  1

Method 4: Using the pandas.concat() method

We can also leverage the pandas.concat() method to concatenate a new column to a DataFrame by passing axis=1 as an argument. This method returns a new DataFrame after concatenating the columns.

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

# create a new DataFrame
df2 = pd.DataFrame([[1, 2], [2, 1], [3, 4], [0, 3]],
                   columns=['matches_left', 'lost'])

# concat and print the new DataFrame
print(pd.concat([df, df2], axis=1))

Output

           team  points  runrate  wins  matches_left  lost
0         India      10      0.5     5             1     2
1  South Africa       8      1.4     4             2     1
2   New Zealand       3      2.0     2             3     4
3       England       5     -0.6     2             0     3

Method 5: Using the Dictionary

Another trick is to use a dictionary to add a new column to a Pandas DataFrame. We can use the values of an existing column as the dictionary keys and map them to the values of the new column.

# import pandas library
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# print the DataFrame
print(df)

# create a dictionary keyed by the values of an existing column ('team'),
# with the values of the new column
match_lost = {'India': 2, 'South Africa': 1, 'New Zealand': 3, 'England': 0}

# map the dictionary onto the new DataFrame column
df['lost'] = df['team'].map(match_lost)

# print the DataFrame
print(df)

Output

           team  points  runrate  wins  lost
0         India      10      0.5     5     2
1  South Africa       8      1.4     4     1
2   New Zealand       3      2.0     2     3
3       England       5     -0.6     2     0

Conclusion

In this article, we saw five approaches to inserting new columns into a Pandas DataFrame (or overwriting existing ones): creating and assigning a list, insert(), assign(), concat(), and a dictionary. Depending on your requirements, you can choose whichever of these methods is most suitable.

Categories: FLOSS Project Planets

Real Python: Modulo String Formatting in Python

Planet Python - Mon, 2022-01-24 09:00

If you’re writing modern Python code with Python 3, you’ll probably want to format your strings with Python f-strings. However, if you’re working with older Python codebases, you’re likely to encounter the string modulo operator for string formatting.

If you’re reading or writing Python 2 code, it’ll help if you’re familiar with this technique. Because the syntax still works in Python 3, you might even see developers use it in modern Python codebases.

In this tutorial, you’ll learn how to:

  • Use the modulo operator (%) for string formatting
  • Convert values into specific types before inserting them into your string
  • Specify the horizontal space a formatted value occupies
  • Fine-tune the display using conversion flags
  • Specify values using dictionary mapping instead of tuples

If you’re acquainted with the printf() family of functions of C, Perl, or Java, then you’ll see that these don’t exist in Python. However, there’s quite a bit of similarity between printf() and the string modulo operator, so if you’re familiar with printf(), then a lot of the following will feel familiar.

On the other hand, if you aren’t familiar with printf(), don’t worry! You don’t need any prior knowledge of printf() to master modulo string formatting in Python.
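As a quick preview of the features listed above, here is a short taste of width specifiers, conversion flags, and dictionary mapping (the values are made up):

```python
# width: pad to a minimum field width (right-aligned by default)
print("%10s|" % "right")   # '     right|'

# the '-' flag left-aligns the value within the field
print("%-10s|" % "left")   # 'left      |'

# the '+' flag always shows the sign of a number
print("%+d" % 42)          # '+42'

# a dictionary on the right side lets you refer to values by name
print("%(name)s is %(age)d" % {"name": "Ada", "age": 36})  # 'Ada is 36'
```

Each of these features is covered in more detail later in the tutorial.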

Free Bonus: Click here to get our free Python Cheat Sheet that shows you the basics of Python 3, like working with data types, dictionaries, lists, and Python functions.

Use the Modulo Operator for String Formatting in Python

You’ve probably used the modulo operator (%) before with numbers, in which case it computes the remainder from a division:

>>> 11 % 3
2

With string operands, the modulo operator has an entirely different function: string formatting.

Note: These two operations aren’t very much alike. They only share the same name because they are represented by the same symbol (%).

Here’s what the syntax of the string modulo operator looks like:

<format_string> % <values>

On the left side of the % operator, <format_string> is a string containing one or more conversion specifiers. The <values> on the right side get inserted into <format_string> in place of the conversion specifiers. The resulting formatted string is the value of the expression.

Get started with an example where you call print() to display a formatted string using the string modulo operator:

>>> print("%d %s cost $%.2f" % (6, "bananas", 1.74))
6 bananas cost $1.74

In addition to representing the string modulo operation itself, the % character also denotes the beginning of a conversion specifier in the format string—in this case, there are three: %d, %s, and %.2f.

In the output, Python converted each item from the tuple of values to a string value and inserted it into the format string in place of the corresponding conversion specifier:

  • The first item in the tuple is 6, a numeric value that replaces %d in the format string.
  • The next item is the string value "bananas", which replaces %s.
  • The last item is the float value 1.74, which replaces %.2f.

The resulting string is 6 bananas cost $1.74, as demonstrated in the following diagram:

The String Modulo Operator

If there are multiple values to insert, then they must be enclosed in a tuple, as illustrated above. If there’s only one value, then you can write it by itself without the surrounding parentheses:

>>> print("Hello, my name is %s." % "Graham")
Hello, my name is Graham.

Notice also that the string modulo operation isn't only for printing. You can also format values and assign them to another string variable:
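The full article continues on the site; a minimal sketch of what such an assignment looks like, borrowing values from the earlier example:

```python
# the formatted result is an ordinary string, so it can be stored and reused
total = "%d %s cost $%.2f" % (6, "bananas", 1.74)

print(total)          # 6 bananas cost $1.74
print(total.upper())  # 6 BANANAS COST $1.74
```

The variable holds a plain str, so every string method and operation works on it as usual.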

Read the full article at https://realpython.com/python-modulo-string-formatting/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Okular: Signing of unsigned signature fields has landed

Planet KDE - Mon, 2022-01-24 07:32

Up to today, Okular would kind of error out when opening a PDF file that contains an unsigned signature field (think of the space in a paper form that says "sign here").


It would tell you "document is signed but can't be properly validated".

And that was it; you couldn't do much with the signature. When you tried to "see" it, all the fields would show default values like "Signed at 1 Jan 1970", etc.


With the new code, we properly detect that there are unsigned signature fields, and we offer to sign them when you interact with them.
Relevant merge requests:

https://invent.kde.org/graphics/okular/-/merge_requests/539

https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/1026

Categories: FLOSS Project Planets

Podcast.__init__: Improve Your Productivity By Investing In Developer Experience Design For Your Projects

Planet Python - Mon, 2022-01-24 06:30
Summary

When we are creating applications we spend a significant amount of effort on optimizing the experience of our end users to ensure that they are able to complete the tasks that the system is intended for. A similar effort that we should all consider is optimizing the developer experience for ourselves and other engineers who contribute to the projects that we work on. Adam Johnson recently wrote a book on how to improve the developer experience for Django projects and in this episode he shares some of the insights that he has gained through that project and his work with clients to help you improve the experience that you and your team have when collaborating on software development.

Announcements
  • Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Your host as usual is Tobias Macey and today I’m interviewing Adam Johnson about optimizing your developer experience
Interview
  • Introductions
  • How did you get introduced to Python?
  • Can you describe what you mean by the term "developer experience"?
    • How does it compare to the concept of user experience design?
  • What are the main goals that you aim for through improving DX?
  • When considering DX, what are the categories of focus for improvement? (e.g. the experience of a given software project, the developer’s physical environment, their editing environment, etc.)
  • What are some of the most high impact optimizations that a developer can make?
  • What are some of the areas of focus that have the most variable impact on a developer’s experience of a project?
  • What are some of the most helpful tools or practices that you rely on in your own projects?
  • How does the size of a development team or the scale of an organization impact the decisions and benefits around DX improvements?
  • One of the perennial challenges with selecting a given tool or architectural pattern is the continually changing landscape of software. How have your choices for DX strategies changed or evolved over the years?
  • What are the most interesting, innovative, or unexpected developer experience tweaks that you have encountered?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on your book?
  • What are some of the potential pitfalls that individuals and teams need to guard against in their quest to improve developer experience for their projects?
  • What are some of the new tools or practices that you are considering incorporating into your own work?
Keep In Touch

Picks
  • Tobias
  • Adam
    • Fan of Eternals, enjoyed Neil Gaiman series
    • Also general MCU fan, watched it all in lockdown
    • Moon Knight trailer
Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Categories: FLOSS Project Planets

Zato Blog: Remote API debugging with VS Code

Planet Python - Mon, 2022-01-24 04:31
  • Each Zato environment ships with default configuration that lets its servers be started from a Visual Studio Code’s debugger

  • Servers started in this way can run remotely, e.g. your local IDE may be on Mac or Windows while your Zato servers will be in a remote Linux instance. This will still let you debug your services deployed remotely.

  • It does not matter whether the server is under Docker or if it runs in a Linux VM directly

  • This article contains step-by-step instructions on how to start a Zato server in such a manner and how to debug your code remotely from VS Code

Prerequisites
  • Ensure that your remote Zato server runs in a Linux system that has at least 2 CPUs and 2 GB of RAM. If you use AWS, a medium instance will be the correct one to choose.

  • Make sure that there is SSH connectivity between your localhost and the remote Linux server, that is, you can ssh into the system where the Zato server is. As a reminder, if you use Docker Quickstart, the default port is 22022.

  • If the server runs in a Docker Quickstart instance, there are no further prerequisites and you can skip to the next section

  • If you created a Zato cluster yourself, check if these two files exist:

    /path/to/your/environment/.vscode/launch.json
    /path/to/your/environment/.vscode/settings.json
  • If the files do not exist, download them here: launch.json and settings.json

  • Once downloaded, save them to the locations above; if the .vscode directory does not exist, create it. The end result should be that if, for instance, your environment is in /opt/zato/env/dev, the files will go to /opt/zato/env/dev/.vscode.

SSH connections in VS Code
  • In VS Code, install an extension called Remote - SSH

  • After you install it, a new button will appear in the bottom-left corner of your IDE. The button lets you open SSH connections. Click it, then click “Connect to Host”

  • Choose the host where your Zato server is and the IDE will open a new window to connect to that host using SSH. Enter SSH credentials if necessary. Note that you will be in a new IDE window now.
Opening a remote environment
  • Once you are connected in a new window, choose “Open Folder”, select the directory where your environment is and click OK
  • Under Docker Quickstart, the path will be /opt/zato/env/qs-1/. In other environments, navigate to your environment’s path accordingly.
  • After opening the remote directory with a Zato environment, your IDE window will look like below:
Starting the server
  • Click the Run and Debug icon:
  • Click the play icon next to the Remote Zato Main option:
  • The IDE will now install all of its own components on the remote Linux server - that may take a couple of minutes the first time around. The process is CPU-intensive, which is why 2 CPUs are a prerequisite.

  • If VS Code tells you that its Python IDE extension is not installed in the remote SSH system, choose to install it over SSH. This may also take a couple of minutes.

  • Allow for some time until the IDE completes the installation of its remote components - there will be feedback provided in the IDE window’s footer. Once they are installed, proceed to the next section below.

Deploying a test service
  • Save the code below as demo.py and hot-deploy it in the now-already-started remote server. Note the highlighted line, we are going to add a breakpoint to it soon.
# -*- coding: utf-8 -*-

# Zato
from zato.server.service import Service

class MyService(Service):
    def handle(self):
        msg = 'Hello, I am a demo service'
        self.logger.info(msg)

Debugging a service
  • In the IDE, find the deployed service on a remote server under the path of /path/to/server/work/hot-deploy/current/demo.py and open it:
  • Add a breakpoint in line 9, as indicated below:
  • Invoke the service in any way you prefer, e.g. through REST, Dashboard or from command line

  • The debugger will stop at line 9, showing the local variables, the call stack and other details, exactly as if it were a local server


Congratulations! This concludes the process, everything is set up, you can debug your Zato API services remotely now.

Next steps
  • Start the tutorial to learn how to integrate APIs and build systems. After completing it, you will have a multi-protocol service representing a sample scenario often seen in banking systems with several applications cooperating to provide a single and consistent API to its callers.

  • Visit the support page if you need assistance.

  • Para aprender más sobre las integraciones de Zato y API en español, haga clic aquí

  • Pour en savoir plus sur les intégrations API avec Zato en français, cliquez ici

Categories: FLOSS Project Planets

IslandT: Starting a python language chess game project

Planet Python - Mon, 2022-01-24 02:41

Hello everyone, this is a new chess game project which I am going to start creating, and I will update the code on this website weekly. In this first chapter of the project report, I am going to 1) render the chessboard, 2) write code to recognize the square I am touching, and 3) put a pawn on the board.

Without further ado, let me start by explaining to you first what this project is all about and what the final outcome of the project will be.

The reason I started this project is that I want to create a chess game application I can play with as a training exercise to sharpen my chess skills. Stockfish is the chess engine that powers this application, and the python code will use one of the Stockfish wrappers written in python to communicate with the Stockfish chess engine. Pygame will be used to create the chess game interface as well as to move pieces along the chessboard. The finished project will contain a common interface together with the chessboard, and may also include the AI and analysis parts later on, but not in this project.

The python program below will achieve the above three goals: first, it renders the chessboard; then it prints the name of the square on the command line and highlights the square area each time the user presses on one of the squares (I am using PyCharm to write this program, but you can certainly use another editor if you have one); and finally, it puts a pawn on one of the squares on the chessboard.

import sys
import math

import pygame

pygame.init()

size = width, height = 512, 512
white = 255, 178, 102
black = 126, 126, 126
highlight = 192, 192, 192
title = "IslandT Chess"
width = 64  # width of the square
original_color = ''

# empty chess dictionary
chess_dict = {}

# chess square list
chess_square_list = [
    "a8", "b8", "c8", "d8", "e8", "f8", "g8", "h8",
    "a7", "b7", "c7", "d7", "e7", "f7", "g7", "h7",
    "a6", "b6", "c6", "d6", "e6", "f6", "g6", "h6",
    "a5", "b5", "c5", "d5", "e5", "f5", "g5", "h5",
    "a4", "b4", "c4", "d4", "e4", "f4", "g4", "h4",
    "a3", "b3", "c3", "d3", "e3", "f3", "g3", "h3",
    "a2", "b2", "c2", "d2", "e2", "f2", "g2", "h2",
    "a1", "b1", "c1", "d1", "e1", "f1", "g1", "h1",
]

# chess square position
chess_square_position = []

# pawn image
pawn0 = pygame.image.load("pawn.png")

# create a list to map the name of column and row
for i in range(0, 8):  # control row
    for j in range(0, 8):  # control column
        chess_square_position.append((j * width, i * width))

# create a dictionary to map the name of column and row
for n in range(0, len(chess_square_position)):
    chess_dict[chess_square_list[n]] = chess_square_position[n]

screen = pygame.display.set_mode(size)
pygame.display.set_caption(title)

rect_list = list()  # this is the list of brown rectangles

# use this loop to create a list of brown rectangles
for i in range(0, 8):  # control the row
    for j in range(0, 8):  # control the column
        if i % 2 == 0:  # which means it is an even row
            if j % 2 != 0:  # which means it is an odd column
                rect_list.append(pygame.Rect(j * width, i * width, width, width))
        else:
            if j % 2 == 0:  # which means it is an even column
                rect_list.append(pygame.Rect(j * width, i * width, width, width))

# create the main surface and fill the base with a light brown color
chess_board_surface = pygame.Surface(size)
chess_board_surface.fill(white)

# next, draw the dark brown rectangles on the chessboard surface
for chess_rect in rect_list:
    pygame.draw.rect(chess_board_surface, black, chess_rect)

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()
        elif event.type == pygame.MOUSEBUTTONDOWN:
            pos = event.pos
            x = math.floor(pos[0] / width)
            y = math.floor(pos[1] / width)
            # print the square name which you have clicked on
            for key, value in chess_dict.items():
                if (x * width, y * width) == (value[0], value[1]):
                    print(key)
            original_color = chess_board_surface.get_at((x * width, y * width))
            pygame.draw.rect(chess_board_surface, highlight,
                             pygame.Rect(x * width, y * width, 64, 64))
        elif event.type == pygame.MOUSEBUTTONUP:
            pos = event.pos
            x = math.floor(pos[0] / width)
            y = math.floor(pos[1] / width)
            pygame.draw.rect(chess_board_surface, original_color,
                             pygame.Rect(x * width, y * width, 64, 64))

    # display the chess surface
    screen.blit(chess_board_surface, (0, 0))
    screen.blit(pawn0, (0, 64))  # just testing...
    pygame.display.update()

The result is as follows…

pygame chess project

At the moment only one pawn is on the chessboard, and our next goal is to move that piece along the board, which is going to happen in the coming chapter!
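As the program grows, the click-to-square lookup in the event loop can be factored out into a standalone helper. A sketch of that idea (the `square_at` name and the dict comprehension are mine, not from the program above, but the mapping is the same):

```python
import math

WIDTH = 64  # square size in pixels, matching the board above

FILES = "abcdefgh"
# same mapping as the main program builds: rank 8 is the top row (y == 0)
CHESS_DICT = {
    f"{FILES[col]}{8 - row}": (col * WIDTH, row * WIDTH)
    for row in range(8) for col in range(8)
}

def square_at(pos, width=WIDTH):
    """Return the algebraic name (e.g. 'a1') of the square under a pixel position."""
    x = math.floor(pos[0] / width) * width
    y = math.floor(pos[1] / width) * width
    for name, (sq_x, sq_y) in CHESS_DICT.items():
        if (sq_x, sq_y) == (x, y):
            return name
    return None  # position was outside the board

print(square_at((0, 448)))    # a1 (bottom-left corner)
print(square_at((300, 10)))   # e8
```

A helper like this keeps the event loop short and will come in handy later when pieces need to know which square they were dropped on.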

If you are interested in another topic besides programming then do visit my blog and become my friend on Blogspot.

Categories: FLOSS Project Planets

Mike Driscoll: PyDev of the Week: Julian Sequeira

Planet Python - Mon, 2022-01-24 01:05

This week we welcome Julian Sequeira (@juliansequeira) as our PyDev of the Week! Julian is one of the co-founders of PyBites. They post articles, courses, run a podcast and have a fun Code Challenge site too.

You can connect with Julian on LinkedIn if you'd like to. Now let's spend some time getting to know him better!

Can you tell us a little about yourself (hobbies, education, etc):

I'm Julian Sequeira, an entrepreneur obsessed with Python and everything Mindset related. I've spent my life in technology and am currently the Co-Founder of PyBites as well as a Program Manager at Amazon Web Services. Off the books, I love to tinker with tech, play the guitar, spend time with my kids and dive into a casual video game or three.

Why did you start using Python?

I started using Python when I was working as a Field Engineer at Oracle in 2016. By then I'd already befriended my now best mate and business partner, Bob Belderbos, who I was looking to partner with on a project. At the same time, I had the need to create a simple app to track the over time I was doing as part of the day job.

While using Python to create the app, I found myself wanting and needing to take notes on the concepts I was learning (virtual environments blew my mind!). This led to Bob and I deciding to create a blog so we could both learn Python and record what we learned at the same time. PyBites was born, the Python learning continued, all manner of material was created and here we are today, with PyBites as our very own company.

Oh and yes, I finished creating the overtime tracker and while I found no discrepancies in *my* pay, it found some gaps in someone else's! Win!

What other programming languages do you know and which is your favorite?

This is a tough one! I've spent the vast majority of my time on Python but do have some rudimentary Javascript and C++ experience. C++ holds a special (painful) place in my heart as it was the very first programming language I learned back in high school. I remember making a  text-based Black Jack game which I was super proud of. Given the relative complexities, I still feel like a champion just writing "#include <iostream>".

What projects are you working on now?

Not as many projects as I'd like due to moving houses but I've just started working on a Python app with my son so we can categorise and record his Pokemon card collection. We're using the PyBites Developer Mindset approach and starting with an MVP. It's slow going as I'm teaching him Python at the same time but it's not the speed that counts, we're just enjoying the journey and every little win along the way.

Outside of that, there are always projects in the works for PyBites. I'm not coding anything right now but am working on some content for schools that are using our Python exercises platform to teach students how to code. This is what gets me up in the morning!

Which Python libraries are your favorite (core or 3rd party)?

The kind with books! ... Anyone?

Fine! While I'm not an expert with it in any way, I've come to appreciate OpenCV. To me it just unlocks so much from a creative perspective and allows me to take dreams I've had my entire life and make them a reality. I remember following a tutorial from Adrian Rosebrock to create my own OpenCV based Pokedex. It blew my mind and really reminded me that Python can be used not just to create the usual "serious" app but to set us free and really use our imaginations to make the world a brighter place.

What were we talking about? Oh yeah, OpenCV would be it!

How is the PyBites Podcast going? How is that different from the other things you do at PyBites?

The PyBites Podcast is going strong! It has to be the most fun we have second only to working with our clients. As we record each episode, I honestly forget that it's being recorded and just enjoy the conversation. It allows me to share openly and honestly without feeling tied down by a script. I feel like this sets it apart from the rest of the things we do at PyBites. It's candid, light and raw and allows listeners to hear us at our best.

Also, while PyBites can be quite technical on the blog, the podcast tends to lean toward the mindset side of things. Mindset topics can be dry in text form but when discussed over audio people can hear the passion and enthusiasm we have regarding the theme of the episode.

Is there anything else you’d like to say?

As you may or may not know, Bob and I coach people through their Python journey. One thing that comes up over and over is how important it is to have a strong mindset as you push for your Python goals. It's not enough to just code. You have to learn to effectively work with the successes *and* the failures, how to work with others, how to take criticism, deal with tutorial paralysis and imposter syndrome... the list goes on.

Ultimately, you need to find a balance between the mindset and the tech skills. This is when real progress is made.

And to wrap it up, it may sound lame but the reality is that it starts and ends with you. Trust in yourself and your abilities and go get it.

Thanks for doing the interview, Julian!

The post PyDev of the Week: Julian Sequeira appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

James Bennett: For hire

Planet Python - Sun, 2022-01-23 20:51

As I write this it’s the evening of January 23, 2022. A little over two weeks ago I gave notice at my now-former employer, and as of two days ago I am officially on the job market.

If you already know me and are interested in talking about an opportunity, please get in touch. Or if you want to know a bit more first, read on…

Who I am

It’s a bit tricky to pin down when …

Read full entry

Categories: FLOSS Project Planets

Matthieu Caneill: Debsources, python3, and funky file names

Planet Debian - Sun, 2022-01-23 18:00

Rumors are running that python2 is not a thing anymore.

Well, I'm certainly late to the party, but I'm happy to report that sources.debian.org is now running python3.

Wait, it wasn't?

Back when development started, python3 was very much a real language, but it was hard to adopt because it was not supported by many libraries. So python2 was chosen, meaning print-based debugging was used in lieu of print()-based debugging, and str were bytes, not unicode.

And things were working just fine. One day python2 EOL was announced, with a date far in the future. Far enough to procrastinate for a long time. Combine this with a codebase that is stable enough to not see many commits, and the fact that Debsources is a volunteer-based project that happens at best on weekends, and you end up with dormant software and a missed deadline.

But, as dormant as the codebase is, the instance hosted at sources.debian.org is very popular and gets 200k to 500k hits per day. More than enough to be worth proper maintenance and a transition to python3.

Funky file names

While transitioning to python3 and juggling left and right with str, bytes and unicode for internal objects, files, database entries and HTTP content, I stumbled upon a bug that has been there since day 1.

Quick recap if you're unfamiliar with this tool: Debsources displays the content of the source packages in the Debian archive. In other words, it's a bit like GitHub, but for the Debian source code.

And some pieces of software out there, that ended up in Debian packages, happen to contain files whose names can't be decoded to UTF-8. Interestingly enough, there's no such thing as a standard for file names: with a few exceptions that vary by operating system, any sequence of bytes can be a legit file name. And some sequences of bytes are not valid UTF-8.

Of course those files are rare, and using ASCII characters to name a file is a much more common practice than using bytes in a non-UTF-8 character encoding. But when you deal with almost 100 billion files on which you have no control (those files come from free software projects, and make their way into Debian without any renaming), it happens.

Now back to the bug: when trying to display such a file through the web interface, it would crash because it can't convert the file name to UTF-8, which is needed for the HTML representation of the page.

Bugfix

An often valid approach when trying to represent invalid UTF-8 content is to ignore errors, and replace them with ? or �. This is what Debsources actually does to display non-UTF-8 file content.
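For illustration (my own minimal example, not Debsources' actual code), here is what that best-effort approach looks like in Python 3:

```python
# A byte sequence that is valid Latin-1 ("café") but invalid UTF-8.
raw = b"caf\xe9"

# Strict decoding refuses it outright.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print("strict decode failed:", e.reason)

# Best-effort decoding substitutes U+FFFD (the � replacement character).
lossy = raw.decode("utf-8", errors="replace")
print(lossy)  # caf�
```

The display problem is solved, but the substitution is lossy: there is no way to recover the original byte 0xE9 from the replacement character.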

Unfortunately, this best-effort approach is not suitable for file names, as file names are also identifiers in Debsources: among other places, they are part of URLs. If a URL were to use placeholder characters to replace those bytes, there would be no deterministic way to match it with a file on disk anymore.

Representing binary data as text is a known problem. Multiple lossless solutions exist, such as base64 and its variants, but URLs looking like https://sources.debian.org/src/Y293c2F5LzMuMDMtOS4yL2Nvd3NheS8= are not readable at all compared to https://sources.debian.org/src/cowsay/3.03-9.2/cowsay/. Plus, that would not be backwards-compatible with existing links.

The solution I chose is to use double-percent encoding: this allows the representation of any byte in an URL, while keeping allowed characters unchanged - and preventing CGI gateways from trying to decode non-UTF-8 bytes. This is the best of both worlds: regular file names get to appear normally and are human-readable, and funky file names only have percent signs and hex numbers where needed.

Here is an example of such an URL: https://sources.debian.org/src/aspell-is/0.51-0-4/%25EDslenska.alias/. Notice the %25ED to represent the percentage symbol itself (%25) followed by an invalid UTF-8 byte (%ED).
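To make the scheme concrete, here is a rough sketch of double-percent encoding (my own illustration; Debsources' actual implementation may differ in exactly which characters it escapes):

```python
import urllib.parse


def encode_path(raw: bytes) -> str:
    """Double-percent-encode a raw file name for use in a URL."""
    # Pass 1: classic percent-encoding of the raw bytes; '%' itself and
    # anything outside printable ASCII becomes %XX.
    first = []
    for b in raw:
        if b == 0x25 or not (0x20 <= b < 0x7F):
            first.append("%%%02X" % b)
        else:
            first.append(chr(b))
    # Pass 2: escape the percent signs themselves, so that a standard
    # URL decode by a gateway yields the pass-1 form intact.
    return "".join(first).replace("%", "%25")


def decode_path(encoded: str) -> bytes:
    """Reverse the two passes back to the original bytes."""
    first = urllib.parse.unquote(encoded)  # "%25ED..." -> "%ED..."
    out = bytearray()
    i = 0
    while i < len(first):
        if first[i] == "%":
            out.append(int(first[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(first[i]))
            i += 1
    return bytes(out)


print(encode_path(b"cowsay"))             # cowsay (unchanged)
print(encode_path(b"\xedslenska.alias"))  # %25EDslenska.alias
```

Regular names pass through untouched, and the round trip is deterministic: decode_path(encode_path(name)) == name for any byte sequence.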

Transitioning to this was quite a challenge, as those file names don't only appear in URLs, but also in web pages themselves, log files, database tables, etc. And everything was done with str: made sense in python2 when str were bytes, but not much in python3.

What are those files? What's their network?

I was wondering too. Let's list them!

import os

with open('non-utf-8-paths.bin', 'wb') as f:
    for root, folders, files in os.walk(b'/srv/sources.debian.org/sources/'):
        for path in folders + files:
            try:
                path.decode('utf-8')
            except UnicodeDecodeError:
                f.write(root + b'/' + path + b'\n')

Running this on the Debsources main instance, which hosts pretty much all Debian packages that were part of a Debian release, I could find 307 files (among a total of almost 100 billion files).

Without looking deep into them, they seem to fall into 2 categories:

  • File names that are not valid UTF-8, but are valid in a different charset. Not all software is developed in English or on UTF-8 systems.
  • File names that can't be decoded to UTF-8 on purpose, to be used as input to test suites, and assert resilience of the software to non-UTF-8 data.

That last point hits home, as it was clearly lacking in Debsources. A funky file name is now part of its test suite. ;)

Categories: FLOSS Project Planets

Anarcat: Switching from OpenNTPd to Chrony

Planet Python - Sun, 2022-01-23 16:55

A friend recently reminded me of the existence of chrony, a "versatile implementation of the Network Time Protocol (NTP)". The excellent introduction is worth quoting in full:

It can synchronise the system clock with NTP servers, reference clocks (e.g. GPS receiver), and manual input using wristwatch and keyboard. It can also operate as an NTPv4 (RFC 5905) server and peer to provide a time service to other computers in the network.

It is designed to perform well in a wide range of conditions, including intermittent network connections, heavily congested networks, changing temperatures (ordinary computer clocks are sensitive to temperature), and systems that do not run continuously, or run on a virtual machine.

Typical accuracy between two machines synchronised over the Internet is within a few milliseconds; on a LAN, accuracy is typically in tens of microseconds. With hardware timestamping, or a hardware reference clock, sub-microsecond accuracy may be possible.

Now that's already great documentation right there. What it is, why it's good, and what to expect from it. I want more. They have a very handy comparison table between chrony, ntp and openntpd.

My problem with OpenNTPd

Following concerns surrounding the security (and complexity) of the venerable ntp program, I have, a long time ago, switched to using openntpd on all my computers. I hadn't thought about it until I recently noticed a lot of noise on one of my servers:

jan 18 10:09:49 curie ntpd[1069]: adjusting local clock by -1.604366s
jan 18 10:08:18 curie ntpd[1069]: adjusting local clock by -1.577608s
jan 18 10:05:02 curie ntpd[1069]: adjusting local clock by -1.574683s
jan 18 10:04:00 curie ntpd[1069]: adjusting local clock by -1.573240s
jan 18 10:02:26 curie ntpd[1069]: adjusting local clock by -1.569592s

You read that right: openntpd was constantly rewinding the clock, sometimes in less than two minutes. The above log was taken while doing diagnostics, looking at the last 30 minutes of logs. So, on average, one 1.5-second rewind every 6 minutes!

That might be due to a dying real time clock (RTC) or some other hardware problem. I know for a fact that the CMOS battery on that computer (curie) died and I wasn't able to replace it (!). So that's partly garbage-in, garbage-out here. But still, I was curious to see how chrony would behave... (Spoiler: much better.)

But I also had trouble on another workstation, that one a much more recent machine (angela). First, it seems OpenNTPd would just fail at boot time:

anarcat@angela:~(main)$ sudo systemctl status openntpd
● openntpd.service - OpenNTPd Network Time Protocol
     Loaded: loaded (/lib/systemd/system/openntpd.service; enabled; vendor pres>
     Active: inactive (dead) since Sun 2022-01-23 09:54:03 EST; 6h ago
       Docs: man:openntpd(8)
    Process: 3291 ExecStartPre=/usr/sbin/ntpd -n $DAEMON_OPTS (code=exited, sta>
    Process: 3294 ExecStart=/usr/sbin/ntpd $DAEMON_OPTS (code=exited, status=0/>
   Main PID: 3298 (code=exited, status=0/SUCCESS)
        CPU: 34ms

jan 23 09:54:03 angela systemd[1]: Starting OpenNTPd Network Time Protocol...
jan 23 09:54:03 angela ntpd[3291]: configuration OK
jan 23 09:54:03 angela ntpd[3297]: ntp engine ready
jan 23 09:54:03 angela ntpd[3297]: ntp: recvfrom: Permission denied
jan 23 09:54:03 angela ntpd[3294]: Terminating
jan 23 09:54:03 angela systemd[1]: Started OpenNTPd Network Time Protocol.
jan 23 09:54:03 angela systemd[1]: openntpd.service: Succeeded.

After a restart, somehow it worked, but it took a long time to sync the clock. At first, it would just not consider any peer at all:

anarcat@angela:~(main)$ sudo ntpctl -s all
0/20 peers valid, clock unsynced

peer
   wt tl st  next  poll          offset       delay      jitter
159.203.8.72 from pool 0.debian.pool.ntp.org
    1  5  2    6s    6s             ---- peer not valid ----
138.197.135.239 from pool 0.debian.pool.ntp.org
    1  5  2    6s    7s             ---- peer not valid ----
216.197.156.83 from pool 0.debian.pool.ntp.org
    1  4  1    2s    9s             ---- peer not valid ----
142.114.187.107 from pool 0.debian.pool.ntp.org
    1  5  2    5s    6s             ---- peer not valid ----
216.6.2.70 from pool 1.debian.pool.ntp.org
    1  4  2    2s    8s             ---- peer not valid ----
207.34.49.172 from pool 1.debian.pool.ntp.org
    1  4  2    0s    5s             ---- peer not valid ----
198.27.76.102 from pool 1.debian.pool.ntp.org
    1  5  2    5s    5s             ---- peer not valid ----
158.69.254.196 from pool 1.debian.pool.ntp.org
    1  4  3    1s    6s             ---- peer not valid ----
149.56.121.16 from pool 2.debian.pool.ntp.org
    1  4  2    5s    9s             ---- peer not valid ----
162.159.200.123 from pool 2.debian.pool.ntp.org
    1  4  3    1s    6s             ---- peer not valid ----
206.108.0.131 from pool 2.debian.pool.ntp.org
    1  4  1    6s    9s             ---- peer not valid ----
205.206.70.40 from pool 2.debian.pool.ntp.org
    1  5  2    8s    9s             ---- peer not valid ----
2001:678:8::123 from pool 2.debian.pool.ntp.org
    1  4  2    5s    9s             ---- peer not valid ----
2606:4700:f1::1 from pool 2.debian.pool.ntp.org
    1  4  3    2s    6s             ---- peer not valid ----
2607:5300:205:200::1991 from pool 2.debian.pool.ntp.org
    1  4  2    5s    9s             ---- peer not valid ----
2607:5300:201:3100::345c from pool 2.debian.pool.ntp.org
    1  4  4    1s    6s             ---- peer not valid ----
209.115.181.110 from pool 3.debian.pool.ntp.org
    1  5  2    5s    6s             ---- peer not valid ----
205.206.70.42 from pool 3.debian.pool.ntp.org
    1  4  2    0s    6s             ---- peer not valid ----
68.69.221.61 from pool 3.debian.pool.ntp.org
    1  4  1    2s    9s             ---- peer not valid ----
162.159.200.1 from pool 3.debian.pool.ntp.org
    1  4  3    4s    7s             ---- peer not valid ----

Then it would accept them, but still wouldn't sync the clock:

anarcat@angela:~(main)$ sudo ntpctl -s all
20/20 peers valid, clock unsynced

peer
   wt tl st  next  poll          offset       delay      jitter
159.203.8.72 from pool 0.debian.pool.ntp.org
    1  8  2    5s    6s    0.672ms  13.507ms   0.442ms
138.197.135.239 from pool 0.debian.pool.ntp.org
    1  7  2    4s    8s    1.260ms  13.388ms   0.494ms
216.197.156.83 from pool 0.debian.pool.ntp.org
    1  7  1    3s    5s   -0.390ms  47.641ms   1.537ms
142.114.187.107 from pool 0.debian.pool.ntp.org
    1  7  2    1s    6s   -0.573ms  15.012ms   1.845ms
216.6.2.70 from pool 1.debian.pool.ntp.org
    1  7  2    3s    8s   -0.178ms  21.691ms   1.807ms
207.34.49.172 from pool 1.debian.pool.ntp.org
    1  7  2    4s    8s   -5.742ms  70.040ms   1.656ms
198.27.76.102 from pool 1.debian.pool.ntp.org
    1  7  2    0s    7s    0.170ms  21.035ms   1.914ms
158.69.254.196 from pool 1.debian.pool.ntp.org
    1  7  3    5s    8s   -2.626ms  20.862ms   2.032ms
149.56.121.16 from pool 2.debian.pool.ntp.org
    1  7  2    6s    8s    0.123ms  20.758ms   2.248ms
162.159.200.123 from pool 2.debian.pool.ntp.org
    1  8  3    4s    5s    2.043ms  14.138ms   1.675ms
206.108.0.131 from pool 2.debian.pool.ntp.org
    1  6  1    0s    7s   -0.027ms  14.189ms   2.206ms
205.206.70.40 from pool 2.debian.pool.ntp.org
    1  7  2    1s    5s   -1.777ms  53.459ms   1.865ms
2001:678:8::123 from pool 2.debian.pool.ntp.org
    1  6  2    1s    8s    0.195ms  14.572ms   2.624ms
2606:4700:f1::1 from pool 2.debian.pool.ntp.org
    1  7  3    6s    9s    2.068ms  14.102ms   1.767ms
2607:5300:205:200::1991 from pool 2.debian.pool.ntp.org
    1  6  2    4s    9s    0.254ms  21.471ms   2.120ms
2607:5300:201:3100::345c from pool 2.debian.pool.ntp.org
    1  7  4    5s    9s   -1.706ms  21.030ms   1.849ms
209.115.181.110 from pool 3.debian.pool.ntp.org
    1  7  2    0s    7s    8.907ms  75.070ms   2.095ms
205.206.70.42 from pool 3.debian.pool.ntp.org
    1  7  2    6s    9s   -1.729ms  53.823ms   2.193ms
68.69.221.61 from pool 3.debian.pool.ntp.org
    1  7  1    1s    7s   -1.265ms  46.355ms   4.171ms
162.159.200.1 from pool 3.debian.pool.ntp.org
    1  7  3    4s    8s    1.732ms  35.792ms   2.228ms

It took a solid five minutes to sync the clock, even though the peers were considered valid within a few seconds:

jan 23 15:58:41 angela systemd[1]: Started OpenNTPd Network Time Protocol.
jan 23 15:58:58 angela ntpd[84086]: peer 142.114.187.107 now valid
jan 23 15:58:58 angela ntpd[84086]: peer 198.27.76.102 now valid
jan 23 15:58:58 angela ntpd[84086]: peer 207.34.49.172 now valid
jan 23 15:58:58 angela ntpd[84086]: peer 209.115.181.110 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 159.203.8.72 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 138.197.135.239 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 162.159.200.123 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 2607:5300:201:3100::345c now valid
jan 23 15:59:00 angela ntpd[84086]: peer 2606:4700:f1::1 now valid
jan 23 15:59:00 angela ntpd[84086]: peer 158.69.254.196 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 216.6.2.70 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 68.69.221.61 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 205.206.70.40 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 205.206.70.42 now valid
jan 23 15:59:02 angela ntpd[84086]: peer 162.159.200.1 now valid
jan 23 15:59:04 angela ntpd[84086]: peer 216.197.156.83 now valid
jan 23 15:59:05 angela ntpd[84086]: peer 206.108.0.131 now valid
jan 23 15:59:05 angela ntpd[84086]: peer 2001:678:8::123 now valid
jan 23 15:59:05 angela ntpd[84086]: peer 149.56.121.16 now valid
jan 23 15:59:07 angela ntpd[84086]: peer 2607:5300:205:200::1991 now valid
jan 23 16:03:47 angela ntpd[84086]: clock is now synced

That seems kind of odd. It was also frustrating to have very little information from ntpctl about the state of the daemon. I understand it's designed to be minimal, but it could inform me of its known offset, for example. It does tell me about the offset with the different peers, but not as clearly as one would expect. It's also unclear how it disciplines the RTC at all.

Compared to chrony

Now compare with chrony:

jan 23 16:07:16 angela systemd[1]: Starting chrony, an NTP client/server...
jan 23 16:07:16 angela chronyd[87765]: chronyd version 4.0 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 -DEBUG)
jan 23 16:07:16 angela chronyd[87765]: Initial frequency 3.814 ppm
jan 23 16:07:16 angela chronyd[87765]: Using right/UTC timezone to obtain leap second data
jan 23 16:07:16 angela chronyd[87765]: Loaded seccomp filter
jan 23 16:07:16 angela systemd[1]: Started chrony, an NTP client/server.
jan 23 16:07:21 angela chronyd[87765]: Selected source 206.108.0.131 (2.debian.pool.ntp.org)
jan 23 16:07:21 angela chronyd[87765]: System clock TAI offset set to 37 seconds

First, you'll notice there's none of that "clock synced" nonsense: it picks a source, and then... it's just done. Because the clock on this computer is not drifting that much, and openntpd had (presumably) just sync'd it anyways. And indeed, if we look at detailed stats from the powerful chronyc client:

anarcat@angela:~(main)$ sudo chronyc tracking
Reference ID    : CE6C0083 (ntp1.torix.ca)
Stratum         : 2
Ref time (UTC)  : Sun Jan 23 21:07:21 2022
System time     : 0.000000311 seconds slow of NTP time
Last offset     : +0.000807989 seconds
RMS offset      : 0.000807989 seconds
Frequency       : 3.814 ppm fast
Residual freq   : -24.434 ppm
Skew            : 1000000.000 ppm
Root delay      : 0.013200894 seconds
Root dispersion : 65.357254028 seconds
Update interval : 1.4 seconds
Leap status     : Normal

We see that we are nanoseconds away from NTP time. That was run very quickly after starting the server (literally in the same second as chrony picked a source), so stats are a bit weird (e.g. the Skew is huge). After a minute or two, it looks more reasonable:

Reference ID    : CE6C0083 (ntp1.torix.ca)
Stratum         : 2
Ref time (UTC)  : Sun Jan 23 21:09:32 2022
System time     : 0.000487002 seconds slow of NTP time
Last offset     : -0.000332960 seconds
RMS offset      : 0.000751204 seconds
Frequency       : 3.536 ppm fast
Residual freq   : +0.016 ppm
Skew            : 3.707 ppm
Root delay      : 0.013363549 seconds
Root dispersion : 0.000324015 seconds
Update interval : 65.0 seconds
Leap status     : Normal

Now it's learning how good or bad the RTC clock is ("Frequency"), and is smoothly adjusting the System time to follow the average offset (RMS offset, more or less). You'll also notice the Update interval has risen, and will keep expanding as chrony learns more about the internal clock, so it doesn't need to constantly poll the NTP servers to sync the clock. In the above, we're 487 microseconds (less than a millisecond!) away from NTP time.

(People interested in the explanation of every single one of those fields can read the excellent chronyc manpage. That thing made me want to nerd out on NTP again!)

On the machine with the bad clock, chrony also did a 1.5 second adjustment, but just once, at startup:

jan 18 11:54:33 curie chronyd[2148399]: Selected source 206.108.0.133 (2.debian.pool.ntp.org)
jan 18 11:54:33 curie chronyd[2148399]: System clock wrong by -1.606546 seconds
jan 18 11:54:31 curie chronyd[2148399]: System clock was stepped by -1.606546 seconds
jan 18 11:54:31 curie chronyd[2148399]: System clock TAI offset set to 37 seconds

Then it would still struggle to keep the clock in sync, but not as badly as openntpd. Here's the offset a few minutes after the startup above:

System time : 0.000375352 seconds slow of NTP time

And again a few seconds later:

System time : 0.001793046 seconds slow of NTP time

I don't currently have access to that machine, and will update this post with the latest status, but so far I've had a very good experience with chrony on that machine, which is a testament to its resilience, and it also just works on my other machines as well.

Extras

On top of "just working" (as demonstrated above), I feel that chrony's feature set is so much superior... Here's an excerpt of the extras in chrony, taken from comparison table:

  • source frequency tracking
  • source state restore from file
  • temperature compensation
  • ready for next NTP era (year 2036)
  • replace unreachable / falseticker servers
  • aware of jitter
  • RTC drift tracking
  • RTC trimming
  • Restore time from file w/o RTC
  • leap seconds correction, in slew mode
  • drops root privileges

I even understand some of that stuff. I think.

So kudos to the chrony folks, I'm switching.

Caveats

One thing to keep in mind in the above, however is that it's quite possible chrony does as bad of a job as openntpd on that old machine, and just doesn't tell me about it. For example, here's another log sample from another server (marcos):

jan 23 11:13:25 marcos ntpd[1976694]: adjusting clock frequency by 0.451035 to -16.420273ppm

I get those basically every day, which seems to show that it's at least trying to keep track of the hardware clock.

In other words, it's quite possible I have no idea what I'm talking about and you definitely need to take this article with a grain of salt. I'm not an NTP expert.

Switching to chrony

Because the default configuration in chrony (at least as shipped in Debian) is sane (good default peers, no open network by default), installing it is as simple as:

apt install chrony

And because it somehow conflicts with openntpd, that also takes care of removing that cruft as well.
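For reference, the important directives in Debian's default /etc/chrony/chrony.conf look roughly like this (a sketch from memory, not a copy of the shipped file; see the chrony.conf manpage for the authoritative list):

```
# Use the Debian NTP pool; "iburst" speeds up the initial sync.
pool 2.debian.pool.ntp.org iburst

# Remember the measured clock drift across restarts.
driftfile /var/lib/chrony/chrony.drift

# Step the clock instead of slewing if it is off by more than 1 second,
# but only during the first 3 updates after startup.
makestep 1 3

# Periodically copy the system time to the real-time clock.
rtcsync
```

The makestep and driftfile directives are what give chrony the behavior described above: one step at startup if needed, then smooth slewing informed by the learned drift rate.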

Categories: FLOSS Project Planets

Antoine Beaupré: Switching from OpenNTPd to Chrony

Planet Debian - Sun, 2022-01-23 16:55

A friend recently reminded me of the existence of chrony, a "versatile implementation of the Network Time Protocol (NTP)". The excellent introduction is worth quoting in full:

It can synchronise the system clock with NTP servers, reference clocks (e.g. GPS receiver), and manual input using wristwatch and keyboard. It can also operate as an NTPv4 (RFC 5905) server and peer to provide a time service to other computers in the network.

It is designed to perform well in a wide range of conditions, including intermittent network connections, heavily congested networks, changing temperatures (ordinary computer clocks are sensitive to temperature), and systems that do not run continuosly, or run on a virtual machine.

Typical accuracy between two machines synchronised over the Internet is within a few milliseconds; on a LAN, accuracy is typically in tens of microseconds. With hardware timestamping, or a hardware reference clock, sub-microsecond accuracy may be possible.

Now that's already great documentation right there. What it is, why it's good, and what to expect from it. I want more. They have a very handy comparison table between chrony, ntp and openntpd.

My problem with OpenNTPd

Following concerns surrounding the security (and complexity) of the venerable ntp program, I have, a long time ago, switched to using openntpd on all my computers. I hadn't thought about it until I recently noticed a lot of noise on one of my servers:

jan 18 10:09:49 curie ntpd[1069]: adjusting local clock by -1.604366s jan 18 10:08:18 curie ntpd[1069]: adjusting local clock by -1.577608s jan 18 10:05:02 curie ntpd[1069]: adjusting local clock by -1.574683s jan 18 10:04:00 curie ntpd[1069]: adjusting local clock by -1.573240s jan 18 10:02:26 curie ntpd[1069]: adjusting local clock by -1.569592s

You read that right, openntpd was constantly rewinding the clock, sometimes in less than two minutes. The above log was taken while doing diagnostics, looking at the last 30 minutes of logs. So, on average, one 1.5 seconds rewind per 6 minutes!

That might be due to a dying real time clock (RTC) or some other hardware problem. I know for a fact that the CMOS battery on that computer (curie) died and I wasn't able to replace it (!). So that's partly garbage-in, garbage-out here. But still, I was curious to see how chrony would behave... (Spoiler: much better.)

But I also had trouble on another workstation, that one a much more recent machine (angela). First, it seems OpenNTPd would just fail at boot time:

anarcat@angela:~(main)$ sudo systemctl status openntpd ● openntpd.service - OpenNTPd Network Time Protocol Loaded: loaded (/lib/systemd/system/openntpd.service; enabled; vendor pres> Active: inactive (dead) since Sun 2022-01-23 09:54:03 EST; 6h ago Docs: man:openntpd(8) Process: 3291 ExecStartPre=/usr/sbin/ntpd -n $DAEMON_OPTS (code=exited, sta> Process: 3294 ExecStart=/usr/sbin/ntpd $DAEMON_OPTS (code=exited, status=0/> Main PID: 3298 (code=exited, status=0/SUCCESS) CPU: 34ms jan 23 09:54:03 angela systemd[1]: Starting OpenNTPd Network Time Protocol... jan 23 09:54:03 angela ntpd[3291]: configuration OK jan 23 09:54:03 angela ntpd[3297]: ntp engine ready jan 23 09:54:03 angela ntpd[3297]: ntp: recvfrom: Permission denied jan 23 09:54:03 angela ntpd[3294]: Terminating jan 23 09:54:03 angela systemd[1]: Started OpenNTPd Network Time Protocol. jan 23 09:54:03 angela systemd[1]: openntpd.service: Succeeded.

After a restart, somehow it worked, but it took a long time to sync the clock. At first, it would just not consider any peer at all:

anarcat@angela:~(main)$ sudo ntpctl -s all 0/20 peers valid, clock unsynced peer wt tl st next poll offset delay jitter 159.203.8.72 from pool 0.debian.pool.ntp.org 1 5 2 6s 6s ---- peer not valid ---- 138.197.135.239 from pool 0.debian.pool.ntp.org 1 5 2 6s 7s ---- peer not valid ---- 216.197.156.83 from pool 0.debian.pool.ntp.org 1 4 1 2s 9s ---- peer not valid ---- 142.114.187.107 from pool 0.debian.pool.ntp.org 1 5 2 5s 6s ---- peer not valid ---- 216.6.2.70 from pool 1.debian.pool.ntp.org 1 4 2 2s 8s ---- peer not valid ---- 207.34.49.172 from pool 1.debian.pool.ntp.org 1 4 2 0s 5s ---- peer not valid ---- 198.27.76.102 from pool 1.debian.pool.ntp.org 1 5 2 5s 5s ---- peer not valid ---- 158.69.254.196 from pool 1.debian.pool.ntp.org 1 4 3 1s 6s ---- peer not valid ---- 149.56.121.16 from pool 2.debian.pool.ntp.org 1 4 2 5s 9s ---- peer not valid ---- 162.159.200.123 from pool 2.debian.pool.ntp.org 1 4 3 1s 6s ---- peer not valid ---- 206.108.0.131 from pool 2.debian.pool.ntp.org 1 4 1 6s 9s ---- peer not valid ---- 205.206.70.40 from pool 2.debian.pool.ntp.org 1 5 2 8s 9s ---- peer not valid ---- 2001:678:8::123 from pool 2.debian.pool.ntp.org 1 4 2 5s 9s ---- peer not valid ---- 2606:4700:f1::1 from pool 2.debian.pool.ntp.org 1 4 3 2s 6s ---- peer not valid ---- 2607:5300:205:200::1991 from pool 2.debian.pool.ntp.org 1 4 2 5s 9s ---- peer not valid ---- 2607:5300:201:3100::345c from pool 2.debian.pool.ntp.org 1 4 4 1s 6s ---- peer not valid ---- 209.115.181.110 from pool 3.debian.pool.ntp.org 1 5 2 5s 6s ---- peer not valid ---- 205.206.70.42 from pool 3.debian.pool.ntp.org 1 4 2 0s 6s ---- peer not valid ---- 68.69.221.61 from pool 3.debian.pool.ntp.org 1 4 1 2s 9s ---- peer not valid ---- 162.159.200.1 from pool 3.debian.pool.ntp.org 1 4 3 4s 7s ---- peer not valid ----

Then it would accept them, but still wouldn't sync the clock:

anarcat@angela:~(main)$ sudo ntpctl -s all 20/20 peers valid, clock unsynced peer wt tl st next poll offset delay jitter 159.203.8.72 from pool 0.debian.pool.ntp.org 1 8 2 5s 6s 0.672ms 13.507ms 0.442ms 138.197.135.239 from pool 0.debian.pool.ntp.org 1 7 2 4s 8s 1.260ms 13.388ms 0.494ms 216.197.156.83 from pool 0.debian.pool.ntp.org 1 7 1 3s 5s -0.390ms 47.641ms 1.537ms 142.114.187.107 from pool 0.debian.pool.ntp.org 1 7 2 1s 6s -0.573ms 15.012ms 1.845ms 216.6.2.70 from pool 1.debian.pool.ntp.org 1 7 2 3s 8s -0.178ms 21.691ms 1.807ms 207.34.49.172 from pool 1.debian.pool.ntp.org 1 7 2 4s 8s -5.742ms 70.040ms 1.656ms 198.27.76.102 from pool 1.debian.pool.ntp.org 1 7 2 0s 7s 0.170ms 21.035ms 1.914ms 158.69.254.196 from pool 1.debian.pool.ntp.org 1 7 3 5s 8s -2.626ms 20.862ms 2.032ms 149.56.121.16 from pool 2.debian.pool.ntp.org 1 7 2 6s 8s 0.123ms 20.758ms 2.248ms 162.159.200.123 from pool 2.debian.pool.ntp.org 1 8 3 4s 5s 2.043ms 14.138ms 1.675ms 206.108.0.131 from pool 2.debian.pool.ntp.org 1 6 1 0s 7s -0.027ms 14.189ms 2.206ms 205.206.70.40 from pool 2.debian.pool.ntp.org 1 7 2 1s 5s -1.777ms 53.459ms 1.865ms 2001:678:8::123 from pool 2.debian.pool.ntp.org 1 6 2 1s 8s 0.195ms 14.572ms 2.624ms 2606:4700:f1::1 from pool 2.debian.pool.ntp.org 1 7 3 6s 9s 2.068ms 14.102ms 1.767ms 2607:5300:205:200::1991 from pool 2.debian.pool.ntp.org 1 6 2 4s 9s 0.254ms 21.471ms 2.120ms 2607:5300:201:3100::345c from pool 2.debian.pool.ntp.org 1 7 4 5s 9s -1.706ms 21.030ms 1.849ms 209.115.181.110 from pool 3.debian.pool.ntp.org 1 7 2 0s 7s 8.907ms 75.070ms 2.095ms 205.206.70.42 from pool 3.debian.pool.ntp.org 1 7 2 6s 9s -1.729ms 53.823ms 2.193ms 68.69.221.61 from pool 3.debian.pool.ntp.org 1 7 1 1s 7s -1.265ms 46.355ms 4.171ms 162.159.200.1 from pool 3.debian.pool.ntp.org 1 7 3 4s 8s 1.732ms 35.792ms 2.228ms

It took a solid five minutes to sync the clock, even though the peers were considered valid within a few seconds:

jan 23 15:58:41 angela systemd[1]: Started OpenNTPd Network Time Protocol. jan 23 15:58:58 angela ntpd[84086]: peer 142.114.187.107 now valid jan 23 15:58:58 angela ntpd[84086]: peer 198.27.76.102 now valid jan 23 15:58:58 angela ntpd[84086]: peer 207.34.49.172 now valid jan 23 15:58:58 angela ntpd[84086]: peer 209.115.181.110 now valid jan 23 15:58:59 angela ntpd[84086]: peer 159.203.8.72 now valid jan 23 15:58:59 angela ntpd[84086]: peer 138.197.135.239 now valid jan 23 15:58:59 angela ntpd[84086]: peer 162.159.200.123 now valid jan 23 15:58:59 angela ntpd[84086]: peer 2607:5300:201:3100::345c now valid jan 23 15:59:00 angela ntpd[84086]: peer 2606:4700:f1::1 now valid jan 23 15:59:00 angela ntpd[84086]: peer 158.69.254.196 now valid jan 23 15:59:01 angela ntpd[84086]: peer 216.6.2.70 now valid jan 23 15:59:01 angela ntpd[84086]: peer 68.69.221.61 now valid jan 23 15:59:01 angela ntpd[84086]: peer 205.206.70.40 now valid jan 23 15:59:01 angela ntpd[84086]: peer 205.206.70.42 now valid jan 23 15:59:02 angela ntpd[84086]: peer 162.159.200.1 now valid jan 23 15:59:04 angela ntpd[84086]: peer 216.197.156.83 now valid jan 23 15:59:05 angela ntpd[84086]: peer 206.108.0.131 now valid jan 23 15:59:05 angela ntpd[84086]: peer 2001:678:8::123 now valid jan 23 15:59:05 angela ntpd[84086]: peer 149.56.121.16 now valid jan 23 15:59:07 angela ntpd[84086]: peer 2607:5300:205:200::1991 now valid jan 23 16:03:47 angela ntpd[84086]: clock is now synced

That seems kind of odd. It was also frustrating to have very little information from ntpctl about the state of the daemon. I understand it's designed to be minimal, but it could inform me on his known offset, for example. It does tell me about the offset with the different peers, but not as clearly as one would expect. It's also unclear how it disciplines the RTC at all.

Compared to chrony

Now compare with chrony:

jan 23 16:07:16 angela systemd[1]: Starting chrony, an NTP client/server... jan 23 16:07:16 angela chronyd[87765]: chronyd version 4.0 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 -DEBUG) jan 23 16:07:16 angela chronyd[87765]: Initial frequency 3.814 ppm jan 23 16:07:16 angela chronyd[87765]: Using right/UTC timezone to obtain leap second data jan 23 16:07:16 angela chronyd[87765]: Loaded seccomp filter jan 23 16:07:16 angela systemd[1]: Started chrony, an NTP client/server. jan 23 16:07:21 angela chronyd[87765]: Selected source 206.108.0.131 (2.debian.pool.ntp.org) jan 23 16:07:21 angela chronyd[87765]: System clock TAI offset set to 37 seconds

First, you'll notice there's none of that "clock synced" nonsense, it picks a source, and then... it's just done. Because the clock on this computer is not drifting that much, and openntpd had (presumably) just sync'd it anyways. And indeed, if we look at detailed stats from the powerful chronyc client:

anarcat@angela:~(main)$ sudo chronyc tracking Reference ID : CE6C0083 (ntp1.torix.ca) Stratum : 2 Ref time (UTC) : Sun Jan 23 21:07:21 2022 System time : 0.000000311 seconds slow of NTP time Last offset : +0.000807989 seconds RMS offset : 0.000807989 seconds Frequency : 3.814 ppm fast Residual freq : -24.434 ppm Skew : 1000000.000 ppm Root delay : 0.013200894 seconds Root dispersion : 65.357254028 seconds Update interval : 1.4 seconds Leap status : Normal

We see that we are nanoseconds away from NTP time. That was run very quickly after starting the server (literally in the same second as chrony picked a source), so the stats are a bit weird (e.g. the Skew is huge). After a minute or two, it looks more reasonable:

Reference ID    : CE6C0083 (ntp1.torix.ca)
Stratum         : 2
Ref time (UTC)  : Sun Jan 23 21:09:32 2022
System time     : 0.000487002 seconds slow of NTP time
Last offset     : -0.000332960 seconds
RMS offset      : 0.000751204 seconds
Frequency       : 3.536 ppm fast
Residual freq   : +0.016 ppm
Skew            : 3.707 ppm
Root delay      : 0.013363549 seconds
Root dispersion : 0.000324015 seconds
Update interval : 65.0 seconds
Leap status     : Normal

Now it's learning how good or bad the RTC clock is ("Frequency"), and is smoothly adjusting the System time to follow the average offset (RMS offset, more or less). You'll also notice the Update interval has risen, and will keep expanding as chrony learns more about the internal clock, so it doesn't need to constantly poll the NTP servers to sync the clock. In the above, we're 487 microseconds (less than a millisecond!) away from NTP time.
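If you want to use those numbers from a script, here's a rough sketch of my own (not from the article) that parses the field layout shown above; in practice you would capture the text with something like subprocess.run(["chronyc", "tracking"], ...) instead of hard-coding it:

```python
# Sample output, matching the chronyc tracking fields shown above.
SAMPLE = """\
Reference ID    : CE6C0083 (ntp1.torix.ca)
Stratum         : 2
System time     : 0.000487002 seconds slow of NTP time
Last offset     : -0.000332960 seconds
Update interval : 65.0 seconds
Leap status     : Normal
"""

def parse_tracking(text):
    """Split each 'Field : value' line into a dict entry."""
    fields = {}
    for line in text.splitlines():
        # partition on the first ':' only, so timestamps stay intact
        name, _, value = line.partition(':')
        fields[name.strip()] = value.strip()
    return fields

fields = parse_tracking(SAMPLE)
```

From there it's easy to alert on, say, a Leap status other than "Normal".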

(People interested in the explanation of every single one of those fields can read the excellent chronyc manpage. That thing made me want to nerd out on NTP again!)

On the machine with the bad clock, chrony also did a 1.5 second adjustment, but just once, at startup:

jan 18 11:54:33 curie chronyd[2148399]: Selected source 206.108.0.133 (2.debian.pool.ntp.org)
jan 18 11:54:33 curie chronyd[2148399]: System clock wrong by -1.606546 seconds
jan 18 11:54:31 curie chronyd[2148399]: System clock was stepped by -1.606546 seconds
jan 18 11:54:31 curie chronyd[2148399]: System clock TAI offset set to 37 seconds

Then it would still struggle to keep the clock in sync, but not as badly as openntpd. Here's the offset a few minutes after that above startup:

System time : 0.000375352 seconds slow of NTP time

And again a few seconds later:

System time : 0.001793046 seconds slow of NTP time

I don't currently have access to that machine, and will update this post with the latest status, but so far I've had a very good experience with chrony on that machine, which is a testament to its resilience, and it also just works on my other machines as well.

Extras

On top of "just working" (as demonstrated above), I feel that chrony's feature set is so much superior... Here's an excerpt of the extras in chrony, taken from the comparison table:

  • source frequency tracking
  • source state restore from file
  • temperature compensation
  • ready for next NTP era (year 2036)
  • replace unreachable / falseticker servers
  • aware of jitter
  • RTC drift tracking
  • RTC trimming
  • Restore time from file w/o RTC
  • leap seconds correction, in slew mode
  • drops root privileges

I even understand some of that stuff. I think.

So kudos to the chrony folks, I'm switching.

Caveats

One thing to keep in mind about the above, however, is that it's quite possible chrony does as bad of a job as openntpd on that old machine, and just doesn't tell me about it. For example, here's another log sample from another server (marcos):

jan 23 11:13:25 marcos ntpd[1976694]: adjusting clock frequency by 0.451035 to -16.420273ppm

I get those basically every day, which seems to show that it's at least trying to keep track of the hardware clock.

In other words, it's quite possible I have no idea what I'm talking about and you definitely need to take this article with a grain of salt. I'm not an NTP expert.

Switching to chrony

Because the default configuration in chrony (at least as shipped in Debian) is sane (good default peers, no open network by default), installing it is as simple as:

apt install chrony

And because it somehow conflicts with openntpd, that also takes care of removing that cruft as well.

Categories: FLOSS Project Planets

ItsMyCode: How to Import CSV Files into R?

Planet Python - Sun, 2022-01-23 14:46

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate the values. CSV files are popular formats for storing tabular data, i.e. data is composed of rows and columns.

In this article, we will learn how to import CSV files into R with the help of examples.

Importing CSV Files in R

There are 3 popular methods available to import CSV files into R. 

  • Using read.csv() method
  • Using read_csv() method
  • Using fread() method

In this tutorial, we will explore all the 3 methods and see how we can import the CSV file.

Using read.csv() method

The read.csv() method is used to import a CSV file, and it is best suited for small CSV files.

The contents of the CSV files are stored into a variable for further manipulation. We can even import multiple CSV files and store them into different variables.

The output returned will be in the format of DataFrame, where row numbers are assigned with integers.

Syntax: 

read.csv(path, header = TRUE, sep = ",")

Arguments: 

  • path: CSV file path that needs to be imported.
  • header: Indicates whether to import headers in CSV. By default, it is set to TRUE.
  • sep: the field separator character

R often uses a concept of factors to re-encode strings. Hence it is recommended to set stringsAsFactors=FALSE so that R doesn’t convert character or categorical variables into factors.

# read the data from the CSV file
data <- read.csv("C:\\Personal\\IMS\\cricket_points.csv", header=TRUE)

# print the data variable (outputs as DataFrame)
data

Output

      ï..Teams Wins Lose Points
1        India    5    2     10
2 South Africa    3    4      6
3  West Indies    1    6      2
4      England    2    4      4
5    Australia    4    2      8
6  New Zealand    2    5      4

Method 2: Using read_csv() method

The read_csv() method is the recommended way of reading CSV files in R. It reads a CSV file one line at a time.

The data is read in as a tibble; only the first 10 rows are displayed at once, and the rest are available after expanding.

It also displays the percentage of the file read into the system, making it more robust compared to the read.csv() method.

If you are working with large CSV files, it’s recommended to use the read_csv() method. 

Syntax:

read_csv(path, col_names, n_max, col_types, progress)

Arguments : 

  • path: CSV file path that needs to be imported.
  • col_names: Indicates whether to import headers in CSV. By default, it is set to TRUE.
  • n_max: The maximum number of rows to read.
  • col_types: If any column succumbs to NULL, then the col_types can be specified in a compact string format.
  • progress: A progress meter to analyse the percentage of files read into the system
# import data.table library library(data.table) #import data data2 <- read_csv("C:\\Personal\\IMS\\cricket_points.csv")

Output

      ï..Teams Wins Lose Points
1        India    5    2     10
2 South Africa    3    4      6
3  West Indies    1    6      2
4      England    2    4      4
5    Australia    4    2      8
6  New Zealand    2    5      4

Method 3: Using fread() method

If the CSV files are extremely large, the best way to import into R is using the fread() method from the data.table package.

The output of the data will be in the form of Data table in this case.

# import data.table library
library(data.table)

# read the CSV file
data3 <- fread("C:\\Personal\\IMS\\cricket_points.csv")

          Teams Wins Lose Points
1:        India    5    2     10
2: South Africa    3    4      6
3:  West Indies    1    6      2
4:      England    2    4      4
5:    Australia    4    2      8
6:  New Zealand    2    5      4

Note: It is recommended to use double backslashes (\\) while providing the file path. Otherwise you may get the error below.

Error: '\U' used without hex digits in character string starting ""C:\U"
Categories: FLOSS Project Planets

death and gravity: Dealing with YAML with arbitrary tags in Python

Planet Python - Sun, 2022-01-23 14:11

... in which we use PyYAML to safely read and write YAML with any tags, in a way that's as straightforward as interacting with built-in types.

If you're in a hurry, you can find the code at the end.

Contents Why is this useful? #

People mostly use YAML as a friendlier alternative to JSON1, but it can do way more.

Among others, it can represent user-defined and native data structures.

Say you need to read (or write) an AWS Cloud Formation template:

EC2Instance:
  Type: AWS::EC2::Instance
  Properties:
    ImageId: !FindInMap [
      AWSRegionArch2AMI,
      !Ref 'AWS::Region',
      !FindInMap [AWSInstanceType2Arch, !Ref InstanceType, Arch],
    ]
    InstanceType: !Ref InstanceType

>>> yaml.load(text)
Traceback (most recent call last):
  ...
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!FindInMap'
  in "<unicode string>", line 4, column 14:
        ImageId: !FindInMap [
                 ^

... or, you need to safely read untrusted YAML that represents Python objects:

!!python/object/new:module.Class { attribute: value }

>>> yaml.safe_load(text)
Traceback (most recent call last):
  ...
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/new:module.Class'
  in "<unicode string>", line 1, column 1:
    !!python/object/new:module.Class ...
    ^

Warning

Historically, yaml.load(stream) was unsafe for untrusted data, because it allowed running arbitrary code. Consider using safe_load() instead.

Details.

For example, you could do this:

>>> yaml.load("!!python/object/new:os.system [echo WOOSH. YOU HAVE been compromised]")
WOOSH. YOU HAVE been compromised
0

There were a bunch of CVEs about it.

To address the issue, load() requires an explicit Loader since PyYAML 6. Also, version 5 added two new functions and corresponding loaders:

  • full_load() resolves all tags except those known to be unsafe (note that this was broken before 5.4, and thus vulnerable)
  • unsafe_load() resolves all tags, even those known to be unsafe (the old load() behavior)

safe_load() resolves only basic tags, remaining the safest.

Can't I just get the data, without it being turned into objects?

You can! The YAML spec says:

In a given processing environment, there need not be an available native type corresponding to a given tag. If a node’s tag is unavailable, a YAML processor will not be able to construct a native data structure for it. In this case, a complete representation may still be composed and an application may wish to use this representation directly.

And PyYAML obliges:

>>> text = """\
... one: !myscalar string
... two: !mysequence [1, 2]
... """
>>> yaml.compose(text)
MappingNode(
    tag='tag:yaml.org,2002:map',
    value=[
        (
            ScalarNode(tag='tag:yaml.org,2002:str', value='one'),
            ScalarNode(tag='!myscalar', value='string'),
        ),
        (
            ScalarNode(tag='tag:yaml.org,2002:str', value='two'),
            SequenceNode(
                tag='!mysequence',
                value=[
                    ScalarNode(tag='tag:yaml.org,2002:int', value='1'),
                    ScalarNode(tag='tag:yaml.org,2002:int', value='2'),
                ],
            ),
        ),
    ],
)
>>> print(yaml.serialize(_))
one: !myscalar 'string'
two: !mysequence [1, 2]

... the spec didn't say the representation has to be concise. ¯\_(ツ)_/¯

Here's how YAML processing works, to give you an idea what we're looking at:

YAML Processing Overview

The output of compose() above is the representation (node graph).

From that, safe_load() does its best to construct objects, but it can't do anything for tags it doesn't know about.

There must be a better way!

Thankfully, the spec also says:

That said, tag resolution is specific to the application. YAML processors should therefore provide a mechanism allowing the application to override and expand these default tag resolution rules.

We'll use this mechanism to convert tagged nodes to almost-native types, while preserving the tags.

A note on PyYAML extensibility #

PyYAML is a bit unusual.

For each processing direction, you have a corresponding Loader/Dumper class.

For each processing step, you can add callbacks, stored in class-level registries.

The callbacks are method-like – they receive the Loader/Dumper as the first argument:

Dice = namedtuple('Dice', 'a b')

def dice_representer(dumper, data):
    return dumper.represent_scalar(u'!dice', u'%sd%s' % data)

yaml.Dumper.add_representer(Dice, dice_representer)

You may notice the add_...() methods modify the class in-place, for everyone, which isn't necessarily great; imagine getting a Dice from safe_load(), when you were expecting only built-in types.

We can avoid this by subclassing, since the registry is copied from the parent. Note that because of how copying is implemented, registries from two direct parents are not merged – you only get the registry of the first parent in the MRO.
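To illustrate that copy-on-subclass behavior, here's a simplified sketch of the pattern (my own, not PyYAML's actual code): the registry dict is copied into the subclass the first time the subclass writes to it, so the parent's registry stays untouched:

```python
class Base:
    registry = {}

    @classmethod
    def add(cls, key, fn):
        # copy the inherited registry on first write to this class,
        # roughly what PyYAML's add_constructor()/add_representer() do
        if 'registry' not in cls.__dict__:
            cls.registry = cls.registry.copy()
        cls.registry[key] = fn

class Child(Base):
    pass

Child.add('x', len)
```

After the call, Child has its own registry containing 'x', while Base's registry is still empty.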

So, we'll start by subclassing SafeLoader/Dumper:

class Loader(yaml.SafeLoader):
    pass

class Dumper(yaml.SafeDumper):
    pass

Preserving tags #

Constructing unknown objects #

For now, we can use named tuples for objects with unknown tags, since they are naturally tag/value pairs:

class Tagged(typing.NamedTuple):
    tag: str
    value: object

Tag or no tag, all YAML nodes are either a scalar, a sequence, or a mapping. For unknown tags, we delegate construction to the loader's default constructors, and wrap the resulting value:

def construct_undefined(self, node):
    if isinstance(node, yaml.nodes.ScalarNode):
        value = self.construct_scalar(node)
    elif isinstance(node, yaml.nodes.SequenceNode):
        value = self.construct_sequence(node)
    elif isinstance(node, yaml.nodes.MappingNode):
        value = self.construct_mapping(node)
    else:
        assert False, f"unexpected node: {node!r}"
    return Tagged(node.tag, value)

Loader.add_constructor(None, construct_undefined)

Constructors are registered by tag, with None meaning "unknown".

Things look much better already:

>>> yaml.load(text, Loader=Loader)
{
    'one': Tagged(tag='!myscalar', value='string'),
    'two': Tagged(tag='!mysequence', value=[1, 2]),
}

A better wrapper #

That's nice, but every time we use any value, we have to check if it's tagged, and then go through value if it is:

>>> one = _['one']
>>> one.tag
'!myscalar'
>>> one.value.upper()
'STRING'

We could subclass the Python types corresponding to core YAML tags (str, list, and so on), and add a tag attribute to each. We could subclass most of them, anyway – neither bool nor NoneType can be subclassed.
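Here's what that subclassing approach might look like (a sketch with a hypothetical TaggedStr, not the article's code), along with the bool failure:

```python
# str can carry an extra tag attribute via a subclass...
class TaggedStr(str):
    def __new__(cls, tag, value):
        obj = super().__new__(cls, value)
        obj.tag = tag
        return obj

s = TaggedStr('!myscalar', 'string')

# ...but bool is final: subclassing it raises TypeError at class creation
try:
    class TaggedBool(bool):
        pass
    bool_subclassable = True
except TypeError:
    bool_subclassable = False
```

So the subclass approach can never cover booleans or None, which is one reason to look for something else.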

Or, we could wrap tagged objects in a class with the same interface, that delegates method calls and attribute access to the wrapee, with a tag attribute on top.

Tip

This is known as the decorator pattern design pattern (not to be confused with Python decorators).

Doing this naively entails writing one wrapper per type, with one wrapper method per method and one property per attribute. That's even worse than subclassing!
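The naive approach is also fragile in a subtler way; here's a sketch (hypothetical NaiveTagged class): __getattr__ delegation covers plain attribute access, but not the special methods that Python looks up on the type itself:

```python
class NaiveTagged:
    def __init__(self, tag, wrapped):
        self.tag = tag
        self._wrapped = wrapped

    def __getattr__(self, name):
        # only called for attributes not found normally
        return getattr(self._wrapped, name)

t = NaiveTagged('!myscalar', 'string')

# t.upper() delegates fine, but slicing goes through type(t).__getitem__,
# which doesn't exist, so it fails with TypeError
try:
    t[:3]
    slicing_works = True
except TypeError:
    slicing_works = False
```

Covering every dunder method by hand is exactly the busywork an object proxy saves us.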

There must be a better way!

Of course, this is Python, so there is.

We can use an object proxy instead (also known as "dynamic wrapper"). While they're not perfect in general, the one wrapt provides is damn near perfect enough2:

class Tagged(wrapt.ObjectProxy):
    # tell wrapt to set the attribute on the proxy, not the wrapped object
    tag = None

    def __init__(self, tag, wrapped):
        super().__init__(wrapped)
        self.tag = tag

    def __repr__(self):
        return f"{type(self).__name__}({self.tag!r}, {self.__wrapped__!r})"

>>> yaml.load(text, Loader=Loader)
{
    'one': Tagged('!myscalar', 'string'),
    'two': Tagged('!mysequence', [1, 2]),
}

The proxy behaves identically to the proxied object:

>>> one = _['one']
>>> one.tag
'!myscalar'
>>> one.upper()
'STRING'
>>> one[:3]
'str'

...up to and including fancy things like isinstance():

>>> isinstance(one, str)
True
>>> isinstance(one, Tagged)
True

And now you don't have to care about tags if you don't want to.

Representing tagged objects #

The trip back is exactly the same, but much shorter:

def represent_tagged(self, data):
    assert isinstance(data, Tagged), data
    node = self.represent_data(data.__wrapped__)
    node.tag = data.tag
    return node

Dumper.add_representer(Tagged, represent_tagged)

Representers are registered by type.

>>> print(yaml.dump(Tagged('!hello', 'world'), Dumper=Dumper))
!hello 'world'

Let's mark the occasion with some tests.

Since we still have stuff to do, we parametrize the tests from the start.

BASIC_TEXT = """\
one: !myscalar string
two: !mymapping
  three: !mysequence [1, 2]
"""

BASIC_DATA = {
    'one': Tagged('!myscalar', 'string'),
    'two': Tagged('!mymapping', {'three': Tagged('!mysequence', [1, 2])}),
}

DATA = [
    (BASIC_TEXT, BASIC_DATA),
]

Loading works:

@pytest.mark.parametrize('text, data', DATA)
def test_load(text, data):
    assert yaml.load(text, Loader=Loader) == data

And dumping works:

@pytest.mark.parametrize('text', [t[0] for t in DATA])
def test_roundtrip(text):
    data = yaml.load(text, Loader=Loader)
    assert data == yaml.load(yaml.dump(data, Dumper=Dumper), Loader=Loader)

... but only for known types:

def test_dump_error():
    with pytest.raises(yaml.representer.RepresenterError):
        yaml.dump(object(), Dumper=Dumper)

Unhashable keys #

Let's try an example from the PyYAML documentation:

>>> text = """\
... ? !!python/tuple [0,0]
... : The Hero
... ? !!python/tuple [1,0]
... : Treasure
... ? !!python/tuple [1,1]
... : The Dragon
... """

This is supposed to result in something like:

>>> yaml.unsafe_load(text)
{(0, 0): 'The Hero', (1, 0): 'Treasure', (1, 1): 'The Dragon'}

Instead, we get:

>>> yaml.load(text, Loader=Loader)
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'

That's because the keys are tagged lists, and neither type is hashable:

>>> yaml.load("!!python/tuple [0,0]", Loader=Loader)
Tagged('tag:yaml.org,2002:python/tuple', [0, 0])

This limitation comes from how Python dicts are implemented,3 not from YAML; quoting from the spec again:

The content of a mapping node is an unordered set of key/value node pairs, with the restriction that each of the keys is unique. YAML places no further restrictions on the nodes. In particular, keys may be arbitrary nodes, the same node may be used as the value of several key/value pairs and a mapping could even contain itself as a key or a value.

Constructing pairs #

What now?

Same strategy as before: wrap the things we can't handle.

Specifically, whenever we have a mapping with unhashable keys, we return a list of pairs instead. To tell it apart from plain lists, we use a subclass:

class Pairs(list):
    def __repr__(self):
        return f"{type(self).__name__}({super().__repr__()})"

Again, we let the loader do most of the work:

def construct_mapping(self, node):
    value = self.construct_pairs(node)
    try:
        return dict(value)
    except TypeError:
        return Pairs(value)

Loader.construct_mapping = construct_mapping
Loader.add_constructor('tag:yaml.org,2002:map', Loader.construct_mapping)

We set construct_mapping so that any other Loader constructor wanting to make a mapping gets to use it (like our own construct_undefined() above). Don't be fooled by the assignment, it's a method like any other.4 But we're changing the class from outside anyway, let's stay consistent.

Note that overriding construct_mapping() is not enough: we have to register the constructor explicitly, otherwise SafeLoader's construct_mapping() will be used (since that's what was in the registry before).

Note

In case you're wondering, this feature is orthogonal to handling unknown tags; we could have used different classes for them. However, as mentioned before, the constructor registry breaks multiple inheritance, so we couldn't use the two features together.

Anyway, it works:

>>> yaml.load(text, Loader=Loader)
Pairs(
    [
        (Tagged('tag:yaml.org,2002:python/tuple', [0, 0]), 'The Hero'),
        (Tagged('tag:yaml.org,2002:python/tuple', [1, 0]), 'Treasure'),
        (Tagged('tag:yaml.org,2002:python/tuple', [1, 1]), 'The Dragon'),
    ]
)

Representing pairs #

Like before, the trip back is short and uneventful:

def represent_pairs(self, data):
    assert isinstance(data, Pairs), data
    node = self.represent_dict(data)
    return node

Dumper.add_representer(Pairs, represent_pairs)

>>> print(yaml.dump(Pairs([([], 'one')]), Dumper=Dumper))
[]: one

Let's test this more thoroughly.

Because the tests are parametrized, we just need to add more data:

UNHASHABLE_TEXT = """\
[0,0]: one
!key {0: 1}: {[]: !value three}
"""

UNHASHABLE_DATA = Pairs(
    [
        ([0, 0], 'one'),
        (Tagged('!key', {0: 1}), Pairs([([], Tagged('!value', 'three'))])),
    ]
)

DATA = [
    (BASIC_TEXT, BASIC_DATA),
    (UNHASHABLE_TEXT, UNHASHABLE_DATA),
]

Conclusion #

YAML is extensible by design. I hope that besides what it says on the tin, this article shed some light on how to customize PyYAML for your own purposes, and that you've learned at least one new Python thing.

You can get the code here, and the tests here.

Learned something new today? Share this with others, it really helps!

Want more? Get updates via email or Atom feed.

Bonus: hashable wrapper #

You may be asking, why not make the wrapper hashable?

Most unhashable (data) objects are that for a reason: because they're mutable.

We have two options:

  • Make the wrapper hash change with the content. This will break dictionaries in strange and unexpected ways (and other things too) – the language requires mutable objects to be unhashable.

  • Make the wrapper hash not change with the content, and wrappers equal only to themselves – that's what user-defined classes do by default anyway.

    This works, but it's not very useful, because equal values don't compare equal anymore (data != load(dump(data))). Also, it means you can only get things from a dict if you already have the object used as key:

>>> data = {Hashable([1]): 'one'}
>>> data[Hashable([1])]
Traceback (most recent call last):
  ...
KeyError: Hashable([1])
>>> key = list(data)[0]
>>> data[key]
'one'

    I'd file this under "strange and unexpected" too.

    (You can find the code for the example above here.)
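The first option's failure mode can also be demonstrated directly (hypothetical BadHash class, not from the article): once the key's content changes, its hash no longer matches the one stored in the dict, so the key becomes unreachable even though it is still inside:

```python
class BadHash:
    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return isinstance(other, BadHash) and self.items == other.items

    def __hash__(self):
        # hash follows the (mutable) content -- this is the bug
        return hash(tuple(self.items))

key = BadHash([1])
data = {key: 'one'}
key.items.append(2)  # mutate the key after it was inserted
```

The entry is still physically in the dict (iterating finds it), but neither the original key object nor an equal fresh key can look it up anymore.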

  1. Of which YAML is actually a superset.

  2. Timothy 20:9.

  3. Using a hash table. For a nice explanation of how it all works, complete with a pure-Python implementation, check out Raymond Hettinger's talk Modern Python Dictionaries: A confluence of a dozen great ideas (code).

  4. Almost. The zero argument form of super() won't work for methods defined outside of a class definition, but we're not using it here.

Categories: FLOSS Project Planets

Keyboards and Open-Source

Planet KDE - Sun, 2022-01-23 13:25
Keyboards and Open-Source, how is that related?

In my Keyboard Fun post from last year I talked a bit about my interest in mechanical keyboards.

Since then, I played around with a few more keyboards/switches/keycaps/…

Interesting enough, beside the actual hardware, naturally there is some software component to all these keyboards, too.

Whereas most commercial keyboards still come with proprietary firmware, there is the trend within the keyboard enthusiast scene to go for open-source firmware.

This allows you to properly update the firmware even from your Linux machine and do proper configuration of e.g. the keymap, too.

QMK Firmware

A popular project in that area is QMK.

It supports a mass of keyboards out of the box already and is actively extended by both volunteers and some companies.

That means it is deployed not only on main stream products but even in more exotic projects like the “I improve my vintage and modern Model M keyboards” by Eric S. Raymond.

VIA

Whereas QMK provides the open-source firmware part and you can do close to everything with it that is possible given the features your hardware actually has, it is hard to use for simple tasks like “I want my key x to do y”.

Naturally you can change the keymap in your QMK port and compile & flash. But even I would call this a sub-optimal workflow, given a lot of commercial offerings at least provide some GUI to do this on the fly.

Here VIA comes into the picture.

For sure, it is an Electron based monster, but it provides a cross-platform UI for QMK based keyboards that allows on-the-fly configuration of at least the common things, like keymaps. And it provides trivial things like testing all your keys, which is not as unnecessary as it sounds, given I was too dumb to properly install all my hot-swap switches ;)

VIA UI

Actual Keyboard?

Naturally, after this talk about the software side, all this makes no sense without an actual keyboard using it.

As I use the German ISO layout for typing, I am more limited on product choices than e.g. people using the ANSI layout.

It is really frustrating that wherever you look for some cool keyboard project, in many cases no ISO variant is available. And yes, I don’t want to switch to ANSI; I like to have my umlauts easily accessible, and I can’t swap all the keyboards I need to use at work for ANSI variants, others would not be amused.

Therefore, if you are in need of some ISO layout keyboard, you might be interested in the information below. If you use ANSI, ignore all this, there are masses of ANSI keyboards out there to buy, with QMK, too. I have done no great research on how the keyboard I chose compares to them; for ISO there were not that many available contenders that were 75%, hot-swap and QMK ready.

After some trial and error I went with a Keychron Q1 75% keyboard. It is available in ISO layout, unfortunately only as a barebones kit, which means you must buy your own switches and keycaps. It naturally comes with QMK factory installed, nice; the VIA screenshot above was actually taken from this board on my Linux machine.

For switches, I went with some BOX Navy switches, they are very heavy but have a nice click ;) Even my office neighbor is happy with the sound and hasn’t yet attacked me. I won’t link random reviews of them, you can search for that yourself if you are interested. In any case, yes, they are HEAVY, really, you can believe that from the reviews. And they are loud, but in no bad way.

For keycaps, yeah, same issue with the German ISO layout, there are not many sets that are available.

At work I now have some SA profile set from Signature Plastics, they are able to produce sets with proper legends and no missing German keys, unlike some other vendors I tried (and yes, I tried it with cheap vendors; it seems not to be trivial to print all the proper German keys and not just forget them in the package…). Funny enough, shipping from the US took 4 weeks, even with air express; USPS seems not to be the fastest mode of travel. If others play with the idea of buying there: I must confess the quality is really good, but they are expensive. If you don’t require exotic layouts like German, I would rather go with some cheaper sets; for US ANSI even the cheapest I tried out were OK, without obvious faults.

Keychron Q1 Ice Cap Keycaps

If you look a bit more around on the picture you will see I have still my good old Nokia rubber ducky, a sole survivor from the time Nokia owned Qt :P And no, I don’t use a Mac, that is just one we use for our compile farm.

At home I went with some MT3 profile set without any legends, that is really cheap and funny enough did take only 4 days from US to Germany with standard UPS.

Keychron Q1 MT3 /dev/tty Keycaps

:=) And no, no second Nokia ducky at home.

So far, the Q1 works nicely, both at work and at home. Having the exact same layout and switches in both places really helps to get used to it.

Using VIA works nicely, too. So far I have not flashed any updated QMK version, so I have no experience with how well that works in practice.

I actually even learned a bit more about my use of the different keys. In the work picture you can still see the page up/down buttons on the right (with the Fn key => home/end). At home I have already reprogrammed that to home/end (with Fn key => page up/down), as I use those far more often during editing, whereas I need the page up/down stuff only rarely, in the terminal. Actually, I didn’t know I would miss these two keys until they were no longer easily accessible ;=)

Categories: FLOSS Project Planets

ItsMyCode: Python String strip()

Planet Python - Sun, 2022-01-23 03:37

The Python String strip() method is a built-in function that strips both leading and trailing characters based on the arguments passed to the function and returns the copy of a string.

In this article, we will learn about the Python String strip() method with the help of examples.

If we want to remove only the leading characters in a string, we could use Python String lstrip(). Similarly, if we want to strip only the trailing characters, we could use Python String rstrip() method.

strip() Syntax

The Syntax of strip() method is:

string.strip([chars])

strip() Parameters

The strip() method takes one parameter, and it’s optional.

  • chars(optional) – a string specifying the set of characters to be removed from both the left and right-hand sides of the string.

If the chars argument is not passed, the strip() function will strip whitespaces at the start and end of the string. 

strip() Return Value

The strip() method returns a copy of the string by stripping both leading and trailing characters based on the arguments passed.

Note: 

  • If we do not pass any arguments to strip() function, by default, all the leading and trailing whitespaces are truncated from a string.
  • If the string does not have any whitespaces at the start or end, the string will be returned as-is, matching the original string.
  • If the characters passed in the arguments do not match the characters at the beginning of the string, it will stop removing the leading characters.
  • Similarly, if the characters passed in the arguments do not match the characters at the end of the string, it will stop removing the trailing characters.
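These rules can be checked directly (my own quick examples, not the article's):

```python
no_args = '  spaced  '.strip()     # no argument: strips whitespace only
unchanged = 'clean'.strip()        # nothing to strip: string returned as-is
stops_left = 'banana'.strip('ab')  # stops at the first non-matching char ('n')
```

Note in the last case that strip() treats 'ab' as a set of characters, not as a substring: both ends are trimmed until a character outside the set is found, leaving 'nan'.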
Example: Working of the strip() method

Below are various working examples of the strip() method. We can use it to remove whitespace, or we can strip characters from both the leading and trailing ends.

  • text1.strip() – Removes the whitespace at both the leading and trailing ends of the string
  • text3.strip(' code') – Removes whitespace and the characters c, o, d, e from both ends
  • text2.strip('code') – Removes the characters c, o, d, e from both ends
  • text4.strip('The') – Removes the characters T, h, e from the leading end only
# Leading and trailing whitespaces are removed
text1 = ' Python Programming '
print(text1.strip())

# Remove the whitespace and specified character on
# both leading and trailing end
text3 = ' code its my code '
print(text3.strip(' code'))

# Remove the specified character at
# both leading and trailing end
text2 = 'code its my code'
print(text2.strip('code'))

# strips only the beginning
# of the string
text4 = "The Coding is fun"
print(text4.strip('The'))

Output

Python Programming
its my
its my
Coding is fun

Example 2 – How to use strip() method in real world?

Suppose we are fetching data from external sources like Excel, a database or third-party APIs. In that case, there is a higher chance that the data is not well formed, and we may get separators like pipes, commas or hyphens appended to the string. We can use the strip() method to remove those special characters and preserve the original value.

In the example below, we have a list of languages with a hyphen appended at both the leading and trailing end of each element. We can simply iterate over the list and remove the hyphens using the strip() method, as shown below.

langs = ['-Python-', '-Java-', '-Javascript-', '-C#-', '-C++-']
new_langs = []

for l in langs:
    new_langs.append(l.strip('-'))

print(new_langs)

Output

['Python', 'Java', 'Javascript', 'C#', 'C++']
Categories: FLOSS Project Planets

ItsMyCode: Python String lstrip()

Planet Python - Sun, 2022-01-23 03:37

The Python String lstrip() method is a built-in function that strips leading characters based on the argument passed to the function and returns a copy of the string.

In this article, we will learn about the Python String lstrip() method with the help of examples.

lstrip() Syntax

The Syntax of lstrip() method is:

string.lstrip([chars])

lstrip() Parameters

The lstrip() method takes one parameter, and it’s optional.

  • chars (optional) – a string specifying the set of characters to be removed from the left-hand side of the string.

If the chars argument is not passed, the lstrip() function strips whitespace at the start of the string.

lstrip() Return Value

The lstrip() method returns a copy of the string by stripping the leading characters based on the arguments passed.

Note: 

  • If we do not pass any argument to the lstrip() function, by default all leading whitespace is removed from the string.
  • If the string does not have any whitespace at the beginning, the string is returned as-is, matching the original string.
  • Stripping stops as soon as a character that is not present in the argument is found at the beginning of the string.
Example 1: Working of lstrip()

# Only leading whitespaces are removed
text1 = ' Python Programming '
print(text1.lstrip())

# Remove the whitespace and specified characters at
# the leading end
text2 = ' code its my code '
print(text2.lstrip(' code'))

# Remove the specified characters at
# the leading end
text3 = 'code its my code'
print(text3.lstrip('code'))

Output

Python Programming
its my code
its my code

Example 2 – How to use the lstrip() method in the real world?

In the below example, we have a list of prices in dollars. However, a dollar sign is appended at both the leading and trailing ends of each element. We can simply iterate over the list and remove the dollar symbol on the left-hand side using the lstrip() method, as shown below.

price = ['$100$', '$200$', '$300$', '$400$', '$500$']
new_price = []
for l in price:
    new_price.append(l.lstrip('$'))
print(new_price)

Output

['100$', '200$', '300$', '400$', '500$']
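As with strip(), the argument to lstrip() is a set of characters rather than an exact prefix. On Python 3.9+, removeprefix() removes an exact leading substring instead. A small sketch with an illustrative path value shows the difference:

```python
path = "oof/bar.txt"

# lstrip() strips any of 'f', 'o', '/' from the left, in any order...
print(path.lstrip("fo/"))        # 'bar.txt'

# ...whereas removeprefix() (Python 3.9+) removes only an exact prefix,
# so here the string is returned unchanged.
print(path.removeprefix("foo/")) # 'oof/bar.txt'
```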
Categories: FLOSS Project Planets

ItsMyCode: Python String rstrip()

Planet Python - Sun, 2022-01-23 03:37

The Python String rstrip() method is a built-in function that strips trailing characters based on the argument passed to the function and returns a copy of the string.

In this article, we will learn about the Python String rstrip() method with the help of examples.

rstrip() Syntax

The Syntax of rstrip() method is:

string.rstrip([chars])

rstrip() Parameters

The rstrip() method takes one parameter, and it’s optional.

  • chars (optional) – a string specifying the set of characters to be removed from the right-hand side of the string.

If the chars argument is not passed, the rstrip() function strips whitespace at the end of the string.

rstrip() Return Value

The rstrip() method returns a copy of the string by stripping the trailing characters based on the arguments passed.

Note: 

  • If we do not pass any argument to the rstrip() function, by default all trailing whitespace is removed from the string.
  • If the string does not have any whitespace at the end, the string is returned as-is, matching the original string.
  • Stripping stops as soon as a character that is not present in the argument is found at the end of the string.
Example 1: Working of rstrip()

# Only trailing whitespaces are removed
text1 = ' Python Programming '
print(text1.rstrip())

# Remove the whitespace and specified characters at
# the trailing end
text2 = ' code its my code '
print(text2.rstrip(' code'))

# Remove the specified characters at
# the trailing end
text3 = 'code its my code'
print(text3.rstrip('code'))

Output

Python Programming
code its my
code its my

Example 2 – How to use the rstrip() method in the real world?

In the below example, we have a list of prices in dollars. However, a dollar sign is appended at both the leading and trailing ends of each element. We can simply iterate over the list and remove the dollar symbol on the right-hand side using the rstrip() method, as shown below.

price = ['$100$', '$200$', '$300$', '$400$', '$500$']
new_price = []
for l in price:
    new_price.append(l.rstrip('$'))
print(new_price)

Output

['$100', '$200', '$300', '$400', '$500']
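The loop above can also be written as a list comprehension, and if the goal is to remove the dollar sign from both ends at once, strip('$') combines the effect of lstrip() and rstrip() in a single call:

```python
price = ['$100$', '$200$', '$300$', '$400$', '$500$']

# Equivalent to the loop above, written as a list comprehension
new_price = [p.rstrip('$') for p in price]
print(new_price)                      # ['$100', '$200', '$300', '$400', '$500']

# strip('$') removes the sign from both ends in one call
print([p.strip('$') for p in price])  # ['100', '200', '300', '400', '500']
```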
Categories: FLOSS Project Planets
