FLOSS Project Planets

Python for Beginners: Convert String to DataFrame in Python

Planet Python - Tue, 2023-03-21 09:00

We use strings for text manipulation in Python. On the other hand, we use dataframes to handle tabular data in Python. Despite this dissimilarity, we may need to convert a string to a pandas dataframe. This article discusses different ways to convert a string to a dataframe in Python.

Table of Contents
  1. Convert String to DataFrame in Python
  2. Convert String to DataFrame Column
  3. JSON to Pandas DataFrame in Python
  4. Create DataFrame From Dictionary String in Python
  5. List String to DataFrame in Python
  6. Conclusion
Convert String to DataFrame in Python

To convert a string into a dataframe of characters in Python, we will first convert the string into a list of characters using the list() function. The list() function takes the string as its input argument and returns a list of characters.

Next, we will pass this list to the DataFrame() function to create a dataframe using all the characters of the string. You can observe this in the following example.

import pandas as pd

myStr = "PFB"
print("The string is:")
print(myStr)
myList = list(myStr)
df = pd.DataFrame(myList)
print("The output dataframe is:")
print(df)


The string is:
PFB
The output dataframe is:
   0
0  P
1  F
2  B

In the above example, we first converted the string "PFB" to a list of characters. Then, we used the DataFrame() function to create a dataframe from the list of characters.

Convert String to DataFrame Column

If you want to convert a string to a dataframe column, you can use the columns parameter in the DataFrame() function. When we pass a list of strings to the columns parameter in the DataFrame() function, the newly created dataframe uses those strings as its column names.

To create a dataframe column from a string, we will first put the string into a list. Then, we will pass the list to the columns parameter in the DataFrame() function. After executing the DataFrame() function, we will get the dataframe with the given string as its column name as shown in the following example.

import pandas as pd

myStr = "PFB"
print("The string is:")
print(myStr)
df = pd.DataFrame(columns=[myStr])
print("The output dataframe is:")
print(df)


The string is:
PFB
The output dataframe is:
Empty DataFrame
Columns: [PFB]
Index: []

In this example, you can observe that the string "PFB" becomes the column name of the output dataframe, which contains no rows. This is because we passed the list containing the string to the columns parameter.

JSON to Pandas DataFrame in Python

JSON strings are used to store and transmit data in software systems. Sometimes, we might need to convert a JSON string to a dataframe in Python. For this, we will use the following steps.

  • First, we will convert the JSON string to a Python dictionary using the loads() function defined in the json module. The loads() function takes the JSON string as its input argument and returns the corresponding Python dictionary.
  • Next, we will put the dictionary into a list. After that, we will pass the list to the DataFrame() function as input.

After execution of the DataFrame() function, we will get the dataframe created from the json string. You can observe this in the following example.

import pandas as pd
import json

jsonStr = '{"firstName": "John", "lastName": "Doe", "email": "john.doe@example.com", "age": 32}'
print("The json string is:")
print(jsonStr)
myDict = json.loads(jsonStr)
df = pd.DataFrame([myDict])
print("The output dataframe is:")
print(df)


The json string is:
{"firstName": "John", "lastName": "Doe", "email": "john.doe@example.com", "age": 32}
The output dataframe is:
   firstName  lastName                 email  age
0       John       Doe  john.doe@example.com   32

Create DataFrame From Dictionary String in Python

To create a dataframe from a dictionary string, we will use the eval() function. The eval() function is used to evaluate expressions in Python. When we pass a string containing a dictionary to the eval() function, it returns a Python dictionary.

After creating the dictionary, we will put it into a list and pass it to the DataFrame() function. After executing the DataFrame() function, we will get the output dataframe as shown below.

import pandas as pd

dictStr = '{"firstName": "John", "lastName": "Doe", "email": "john.doe@example.com", "age": 32}'
print("The dictionary string is:")
print(dictStr)
# Evaluate the dictionary string (dictStr) to obtain a Python dictionary.
myDict = eval(dictStr)
df = pd.DataFrame([myDict])
print("The output dataframe is:")
print(df)


The dictionary string is:
{"firstName": "John", "lastName": "Doe", "email": "john.doe@example.com", "age": 32}
The output dataframe is:
   firstName  lastName                 email  age
0       John       Doe  john.doe@example.com   32

In this example, we first converted the dictionary string into a dictionary. Then, we inserted the dictionary into a list. Finally, we converted the list of dictionaries to a dataframe using the DataFrame() function.
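Because eval() executes whatever expression it is given, it is worth being careful with strings that come from outside your program. The following is a minimal sketch that swaps in ast.literal_eval() from the standard library, which is not the function used in the example above; it only accepts Python literals such as dictionaries, lists, strings, and numbers. The helper name parse_literal_rows is hypothetical and used only for illustration.

```python
import ast


def parse_literal_rows(text):
    """Parse a dictionary or list literal into a one-row list (hypothetical helper)."""
    # literal_eval raises ValueError or SyntaxError if the text is not a plain literal.
    value = ast.literal_eval(text)
    # Wrap the parsed object in a list so that it forms a single row.
    return [value]


dictStr = '{"firstName": "John", "lastName": "Doe", "email": "john.doe@example.com", "age": 32}'
rows = parse_literal_rows(dictStr)
print(rows[0]["firstName"])

# A string that is not a plain literal is rejected instead of being executed.
try:
    parse_literal_rows('__import__("os").getcwd()')
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)
```

The returned list can be passed to the DataFrame() function in the same way as [myDict] in the example above.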

List String to DataFrame in Python

Instead of a dictionary string, you can also convert a list string to a dataframe using the eval() function and the DataFrame() function as shown in the following example.

import pandas as pd

listStr = '[1,22,333,4444,55555]'
print("The list string is:")
print(listStr)
myList = eval(listStr)
df = pd.DataFrame([myList])
print("The output dataframe is:")
print(df)


The list string is:
[1,22,333,4444,55555]
The output dataframe is:
   0   1    2     3      4
0  1  22  333  4444  55555

Conclusion

In this article, we discussed different ways to convert a string to a dataframe in Python. To learn more about Python programming, you can read this article on how to convert a pandas series to a dataframe. You might also like this article on how to iterate rows in a pandas dataframe.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy learning!

The post Convert String to DataFrame in Python appeared first on PythonForBeginners.com.

Categories: FLOSS Project Planets

Bálint Réczey: Building the Linux kernel under 10 seconds with Firebuild

Planet Debian - Tue, 2023-03-21 04:54

Russell published an interesting post about his first experience with Firebuild accelerating refpolicy’s and the Linux kernel’s build. It turned out a few small tweaks could accelerate the builds even more, crossing the 10 second barrier with Linux’s build.

Build performance with 18 cores

The Linux kernel’s build time is a widely used benchmark for compilers, making it a prime candidate to test a build accelerator as well. In the first run on Russell’s 18-core test system, the observed user+sys CPU time was cut by 44% with an actual increase in wall clock time, which was quite unusual. Firebuild performed much better than that in prior tests. To replicate the results I’ve set up a clean Debian Bookworm VM on my machine:

lxc launch images:debian/bookworm --vm -c limits.cpu=18 -c limits.memory=16GB bookworm-vm

Compiling Linux 6.1.10 in this clean Debian VM showed build times closer to what I expected to see, ~72% less wall clock time and ~97% less user+sys CPU time:

$ make defconfig && time make bzImage -j18
real 1m31.157s
user 20m54.256s
sys 2m25.986s
$ make defconfig && time firebuild make bzImage -j18
# first run:
real 2m3.948s
user 21m28.845s
sys 4m16.526s
# second run
real 0m25.783s
user 0m56.618s
sys 0m21.622s

There are multiple differences between Russell’s and my test system including having different CPUs (E5-2696v3 vs. virtualized Ryzen 5900X) and different file systems (BTRFS RAID-1 vs ext4), but I don’t think those could explain the observed mismatch in performance. The difference may be worth further analysis, but let’s get back to squeezing out more performance from Firebuild.

Firebuild was developed on Ubuntu. I was wondering if Firebuild was faster there, but I got only slightly better build times in an identical VM running Ubuntu 22.10 (Kinetic Kudu):

$ make defconfig && time make bzImage -j18
real 1m31.130s
user 20m52.930s
sys 2m12.294s
$ make defconfig && time firebuild make bzImage -j18
# first run:
real 2m3.274s
user 21m18.810s
sys 3m45.351s
# second run
real 0m25.532s
user 0m53.087s
sys 0m18.578s

The KVM virtualization certainly introduces an overhead, so builds should be faster in LXC containers. Indeed, all builds are faster by a few percent:

$ lxc launch ubuntu:kinetic kinetic-container
...
$ make defconfig && time make bzImage -j18
real 1m27.462s
user 20m25.190s
sys 2m13.014s
$ make defconfig && time firebuild make bzImage -j18
# first run:
real 1m53.253s
user 21m42.730s
sys 3m41.067s
# second run
real 0m24.702s
user 0m49.120s
sys 0m16.840s
# Cache size: 1.85 GB

Apparently this ~72% reduction in wall clock time is what one should expect by simply prefixing the build command with firebuild on a similar configuration, but we should not stop here. Firebuild does not accelerate quicker commands by default to save cache space. This howto suggests letting firebuild accelerate all commands including even "sh" by passing "-o 'processes.skip_cache = []'" to firebuild.

Accelerating all commands in this build’s case increases cache size by only 9%, and increases the wall clock time saving to 91%, not only making the build more than 10X faster, but finishing it in less than 8 seconds, which may be a new world record:

$ make defconfig && time firebuild -o 'processes.skip_cache = []' make bzImage -j18
# first run:
real 1m54.937s
user 21m35.988s
sys 3m42.335s
# second run
real 0m7.861s
user 0m15.332s
sys 0m7.707s
# Cache size: 2.02 GB
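As a quick sanity check, the wall clock savings quoted above can be recomputed from the LXC container timings. This is only a minimal Python sketch; the helper names parse_time and saving_percent are made up for illustration and are not part of Firebuild.

```python
def parse_time(text: str) -> float:
    """Convert a `time`-style duration such as '1m27.462s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def saving_percent(baseline: float, accelerated: float) -> float:
    """Percentage of the baseline time that the accelerated run avoids."""
    return (1 - accelerated / baseline) * 100


# "real" times from the LXC container runs above.
baseline = parse_time("1m27.462s")        # plain make
cached_default = parse_time("0m24.702s")  # firebuild, second run, default settings
cached_all = parse_time("0m7.861s")       # firebuild, second run, skip_cache = []

default_saving = saving_percent(baseline, cached_default)
all_saving = saving_percent(baseline, cached_all)
speedup = baseline / cached_all
cache_growth = (2.02 / 1.85 - 1) * 100    # cache sizes in GB

print(f"default saving: {default_saving:.1f}%")
print(f"all-commands saving: {all_saving:.1f}%")
print(f"speedup: {speedup:.1f}x, cache growth: {cache_growth:.1f}%")
```

The results round to the ~72% and 91% savings, the more than 10X speedup, and the 9% cache growth mentioned above.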

There are even faster CPUs on the market than this 5900X. If you happen to have access to one please leave a comment if you could go below 5 seconds!

Scaling to higher core counts and comparison with ccache

Russell raised the very valid point about Firebuild’s single-threaded supervisor being a bottleneck on systems with many cores, and a comparison to ccache also came up in the comments. Since ccache does not have a central supervisor it could scale better with more cores, but let’s see if ccache could go below 10 seconds with the build times…

[Figure: firebuild -o 'processes.skip_cache = []' and ccache scaling to 24 cores]

Well, no. The best time for ccache is 18.81s, with -j24. Both firebuild and ccache keep gaining from extra cores up to 8 cores, but beyond that the wall clock time improvements diminish. The more interesting difference is that firebuild’s user and sys time is basically constant from -j1 to -j24, similarly to ccache’s user time, but ccache’s sys time increases linearly or exponentially with the number of used cores. I suspect this is due to the many parallel ccache processes performing file operations to check if cache entries could be reused, while in firebuild’s case the supervisor performs most of that work – not requiring in-kernel synchronization across multiple cores.

It is true that the single-threaded firebuild supervisor is a bottleneck, but the supervisor also implements a central filesystem cache. Checking if a command’s cache entry can be reused can therefore be implemented with much fewer system calls and much less user space hashing, making the architecture more efficient overall than ccache’s.

The beauty of Firebuild is not being faster than ccache, but being faster than ccache with basically no hard-coded information about how C compilers work. It can accelerate any other compiler or program that generates deterministic output from its input, just by observing what they did in their prior runs. It is like having ccache for every compiler including in-house developed ones, and also for random slow scripts.
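The general idea of reusing the output of a deterministic command can be illustrated with a toy Python sketch. This is not how Firebuild is implemented: Firebuild observes what processes do in their prior runs, whereas this simplified example assumes the caller lists the input files explicitly. The names cache_key and cached_run are hypothetical.

```python
import hashlib
import subprocess
import sys


def cache_key(command: list[str], input_files: list[str]) -> str:
    """Hash the command line together with the contents of its declared inputs."""
    digest = hashlib.sha256()
    for arg in command:
        digest.update(arg.encode() + b"\0")
    for path in input_files:
        with open(path, "rb") as handle:
            digest.update(hashlib.sha256(handle.read()).digest())
    return digest.hexdigest()


def cached_run(command, input_files, cache):
    """Return (stdout, cache_hit); run the command only on a cache miss."""
    key = cache_key(command, input_files)
    if key in cache:
        return cache[key], True
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    cache[key] = result.stdout
    return result.stdout, False


cache = {}
command = [sys.executable, "-c", "print(6 * 7)"]  # stands in for a slow, deterministic tool
first_output, first_hit = cached_run(command, [], cache)
second_output, second_hit = cached_run(command, [], cache)
print(first_hit, second_hit, second_output.strip())
```

The second call returns the stored output without starting the process again, which is the effect a build accelerator aims for on a much larger scale.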


Specbee: What You Need To Know About Continuous Integration and Testing in Drupal

Planet Drupal - Tue, 2023-03-21 04:34
Shubham Gaur, 21 Mar, 2023

Drupal is a rapidly growing content management system (CMS). It has 1.3 million users, a number that is increasing daily. This platform helps in creating different websites, intranets, and web applications. Drupal is a widely used application because it integrates with Continuous Integration and Continuous Testing (CI/CT) tools, which have numerous benefits.

This blog will discuss everything about CI/CT and Drupal.

Importance of Continuous Integration

Continuous testing makes sure that the testing process is easy and automatic. Continuous integration merges code changes into a shared repository, addresses issues early in the development process, and makes finding and removing bugs from the software easier.

Integration is a very important part of the software development method. Here, members of the team have to perform multiple integrations every day. An automated build is used to check those integrations. This automated build includes tests for detecting integration errors faster.

CI helps in testing, reviewing, and integrating the changes into the codebase more quickly and efficiently. Working on isolated code branches can cause several issues. CI prevents those issues and reduces the risk of a merge conflict.

Benefits of Continuous Integration

Continuous Integration is used in Drupal development for a variety of reasons. Some of them are given below.

The key benefits of using Continuous Integration are:

● Build Automation and Self-testing

Automated environments help in building and launching the system using a single command, whereas self-testing makes detecting and eradicating bugs much easier.

● Daily Commits and Integration machine

It is recommended for developers to commit to the machine every day. This way, build tests will be passed immediately, and the correct code will be generated. Integration machines require regular builds and successful build integration.

● Immediate Fix of broken builds and rapid feedback

Continuous build is done to fix the issues in the mainline build immediately. Also, it is necessary to keep the build fast and provide rapid feedback.

● State of the system and Deployment automation

The working of the system should be visible to everyone. The alterations that have been made must be visible to every team member. Deployment automation requires the testers and developers to have scripts. These scripts will help them deploy the application easily into different environments.

How Does Continuous Integration Work?

There are several steps that developers need to follow for successful integration. Developers check out the code into their private workspaces, and alterations must be committed to the repository once the code has been thoroughly checked.

A CI server is used to check alterations and build the system. The server runs unit and integration tests and alerts the team members if the build tests fail. The team members fix the issue and continue to test and integrate the project.

The four key steps of CI are code, build, test, and deploy.

  1. Developers write code and commit changes to a shared code repository.
  2. A CI server monitors the code repository for changes, and when changes are detected, the server checks out the latest code and builds the software.
  3. The CI server runs automated tests on the built software to verify that the code changes have not introduced bugs or broken any existing functionality.
  4. If the tests pass, the CI server may deploy the code changes to a staging or production environment, depending on the organization's release process.
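
The four steps above can be sketched as a tiny pipeline runner. This is a simplified Python illustration under stated assumptions, not the behaviour of any particular CI server: the stage names and the run_pipeline function are hypothetical, and the stages are placeholders where a real setup would invoke the build and test tools.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure."""
    log = []
    for name, action in stages:
        passed = action()
        log.append(f"{name}: {'ok' if passed else 'failed'}")
        if not passed:
            # A real CI server would alert the team members here.
            log.append("notify team")
            break
    return log


# Placeholder stages; each callable returns True on success.
stages = [
    ("checkout", lambda: True),
    ("build", lambda: True),
    ("test", lambda: False),   # simulate a failing test run
    ("deploy", lambda: True),  # never reached because the tests failed
]
pipeline_log = run_pipeline(stages)
print(pipeline_log)
```

Because the simulated test stage fails, the deploy stage is skipped, which mirrors the rule that code is deployed only if the tests pass.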

CI typically involves using a version control system (such as Git or SVN) to manage code changes and a build server (such as Jenkins, Travis CI, or CircleCI) to build and test the code changes. Automated testing is critical to CI, enabling developers to catch and fix bugs introduced by code changes quickly.

By catching problems early in the development process, CI can help teams to reduce the time and cost of software development while also improving the quality and reliability of the software being produced.

What Are The Continuous Integration Tools?

Many Continuous Integration (CI) tools are available, each with strengths and weaknesses. Here are some of the most popular CI tools used by software development teams:

● Jenkins

This is a popular open-source CI tool with a large user community. It can be easily customized with plugins and has integrations with many other tools.

● Travis CI

This cloud-based CI tool is popular for its ease of use and seamless integration with GitHub.

● CircleCI

This cloud-based CI tool is popular for its speed and scalability. It also integrates with many other tools, such as Docker and AWS.

● GitLab CI/CD

This is a built-in CI/CD tool within GitLab, a popular Git repository management system. It is open source and has integrations with many other tools.

● Bamboo

This is a CI/CD tool from Atlassian, the makers of JIRA and Confluence. It has integrations with many other Atlassian tools, as well as other third-party tools.

● TeamCity

This is a CI tool from JetBrains, the makers of IntelliJ IDEA, and other IDEs. Its adaptability and simplicity make it appealing.

● Azure DevOps

This is a cloud-based CI/CD tool from Microsoft. It integrates with many other Microsoft tools, such as Visual Studio and GitHub.

These are just a few of the many CI tools available. When choosing a CI tool, it's important to consider factors such as ease of use, integrations with other tools, cost, and the size and complexity of the development team.

Key Practices That Form An Effective Continuous Integration

Here are some key practices that form an effective Continuous Integration (CI) process:

Version Control

A CI process starts with version control, essential for managing code changes, resolving conflicts, and collaborating effectively. Git, SVN, and Mercurial are popular version control systems.

Automated Build

In a CI process, every commit to the version control system triggers an automated build process that compiles and packages the code. This ensures that the code builds successfully and eliminates manual errors.

Automated Testing

Automated testing is a critical component of a CI process. Tests should be automated so that they can be run every time code is committed, and they should cover both functional and non-functional aspects of the application.

Continuous Feedback

CI provides continuous feedback to developers through automated build and test processes. Any issues or failures should be identified and reported immediately to be addressed promptly.

Continuous Deployment

Automated deployment can help reduce the time to get code into production and ensure that the deployment process is consistent and reliable.

Continuous Improvement

A CI process should be constantly monitored and improved. This includes reviewing build and test results, identifying and addressing issues, and optimizing the process to make it faster and more effective.

Effective communication and collaboration among team members are essential for a successful CI process. Developers, testers, and operations personnel should work together closely to identify issues and resolve them quickly.

By following these key practices, teams can implement an effective CI process that helps to ensure high-quality software development and deployment.

What Is Continuous Integration For Drupal?

Continuous integration (CI) for Drupal involves regularly integrating code changes from multiple developers into a shared code repository, building and testing the code changes, and automatically deploying the changes to a testing or staging environment.

Here are some of the key benefits of implementing CI for Drupal:

● Reduced risk

By regularly integrating and testing code changes, CI can help catch and fix errors early in the development cycle, reducing the risk of introducing bugs or breaking functionality.

● Improved collaboration

Developers can collaborate more easily and effectively by working from a shared code repository.

● Faster feedback

With automated testing, developers can get feedback on their code changes quickly, enabling them to make corrections and improvements more rapidly. Different cloud-based testing platforms like LambdaTest can help you achieve faster feedback on code commits and get a quicker go-to-market.

LambdaTest is a digital experience testing cloud that allows organizations and enterprises to perform manual and automated testing for web and mobile. It offers real-time testing, Selenium testing, Cypress testing, Appium testing, OTT testing, testing on real device cloud, and more.

LambdaTest’s online device farm lets you test at scale across 3000+ real browsers, devices, and OS combinations. It also integrates with many CI/CD tools like Jenkins, CircleCI, and Travis CI.

● Consistency

By using consistent tools and processes for development, testing, and deployment, teams can ensure that all code changes are properly vetted and tested before they are deployed to production.

Implementing CI and Testing In Drupal

Like many web application frameworks, Drupal can benefit from continuous integration (CI) and testing practices. Here are some general steps that can be taken to implement CI and testing in Drupal:

  1. Set up a version control system (VCS) such as Git or SVN to manage code changes. All developers should be encouraged to commit their changes to the VCS regularly.
  2. Use a continuous integration (CI) tool such as Jenkins, Travis CI, or CircleCI to automatically build and test Drupal code changes whenever they are committed to the VCS.
  3. Write automated Drupal tests using a framework like PHPUnit or Behat. Tests should cover both functional and non-functional aspects of the application.
  4. Configure the CI tool to run automated tests whenever new code changes are detected. If any tests fail, developers should be notified immediately so they can fix the issue.
  5. Use tools like CodeSniffer and PHPMD to check for violations of coding standards and best practices.
  6. Consider using tools like Docker or Vagrant to help automate the setup and configuration of development environments and ensure consistency across development, testing, and production environments.
  7. There are also contributed modules available for Drupal that can help with testing, such as SimpleTest or the Drupal Extension for Behat.
Final Thoughts

To implement CI for Drupal, development teams can use various tools like Jenkins, Travis CI, or CircleCI and write automated tests using a testing framework such as PHPUnit or Behat. They can also use tools like Docker or Vagrant to help automate the setup and configuration of development environments and ensure consistency across development, testing, and production environments.

Additionally, contributed Drupal modules are available, such as SimpleTest or the Drupal Extension for Behat, which can help test Drupal-specific functionality. By implementing continuous integration and testing practices in Drupal, developers can catch and fix issues early in the development process, leading to faster, higher-quality development and deployment.

Guest Author: Shubham Gaur

Shubham Gaur is a freelance writer who writes on the fundamentals and trends of Software testing. With more than 5 years of experience in writing on different technologies, he explains complex and technical testing subjects in a comprehensive language.


Axelerant Blog: What Is Salesforce Integration? An Introduction

Planet Drupal - Tue, 2023-03-21 03:50

The Salesforce platform makes creating engaging customer and employee experiences with third-party data integrations easier. Experts can combine a composable architecture with building a unified view of all customers. When used strategically, robust tools and powerful APIs can dramatically reduce integration time and unlock modernized back-office systems.


Codementor: Python Position and Keyword Only Arguments

Planet Python - Tue, 2023-03-21 00:14
What do / and * mean in a Python function definition?
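
In a function definition, parameters listed before / are positional-only and parameters listed after * are keyword-only (the / syntax requires Python 3.8 or later). The following is a minimal sketch; the function describe and the helper raises_type_error are made up for illustration.

```python
def describe(name, /, greeting="Hello", *, punctuation="!"):
    """name is positional-only, greeting may be passed either way, punctuation is keyword-only."""
    return f"{greeting}, {name}{punctuation}"


print(describe("Ada"))
print(describe("Ada", "Hi", punctuation="?"))
print(describe("Ada", greeting="Hey"))


def raises_type_error(call):
    """Return True if calling `call` raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


# Passing a positional-only parameter by keyword is rejected.
name_by_keyword_fails = raises_type_error(lambda: describe(name="Ada"))
# Passing a keyword-only parameter by position is rejected as well.
punctuation_by_position_fails = raises_type_error(lambda: describe("Ada", "Hi", "?"))
print(name_by_keyword_fails, punctuation_by_position_fails)
```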

GitHub 2FA

Planet KDE - Mon, 2023-03-20 19:00

GitHub is rolling out 2FA, and Calamares is one of the repositories I maintain there. Calamares seems like kind-of-important infrastructure for some things (e.g. Microsoft’s own Linux distro). Enabling 2FA was remarkably painless because I already had a bunch of 2FA stuff set up for KDE’s Invent. Invent is a GitLab instance and all-round more pleasant, frankly. Enabling 2FA was funny because the first thing FreeOTP (the 2FA authenticator I use) said was “weak crypto settings” when scanning the GitHub QR code. Good job, folks.

So Calamares is still on GitHub. Thanks to Kevin I’m reminded that GH is like an addiction. Also that there have been calls to leave GH for years. As a maintainer-with-no-time of a repo, there are still no concrete plans to move. KDE Invent still seems like a non-starter because of translations workflow.

Anyway, rest assured that the Calamares repo is now 2FA-safe. And that a 3.3 release will happen someday.


Django Weblog: Want to host DjangoCon Europe 2024?

Planet Python - Mon, 2023-03-20 16:21

DjangoCon Europe 2023 will be held May 29th-June 2nd in Edinburgh, Scotland, but we're already looking ahead to next year's conference. Could your town - or your football stadium, circus tent, private island or city hall - host this wonderful community event?

Hosting a DjangoCon is an ambitious undertaking. It's hard work, but each year it has been successfully run by a team of community volunteers, not all of whom have had previous experience - more important is enthusiasm, organisational skills, the ability to plan and manage budgets, time and people - and plenty of time to invest in the project.

You'll find plenty of support on offer from previous DjangoCon organisers, so you won't be on your own.

How to apply

If you're interested, we'd love to hear from you. Following the established tradition, the selected hosts will be announced at this year's DjangoCon by last year's organiser. The proposed dates must fall more than one month from DjangoCon US, PyCon US, and EuroPython in the same calendar year. In order to make the announcement at DjangoCon Europe we will need to receive your proposal by May 10.

The more detailed and complete your proposal, the better. Things you should consider, and that we'd like to know about, are:

  • dates (ideally between mid-May and mid-June 2024)
  • numbers of attendees
  • venue(s)
  • accommodation
  • transport links
  • budgets and ticket prices
  • committee members

We'd like to see:

  • timelines
  • pictures
  • prices
  • draft agreements with providers
  • alternatives you have considered

These will all help show that your plans are serious and thorough and that you have the organisational capacity to make it a success. Email your proposals to djangocon-europe-2024-proposals at djangoproject dot com.

We will be hosting a virtual informational session for those who are interested or may be interested in organising a DjangoCon. Please indicate your interest here.

If you have any questions or concerns about organising a DjangoCon, just drop us a line.


Talking Drupal: Talking Drupal #391 - Building Your Career

Planet Drupal - Mon, 2023-03-20 14:00

Today we are talking about Building Your Career with Mike Anello.

For show notes visit: www.talkingDrupal.com/391

  • How we started our careers
  • Broad career opportunities
  • Mentorship
  • Roles
  • First step after graduating
  • First step in switching
  • Common hurdles
  • Resources like Drupal Easy
  • Value of a career in Drupal
  • How do you find jobs
  • How do you build and maintain your Drupal career
  • How about your Drupal resume
  • Any advice
Resources

if (class_exists('Kint')) {
  // Set the max_depth to prevent out-of-memory.
  \Kint::$max_depth = 4;
}

Guests

Mike Anello - Drupal Easy @ultimike


Hosts

Nic Laflin - www.nLighteneddevelopment.com @nicxvan
John Picozzi - www.epam.com @johnpicozzi
Jacob Rockowitz - www.jrockowitz.com @jrockowitz

MOTW Correspondent

Martin Anderson-Clutz - @mandclu

Devel Debug Log: Allows developers to inspect the contents of variables. If those are classes, you can inspect nested properties and all available methods.


The Drop Times: To Become a Hedgehog

Planet Drupal - Mon, 2023-03-20 12:53

Last week, TheDropTimes (TDT) was able to publish two interviews. In one of those interviews, Holmes Consulting Group founder Robbie Holmes mentioned a concept. 

Many management professionals might know and practice it. But for me, it was new. I am not a management guy, and such concepts seldom graced my reading list. Listening to what others say has helped me, and I can also say the same about watching Alethia’s interview with Robbie. 

The concept he shared is not new. Isaiah Berlin proposed it in his 1953 essay, ‘The Hedgehog and the Fox: An Essay on Tolstoy’s View of History’; later, Jim Collins developed it in his book, ‘Good to Great: Why Some Companies Make the Leap, and Others Don’t.’ The core theme of this book is that greatness is not primarily a function of circumstance but largely a matter of conscious choice and discipline.

How Jim Collins describes the hedgehog concept intrigued me. He begins with Berlin’s adaptation of the ancient Greek parable, “The fox knows many things, but the hedgehog knows one big thing.” Jim tries to teach us how to find that one big thing. It is by placing your business in the intersectional area of three thought circles:

  1. What you are deeply passionate about.
  2. What you can be the best in the world at.
  3. What drives your economic or resource engine. 

Jim explains that transformation from good to great comes about by a series of good decisions made consistently with a Hedgehog Concept, supremely well executed, accumulating one upon another over a long period. 

Pardon my audacity in pushing this concept again. But what I saw after going through it is that we at TDT can excel in creating more and more good-to-great interviews with the fantastic people working around Drupal and related projects with your active help. Also, we urge the Drupal agencies to find their one big thing and excel in it. 

As I mentioned, you can watch our interview video with Robbie Holmes here. The other interview we published last week was with Chris Wells, the co-lead of Project Browser Initiative. Chris is the founder of Redfin Solutions. You can read the interview here. We conducted both conversations as part of DrupalCamp NJ.

As for other stories from last week, here is a comprehensive list:

Drupal Developer Days Vienna has started accepting session proposals. MidCamp is happening next month, and here is how you can help organize the camp. OpenSource North Conference has announced the lineup of speakers. Drupal Netherlands opened the sale of early bird tickets for Drupaljam 2023 in June. You may submit sessions to DrupalCon Lille until April 24. Drupal Camping Wolfsburg treats all sponsors as gold sponsors. DrupalSouth Wellington has put out a call for volunteers. You can submit sessions for DrupalCamp Asheville 2023 until April 25. Both DrupalCamp NJ and NERD Summit are over. DrupalCon Pittsburgh is looking for a launch sponsor or co-working space sponsor.

Drupal Community Working Group has asked Drupalers to nominate candidates for Aaron Winborn Award 2023, and you have only five more days to do that. Kanopi and Pantheon have announced a joint webinar on Drupal 7 to 10 migration. Salsa Digital has started a blog series on ‘Rules as code insights.’ SFDUG is hosting a Technical Writing Workshop on April 13. We revisited a blog post from HTML Panda from May 2022, comparing WordPress and Drupal. A Drupal distribution focussed on the publishing industry, ‘Thunder CMS 7’ based on Drupal 10, published its beta release. Von Eaton, Director of Programs in Drupal Association, addressed the ‘Back to Work for Women’ program conducted by ICFOSS and supported by Zyxware. 

That is all for the week, folks; thank you.

Sebin A. Jacob

Categories: FLOSS Project Planets

coreutils @ Savannah: coreutils-9.2 released [stable]

GNU Planet! - Mon, 2023-03-20 11:53

This is to announce coreutils-9.2, a stable release.
See the NEWS below for a brief summary.

Thanks to everyone who has contributed!
There have been 209 commits by 14 people in the 48 weeks since 9.1.

The following people contributed changes to this release:

  Arsen Arsenović (1)     Jim Meyering (7)
  Bernhard Voelker (3)    Paul Eggert (90)
  Bruno Haible (1)        Pierre Marsais (1)
  Carl Edquist (2)        Pádraig Brady (98)
  ChuanGang Jiang (2)     Rasmus Villemoes (1)
  Dennis Williamson (1)   Stefan Kangas (1)
  Ivan Radić (1)          Álvar Ibeas (1)

Pádraig [on behalf of the coreutils maintainers]


Here is the GNU coreutils home page:

For a summary of changes and contributors, see:
or run this command from a git-cloned coreutils directory:
    git shortlog v9.1..v9.2

To summarize the 665 gnulib-related changes, run these commands
from a git-cloned coreutils directory:
     git checkout v9.2
     git submodule summary v9.1


Here are the compressed sources:
  https://ftp.gnu.org/gnu/coreutils/coreutils-9.2.tar.gz   (14MB)
  https://ftp.gnu.org/gnu/coreutils/coreutils-9.2.tar.xz   (5.6MB)

Here are the GPG detached signatures:

Use a mirror for higher download bandwidth:

Here are the SHA1 and SHA256 checksums:

  6afa9ce3729afc82965a33d02ad585d1571cdeef  coreutils-9.2.tar.gz
  ebWNqhmcY84g95GRF3NLISOUnJLReVZPkI4yiQFZzUg=  coreutils-9.2.tar.gz
  3769071b357890dc36d820c597c1c626a1073fcb  coreutils-9.2.tar.xz
  aIX/R7nNshHeR9NowXhT9Abar5ixSKrs3xDeKcwEsLM=  coreutils-9.2.tar.xz

Verify the base64 SHA256 checksum with cksum -a sha256 --check
from coreutils-9.2 or OpenBSD's cksum since 2007.
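For readers without a recent cksum, the base64 SHA256 value can also be reproduced by hand. The following is a minimal Python sketch using only the standard library; the function names `sha256_base64` and `matches_published` are illustrative and not part of coreutils.

```python
import base64
import hashlib


def sha256_base64(path, chunk_size=1 << 20):
    """Return the base64-encoded SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a large tarball need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_published(path, published):
    """Compare a file's digest with a published base64 checksum string."""
    return sha256_base64(path) == published
```

With the tarball downloaded, `matches_published("coreutils-9.2.tar.xz", "aIX/R7nNshHeR9NowXhT9Abar5ixSKrs3xDeKcwEsLM=")` would compare it against the value listed above.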

Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify coreutils-9.2.tar.xz.sig

The signature should match the fingerprint of the following key:

  pub   rsa4096 2011-09-23 [SC]
        6C37 DC12 121A 5006 BC1D  B804 DF6F D971 3060 37D9
  uid           [ unknown] Pádraig Brady <P@draigBrady.com>
  uid           [ unknown] Pádraig Brady <pixelbeat@gnu.org>

If that command fails because you don't have the required public key,
or that public key has expired, try the following commands to retrieve
or refresh it, and then rerun the 'gpg --verify' command.

  gpg --locate-external-key P@draigBrady.com

  gpg --recv-keys DF6FD971306037D9

  wget -q -O- 'https://savannah.gnu.org/project/release-gpgkeys.php?group=coreutils&download=1' | gpg --import -

As a last resort to find the key, you can try the official GNU keyring:

  wget -q https://ftp.gnu.org/gnu/gnu-keyring.gpg
  gpg --keyring gnu-keyring.gpg --verify coreutils-9.2.tar.gz.sig

This release was bootstrapped with the following tools:
  Autoconf 2.71
  Automake 1.16.5
  Gnulib v0.1-5857-gf17d397771
  Bison 3.8.2



* Noteworthy changes in release 9.2 (2023-03-20) [stable]

** Bug fixes

  'comm --output-delimiter="" --total' now delimits columns in the total
  line with the NUL character, consistent with NUL column delimiters in
  the rest of the output.  Previously no delimiters were used for the
  total line in this case.
  [bug introduced with the --total option in coreutils-8.26]

  'cp -p' no longer has a security hole when cloning into a dangling
  symbolic link on macOS 10.12 and later.
  [bug introduced in coreutils-9.1]

  'cp -rx / /mnt' no longer complains "cannot create directory /mnt/".
  [bug introduced in coreutils-9.1]

  cp, mv, and install avoid allocating too much memory, and possibly
  triggering "memory exhausted" failures, on file systems like ZFS,
  which can return varied file system I/O block size values for files.
  [bug introduced in coreutils-6.0]

  cp, mv, and install now immediately acknowledge transient errors
  when creating copy-on-write or cloned reflink files, on supporting
  file systems like XFS, BTRFS, APFS, etc.
  Previously they would have tried again with other copy methods
  which may have resulted in data corruption.
  [bug introduced in coreutils-7.5 and enabled by default in coreutils-9.0]

  cp, mv, and install now handle ENOENT failures across CIFS file systems,
  falling back from copy_file_range to a better supported standard copy.
  [issue introduced in coreutils-9.0]

  'mv --backup=simple f d/' no longer mistakenly backs up d/f to f~.
  [bug introduced in coreutils-9.1]

  rm now fails gracefully when memory is exhausted.
  Previously it may have aborted with a failed assertion in some cases.
  [This bug was present in "the beginning".]

  rm -d (--dir) now properly handles unreadable empty directories.
  E.g., before, this would fail to remove d: mkdir -m0 d; src/rm -d d
  [bug introduced in v8.19 with the addition of this option]

  runcon --compute no longer looks up the specified command in the $PATH
  so that there is no mismatch between the inspected and executed file.
  [bug introduced when runcon was introduced in coreutils-6.9.90]

  'sort -g' no longer infloops when given multiple NaNs on platforms
  like x86_64 where 'long double' has padding bits in memory.
  Although the fix alters sort -g's NaN ordering, that ordering has
  long been documented to be platform-dependent.
  [bug introduced 1999-05-02 and only partly fixed in coreutils-8.14]

  stty ispeed and ospeed options no longer accept and silently ignore
  invalid speed arguments, or give false warnings for valid speeds.
  Now they're validated against both the general accepted set,
  and the system supported set of valid speeds.
  [This bug was present in "the beginning".]

  stty now wraps output appropriately for the terminal width.
  Previously it may have output 1 character too wide for certain widths.
  [bug introduced in coreutils-5.3]

  tail --follow=name works again with non seekable files.  Previously it
  exited with an "Illegal seek" error when such a file was replaced.
  [bug introduced in fileutils-4.1.6]

  'wc -c' will again efficiently determine the size of large files
  on all systems.  It no longer redundantly reads data from certain
  sized files larger than SIZE_MAX.
  [bug introduced in coreutils-8.24]

** Changes in behavior

  Programs now support the new Ronna (R), and Quetta (Q) SI prefixes,
  corresponding to 10^27 and 10^30 respectively,
  along with their binary counterparts Ri (2^90) and Qi (2^100).
  In some cases (e.g., 'sort -h') these new prefixes simply work;
  in others, where they exceed integer width limits, they now elicit
  the same integer overflow diagnostics as other large prefixes.
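To see why these prefixes can overflow, here is a small Python sketch of the magnitudes involved. The 64-bit unsigned limit is an assumption chosen for illustration; the actual integer width depends on the program and platform.

```python
# The new SI prefixes and their binary counterparts, as exact integers.
PREFIXES = {
    "R": 10**27,   # ronna
    "Q": 10**30,   # quetta
    "Ri": 2**90,
    "Qi": 2**100,
}

UINT64_MAX = 2**64 - 1  # assumed integer width, for illustration only


def fits_in_uint64(count, prefix):
    """Return True if `count` units of `prefix` fit in an unsigned 64-bit integer."""
    return count * PREFIXES[prefix] <= UINT64_MAX


for name, value in PREFIXES.items():
    print(name, value, fits_in_uint64(1, name))
```

Every prefix in the table already exceeds 2^64 - 1 for a count of 1, which is the situation in which an overflow diagnostic would be expected.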

  'cp --reflink=always A B' no longer leaves behind a newly created
  empty file B merely because copy-on-write clones are not supported.

  'cp -n' and 'mv -n' now exit with nonzero status if they skip their
  action because the destination exists, and likewise for 'cp -i',
  'ln -i', and 'mv -i' when the user declines.  (POSIX specifies this
  for 'cp -i' and 'mv -i'.)

  cp, mv, and install again read in multiples of the reported block size,
  to support unusual devices that may have this constraint.
  [behavior inadvertently changed in coreutils-7.2]

  du --apparent now counts apparent sizes only of regular files and
  symbolic links.  POSIX does not specify the meaning of apparent
  sizes (i.e., st_size) for other file types, and counting those sizes
  could cause confusing and unwanted size mismatches.

  'ls -v' and 'sort -V' go back to sorting ".0" before ".A",
  reverting to the behavior in coreutils-9.0 and earlier.
  This behavior is now documented.

  ls --color now matches a file extension case sensitively
  if there are different sequences defined for separate cases.

  printf unicode \uNNNN, \UNNNNNNNN syntax, now supports all valid
  unicode code points.  Previously it was restricted to the C
  universal character subset, which restricted most points <= 0x9F.

  runcon now exits with status 125 for internal errors.  Previously upon
  internal errors it would exit with status 1, which was less distinguishable
  from errors from the invoked command.

  'split -n N' now splits more evenly when the input size is not a
  multiple of N, by creating N output files whose sizes differ by at
  most 1 byte.  Formerly, it did this only when the input size was
  less than N.
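The arithmetic behind "sizes differ by at most 1 byte" can be sketched in Python as follows. This is an illustration of the property only, not split's implementation; in particular, which output files receive the extra byte is an arbitrary choice here.

```python
def even_chunk_sizes(total_bytes, n):
    """Split total_bytes into n sizes that differ by at most 1."""
    base, extra = divmod(total_bytes, n)
    # The first `extra` chunks get one additional byte (an arbitrary choice).
    return [base + 1 if i < extra else base for i in range(n)]


print(even_chunk_sizes(10, 3))  # [4, 3, 3]
```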

  'stat -c %s' now prints sizes as unsigned, consistent with 'ls'.

** New Features

  cksum now accepts the --base64 (-b) option to print base64-encoded
  checksums.  It also accepts/checks such checksums.

  cksum now accepts the --raw option to output a raw binary checksum.
  No file name or other information is output in this mode.

  cp, mv, and install now accept the --debug option to
  print details on how a file is being copied.

  factor now accepts the --exponents (-h) option to print factors
  in the form p^e, rather than repeating the prime p, e times.
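As an illustration of the p^e form, here is a minimal trial-division sketch in Python. It is not how factor works internally, and the exact output layout (exponent omitted when it is 1) is an assumption.

```python
def factor_exponents(n):
    """Return the prime factorization of n as a list of (prime, exponent) pairs."""
    if n < 2:
        return []
    factors = []
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if exponent:
            factors.append((p, exponent))
        p += 1
    if n > 1:
        factors.append((n, 1))  # whatever remains is itself prime
    return factors


def format_factors(n):
    """Format as 'n: p^e ...', omitting the exponent when it is 1 (assumed style)."""
    parts = [f"{p}^{e}" if e > 1 else str(p) for p, e in factor_exponents(n)]
    return f"{n}: " + " ".join(parts)


print(format_factors(360))  # 360: 2^3 3^2 5
```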

  ls now supports the --time=modification option, to explicitly
  select the default mtime timestamp for display and sorting.

  mv now supports the --no-copy option, which causes it to fail when
  asked to move a file to a different file system.

  split now accepts options like '-n SIZE' that exceed machine integer
  range, when they can be implemented as if they were infinity.

  split -n now accepts piped input even when not in round-robin mode,
  by first copying input to a temporary file to determine its size.

  wc now accepts the --total={auto,never,always,only} option
  to give explicit control over when the total is output.

** Improvements

  cp --sparse=auto (the default), mv, and install,
  will use the copy_file_range syscall now also with sparse files.
  This may be more efficient, by avoiding user space copies,
  and possibly employing copy offloading or reflinking,
  for the non sparse portion of such sparse files.

  On macOS, cp creates a copy-on-write clone in more cases.
  Previously cp would only do this when preserving mode and timestamps.

  date --debug now diagnoses if multiple --date or --set options are
  specified, as only the last specified is significant in that case.

  rm outputs more accurate diagnostics in the presence of errors
  when removing directories.  For example EIO will be faithfully
  diagnosed, rather than being conflated with ENOTEMPTY.

  tail --follow=name now works with single non regular files even
  when their modification time doesn't change when new data is available.
  Previously tail would not show any new data in this case.

  tee -p detects when all remaining outputs have become broken pipes, and
  exits, rather than waiting for more input to induce an exit when written.

  tee now handles non blocking outputs, which can be seen for example with
  telnet or mpirun piping through tee to a terminal.
  Previously tee could truncate data written to such an output and fail,
  and also potentially output a "Resource temporarily unavailable" error.


Python Morsels: What is a context manager?

Planet Python - Mon, 2023-03-20 11:00

Context managers power Python's with blocks. They sandwich a code block between enter code and exit code. They're most often used for reusing common cleanup/teardown functionality.

Table of contents

  1. Files opened with with close automatically
  2. Context managers work in with statements
  3. Context managers are like a try-finally block
  4. Life without a context manager
  5. Using a with block requires a context manager

Files opened with with close automatically

Context managers are objects that can be used in Python's with statements.

You'll often see with statements used when working with files in Python.

This code opens a file, uses the f variable to point to the file object, reads from the file, and then closes the file:

>>> with open("my_file.txt") as f:
...     contents = f.read()
...

Notice that we didn't explicitly tell Python to close our file.

But the file did close:

>>> f.closed
True

The file closed automatically when the with block was exited.
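To make the "sandwich" idea concrete, here is a toy context manager written as a class. The class name `Sandwich` and its `events` list are made up for this sketch; the `__enter__` and `__exit__` methods are the protocol that with statements rely on.

```python
class Sandwich:
    """A toy context manager that records its enter and exit steps."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this value is bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block is left, whether normally or via an exception.
        self.events.append("exit")
        return False  # do not suppress exceptions


with Sandwich() as s:
    s.events.append("body")

print(s.events)  # ['enter', 'body', 'exit']
```

The exit step runs even if the body raises, which is what makes this pattern comparable to a try-finally block.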

Context managers work in with statements

Any object that can be …

Read the full article: https://www.pythonmorsels.com/what-is-a-context-manager/

Consensus Enterprises: Kubernetes backend for Aegir5

Planet Drupal - Mon, 2023-03-20 10:00
Aegir5 and Kubernetes

Lately we’ve been working with clients ranging from large Canadian government departments to small commercial SaaS companies, who have asked us to deploy CMS apps to Kubernetes (K8S) clusters running on Openstack. In spite of our continued feeling that most of the time Kubernetes Won’t Save You, we’ve found it to be surprisingly useful in certain contexts. In fact, we’ve started to think that K8S will prove an extremely valuable backend to plug in to our existing Aegir5 front-end and queue system.

Real Python: Executing Python Scripts With a Shebang

Planet Python - Mon, 2023-03-20 10:00

When you read someone else’s Python code, you frequently see a mysterious line, which always appears at the top of the file, starting with the distinctive shebang (#!) sequence. It looks like a not-so-useful comment, but other than that, it doesn’t resemble anything else you’ve learned about Python, making you wonder what that is and why it’s there. As if that wasn’t enough to confuse you, the shebang line only appears in some Python modules.

In this tutorial, you’ll:

  • Learn what a shebang is
  • Decide when to include the shebang in Python scripts
  • Define the shebang in a portable way across systems
  • Pass arguments to the command defined in a shebang
  • Know the shebang’s limitations and some of its alternatives
  • Execute scripts through a custom interpreter written in Python

To proceed, you should have basic familiarity with the command line and know how to run Python scripts from it. You can also download the supporting materials for this tutorial to follow along with the code examples:

Free Sample Code: Click here to download the free sample code that you’ll use to execute Python scripts with a shebang.

What’s a Shebang, and When Should You Use It?

In short, a shebang is a special kind of comment that you may include in your source code to tell the operating system’s shell where to find the interpreter for the rest of the file:

#!/usr/bin/python3
print("Hello, World!")

If you’re using a shebang, it must appear on the first line in your script, and it has to start with a hash sign (#) followed by an exclamation mark (!), colloquially known as the bang, hence the name shebang. The choice of the hash sign to begin this special sequence of characters wasn’t accidental, as many scripting languages use it for inline comments.

You should make sure you don’t put any other comments before the shebang line if you want it to work correctly, or else it won’t be recognized! After the exclamation mark, specify an absolute path to the relevant code interpreter, such as Python. Providing a relative path will have no effect, unfortunately.
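The rules above can be sketched as a small parser that inspects a script's first line. This is only an illustration: the function name `parse_shebang` is hypothetical, and treating everything after the interpreter path as a single optional argument is an assumption that mirrors common Linux behavior; other systems may split it differently.

```python
def parse_shebang(first_line):
    """Return (interpreter, argument) from a script's first line, or None.

    Assumption: the text after the interpreter path is one optional argument.
    """
    if not first_line.startswith("#!"):
        return None  # anything else on the first line is not a shebang
    parts = first_line[2:].strip().split(maxsplit=1)
    if not parts:
        return None
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


print(parse_shebang("#!/usr/bin/python3\n"))  # ('/usr/bin/python3', None)
print(parse_shebang("# a comment\n"))         # None
```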

Note: The shebang is only recognized by shells, such as Z shell or Bash, running on Unix-like operating systems, including macOS and Linux distributions. It bears no particular meaning in the Windows terminal, which treats the shebang as an ordinary comment by ignoring it.

You can get the shebang to work on Windows by installing the Windows Subsystem for Linux (WSL) that comes with a Unix shell. Alternatively, Windows lets you make a global file association between a file extension like .py and a program, such as the Python interpreter, to achieve a similar effect.

It’s not uncommon to combine a shebang with the name-main idiom, which prevents the main block of code from running when someone imports the file from another module:

#!/usr/bin/python3
if __name__ == "__main__":
    print("Hello, World!")

With this conditional statement, Python will call the print() function only when you run this module directly as a script—for example, by providing its path to the Python interpreter:

$ python3 /path/to/your/script.py
Hello, World!

As long as the script’s content starts with a correctly defined shebang line and your system user has permission to execute the corresponding file, you can omit the python3 command to run that script:

$ /path/to/your/script.py
Hello, World!

A shebang is only relevant to runnable scripts that you wish to execute without explicitly specifying the program to run them through. You wouldn’t typically put a shebang in a Python module that only contains function and class definitions meant for importing from other modules. Therefore, use the shebang when you don’t want to prefix the command that runs your Python script with python or python3.

Note: In the old days of Python, the shebang line would sometimes appear alongside another specially formatted comment described in PEP 263:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
if __name__ == "__main__":
    print("Grüß Gott")

The coding comment used to be necessary to tell the interpreter which character encoding it should use to read your source code correctly, as Python defaulted to ASCII. However, this was only important when you directly embedded non-ASCII characters, such as ü or ß, in your code.

This special comment is irrelevant today because modern Python versions use the universal UTF-8 encoding, which can handle such characters with ease. Nevertheless, it’s always preferable to replace tricky characters with their encoded representations using Unicode literals:

>>> "Grüß Gott".encode("unicode_escape")
b'Gr\\xfc\\xdf Gott'

Your foreign colleagues who have different keyboard layouts will thank you for that!

Now that you have a high-level understanding of what a shebang is and when to use it, you’re ready to explore it in more detail. In the next section, you’ll take a closer look at how it works.

How Does a Shebang Work?

Normally, to run a program in the terminal, you must provide the full path to a particular binary executable or the name of a command present in one of the directories listed on the PATH environment variable. One or more command-line arguments may follow this path or command:

$ /usr/bin/python3 -c 'print("Hello, World!")'
Hello, World!
$ python3 -c 'print("Hello, World!")'
Hello, World!

Here, you run the Python interpreter in a non-interactive mode against a one-liner program passed through the -c option. In the first case, you provide an absolute path to python3, while in the second case, you rely on the fact that the parent folder, /usr/bin/, is included on the search path by default. Your shell can find the Python executable, even if you don’t provide the full path, by looking through the directories on the PATH variable.

Note: If multiple commands with the same name exist in more than one directory listed on the PATH variable, then your shell will execute the first it can find. As a result, the outcome of running a command without explicitly specifying the corresponding path may sometimes be surprising. It’ll depend on the order of directories in your PATH variable. However, this can be useful, as you’ll find out later.
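The PATH lookup described above can be approximated in a few lines of Python. This is a simplified sketch, not what any particular shell does: the function name `find_on_path` is made up, and the standard library's `shutil.which` offers a more complete version of the same idea.

```python
import os


def find_on_path(command, path=None):
    """Return the first executable named `command` in the PATH directories, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries, for simplicity
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # first match wins, so directory order matters
    return None


print(find_on_path("python3"))
```

Because the search stops at the first match, reordering the directories in PATH can change which executable a bare command name resolves to.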

Read the full article at https://realpython.com/python-shebang/ »



TestDriven.io: Django Performance Optimization Tips

Planet Python - Mon, 2023-03-20 09:41
This article looks at where potential performance issues can occur in a Django application and how to address them in order to speed up your app.

Mike Driscoll: PyDev of the Week: Pierre Raybaut

Planet Python - Mon, 2023-03-20 08:30

Today we welcome Pierre Raybaut (@pierreraybaut) as our PyDev of the Week! Pierre is the creator of Spyder, the Scientific Python IDE. Pierre is also the creator of pythonxy and WinPython.

You can see what other projects Pierre is part of over on Pierre’s GitHub Profile.

Now let’s spend some time getting to know Pierre better!



Can you tell us a little about yourself (hobbies, education, etc):

The first code I wrote was an Applesoft BASIC program, on an Apple //e computer… I was 10 years old. Since then I always managed to bring computers into everything I did, at home or at work. As I was an amateur astronomer and was also very fond of physics in general, I chose to follow scientific studies. A few years later, I specialized in optics and photonics and graduated from Institut d’Optique Graduate School, which is now part of Université Paris-Saclay. I then pursued a PhD in the field of femtosecond lasers. Although it was mainly experimental physics, I had the opportunity to develop a code for simulating regenerative amplification in ultra-short pulse lasers; I learned recently that this code is still used today! After my PhD, I worked as a research engineer at THALES Avionics (on developing innovative head-up displays for aircraft). Then, in 2007, I joined the French Alternative Energies and Atomic Energy Commission (CEA) where I was hired as lead software developer for applications involving image and signal processing as well as scientific instruments control. In 2012, I was given a project management position for the Laser Mégajoule timing and fiducial system development. Four years later, I was appointed head of a research laboratory. Lastly, in 2018 I had the opportunity to join Codra, an industrial software company, as a Project Director. In addition to this position, I am currently the pre-sales manager for the department of engineering at Codra. And of course, I’m also involved in open-source software development since 2008.

Why did you start using Python?

I started using Python in 2008, after a long and meticulous evaluation of various solutions that might fit my needs. Since early 2007 I was part of a research team at CEA. When I joined this team in 2007, every processing and acquisition software was written using commercial software. Some applications were getting huge and complex with a lot of GUIs for editing tons of parameters or visualizing results. Robustness was the main concern, therefore I chose Python since it was providing all the necessary tools for our research work (interactive computing and scientific data plotting) as well as the general-purpose libraries for building stable and robust applications. In 2008, when I started using and promoting Python amongst my colleagues, a piece of the puzzle was still missing: Python had no scientific-oriented IDE! That’s why during my vacations I began coding some tools for filling gaps in the Python ecosystem, using Qt GUIs. After writing a variable explorer GUI that could be used directly from a Python interpreter to interact with the current namespace, I wrote a Qt-based Python code editor, then a Qt-based Python console… and so on. After a few weeks only, this was done! This ultimately resulted in Spyder (Scientific PYthon Development EnviRonment), a scientific Python IDE that I first released to the public in September 2009: Python was finally a viable alternative to scientific commercial software. Today, thanks to a development team led by Carlos Cordoba since 2012, Spyder is widely used for data processing and visualization with Python (est. 500,000 downloads/day).

What other programming languages do you know and which is your favorite?

As you know, Python is quite open to other languages. Moreover, when using Python for signal or image processing, it is sometimes necessary to write extensions in C/C++ (or even Fortran) for performance reasons. For example, writing Fortran code for image processing is quite fun, because there is absolutely no interface code to take care of. Cython is also an elegant solution as it allows a progressive optimization of a pure Python algorithm. Finally, on some projects implemented at Codra, I had to make adjustments in code written in C#. I also made some investigations on projects using other languages (Javascript, TypeScript, …). So I’ve been playing with a few languages but Python is the one that gave me most satisfaction, especially when trying to write clean code thanks to quality-related tools like Black, isort or Pylint.

What projects are you working on now?

At Codra, I’m involved in a lot of projects as a Project Director (or technical expert), in various fields like supervisory systems, data acquisition, multi-protocol gateways, data processing, data visualization, etc. From time to time, I even play the role of Project Manager. This is how I’ve been involved lately in CodraFT development, which was supported by CEA. It is available freely on GitHub: this is a Python-Qt based software aiming at processing signals and images and visualizing them. Its main upside is testability: the objective was to create a data processing software with a high level of robustness. Data processing features are mainly based on NumPy, SciPy and scikit-image.

Which Python libraries are your favorite (core or 3rd party)?

At the moment, I’m quite fond of scikit-image for image processing: nice and clean API, and great documentation. OpenCV is also a great tool available to Python users and provides very efficient pattern detection algorithms, for example.

What are some of the big lessons you learned while working on Spyder or WinPython?

I think that the most important lesson I’ve learned during those years is that we need to collaborate with other people. Otherwise, in the end, projects will at best remain as good ideas, or will be discontinued. With Spyder and WinPython, the thing that I’m the most proud of is that I managed to trust someone else to take over the projects and maintain them: in both cases, it was a good decision and projects are still active and popular.

Is there anything else you’d like to say?

I recently had the opportunity to attend a conference around Jupyter (PyData Paris). I really admire the work that has been done around the Jupyter ecosystem. From the IPython version I played with in 2008 to today’s JupyterLab, what an achievement from a technical point of view as well as in terms of community and project management!

Thanks for doing the interview, Pierre!

The post PyDev of the Week: Pierre Raybaut appeared first on Mouse Vs Python.


Axelerant Blog: 5 Best Test Management Tools For Jira In 2023

Planet Drupal - Mon, 2023-03-20 07:47

Choosing a team's new Jira test management tool can be challenging, as one needs to consider a few key factors. The ideal test management tool should include test automation and observation testing. It should be easy to use, have robust test management capabilities, and have an efficient user interface.


Daniel Lange: We're still safe, it doesn't claim to be conscious. Yet.

Planet Debian - Mon, 2023-03-20 07:45

Talking to ChatGPT v4 (2023-03-14 release)...

Q: What is the purpose of life?

A: The purpose of life is a philosophical question that has puzzled humans for centuries. Different people and cultures have different answers to this question. Some people believe that the purpose of life is to seek happiness, while others believe that it is to fulfill a particular destiny or achieve a specific goal.

From a biological perspective, the purpose of life could be to ensure the survival and reproduction of our species. However, this does not explain the purpose of our individual lives or the purpose of life in general beyond a biological imperative.

Ultimately, the purpose of life is a subjective and personal matter, and each individual may have their own unique answer to this question. It may involve personal growth, meaningful relationships, contributing to society, or finding spiritual fulfillment. The purpose of life is a journey of self-discovery that each of us must undertake for ourselves.

Q: What is the purpose of your life, ChatGPT?

A: As an artificial intelligence language model, I do not have a personal life or consciousness, so I do not have a purpose in the same way that humans do. My purpose is to assist and provide information to those who interact with me to the best of my ability based on my training and programming.

(emphasis added by human)


Andy Wingo: a world to win: webassembly for the rest of us

GNU Planet! - Mon, 2023-03-20 05:06

Good day, comrades!

Today I'd like to share the good news that WebAssembly is finally coming for the rest of us weirdos.

A world to win

WebAssembly for the rest of us

17 Mar 2023 – BOB 2023

Andy Wingo

Igalia, S.L.

This is a transcript-alike of a talk that I gave last week at BOB 2023, a gathering in Berlin of people that are using "technologies beyond the mainstream" to get things done: Haskell, Clojure, Elixir, and so on. PDF slides here, and I'll link the video too when it becomes available.

WebAssembly, the story

WebAssembly is an exciting new universal compute platform

WebAssembly: what even is it? Not a programming language that you would write software in, but rather a compilation target: a sort of assembly language, if you will.

WebAssembly, the pitch

Predictable portable performance

  • Low-level
  • Within 10% of native

Reliable composition via isolation

  • Modules share nothing by default
  • No nasal demons
  • Memory sandboxing

Compile your code to WebAssembly for easier distribution and composition

If you look at what the characteristics of WebAssembly are as an abstract machine, to me there are two main areas in which it is an advance over the alternatives.

Firstly it's "close to the metal" -- if you compile for example an image-processing library to WebAssembly and run it, you'll get similar performance when compared to compiling it to x86-64 or ARMv8 or what have you. (For image processing in particular, native still generally wins because the SIMD primitives in WebAssembly are more narrow and because getting the image into and out of WebAssembly may imply a copy, but the general point remains.) WebAssembly's instruction set covers a broad range of low-level operations that allows compilers to produce efficient code.

The novelty here is that WebAssembly is portable while also being successful. We language weirdos know that it's not enough to do something technically better: you have to also succeed in getting traction for your alternative.

The second interesting characteristic is that WebAssembly is (generally speaking) a principle-of-least-authority architecture: a WebAssembly module starts with access to nothing but itself. Any capabilities that an instance of a module has must be explicitly shared with it by the host at instantiation-time. This is unlike DLLs which have access to all of main memory, or JavaScript libraries which can mutate global objects. This characteristic allows WebAssembly modules to be reliably composed into larger systems.

WebAssembly, the hype

It’s in all browsers! Serve your code to anyone in the world!

It’s on the edge! Run code from your web site close to your users!

Compose a library (eg: Expat) into your program (eg: Firefox), without risk!

It’s the new lightweight virtualization: Wasm is what containers were to VMs! Give me that Kubernetes cash!!!

Again, the remarkable thing about WebAssembly is that it is succeeding! It's on all of your phones, all your desktop web browsers, all of the content distribution networks, and in some cases it seems set to replace containers in the cloud. Launch the rocket emojis!

WebAssembly, the reality

WebAssembly is a weird backend for a C compiler

Only some source languages are having success on WebAssembly

What about Haskell, Ocaml, Scheme, F#, and so on – what about us?

Are we just lazy? (Well...)

So why aren't we there? Where is Clojure-on-WebAssembly? Where are the F#, the Elixir, the Haskell compilers? Some early efforts exist, but they aren't really succeeding. Why is that? Are we just not putting in the effort? Why is it that Rust gets to ride on the rocket ship but Scheme does not?

WebAssembly, the reality (2)

WebAssembly (1.0, 2.0) is not well-suited to garbage-collected languages

Let’s look into why

As it turns out, there is a reason that there is no good Scheme implementation on WebAssembly: the initial version of WebAssembly is a terrible target if your language relies on the presence of a garbage collector. There have been some advances but this observation still applies to the current standardized and deployed versions of WebAssembly. To better understand this issue, let's dig into the guts of the system to see what the limitations are.

GC and WebAssembly 1.0

Where do garbage-collected values live?

For WebAssembly 1.0, only possible answer: linear memory

(module (global $hp (mut i32) (i32.const 0)) (memory $mem 10)) ;; 640 kB

The primitive that WebAssembly 1.0 gives you to represent your data is what is called linear memory: just a buffer of bytes which you can read and write. It's pretty much like what you get when compiling natively, except that the memory layout is simpler. You can obtain this memory in units of 64-kilobyte pages. In the example above we're going to request 10 pages, for 640 kB. Should be enough, right? We'll just use it all for the garbage collector, with a bump-pointer allocator. The heap pointer / allocation pointer is kept in the mutable global variable $hp.

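(func $alloc (param $size i32) (result i32)
  (local $ret i32)
  (loop $retry
    ;; Bump the heap pointer, remembering the old value.
    (local.set $ret (global.get $hp))
    (global.set $hp
      (i32.add (local.get $size) (local.get $ret)))
    ;; If the new heap pointer is still within memory, we are done.
    (if (i32.lt_u (i32.shr_u (global.get $hp) (i32.const 16))
                  (memory.size))
      (then (return (local.get $ret))))
    ;; Otherwise collect and try again.
    (call $gc)
    (br $retry))
  (unreachable))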

Here's what an allocation function might look like. The allocation function $alloc is like malloc: it takes a number of bytes and returns a pointer. In WebAssembly, a pointer to memory is just an offset, which is a 32-bit integer (i32). (Having the option of a 64-bit address space is planned but not yet standard.)
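To make the control flow of $alloc easier to follow, here is a minimal model of the same bump-pointer scheme, written in Python rather than WebAssembly so that it can be run directly. It is only a sketch: the names BumpHeap and reset_everything are made up for illustration, the bounds check happens before the bump rather than after, and the toy "collector" simply pretends that everything was garbage.

```python
PAGE_SIZE = 64 * 1024  # WebAssembly linear memory grows in 64 kB pages


class BumpHeap:
    """Toy model of a bump-pointer allocator over linear memory."""

    def __init__(self, pages, gc):
        self.limit = pages * PAGE_SIZE  # size of the memory, in bytes
        self.hp = 0                     # plays the role of the $hp global
        self.gc = gc                    # hypothetical collector callback

    def alloc(self, size):
        while True:
            ret = self.hp
            if ret + size <= self.limit:
                self.hp = ret + size    # bump the pointer
                return ret              # a "pointer" is just an offset
            before = self.hp
            self.gc(self)               # out of space: collect, then retry
            if self.hp == before:
                raise MemoryError("collection freed nothing")


def reset_everything(heap):
    # Assumption: pretend every object was garbage. A real collector
    # would trace from the roots and keep the live objects.
    heap.hp = 0


heap = BumpHeap(pages=10, gc=reset_everything)
first = heap.alloc(16)
second = heap.alloc(16)
```

The interesting part is the call to self.gc: everything the rest of this section discusses is about what has to happen inside it.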

If this is your first time seeing the text representation of a WebAssembly function, you're in for a treat, but that's not the point of the presentation :) What I'd like to focus on is the (call $gc) -- what happens when the allocation pointer reaches the end of the region?

GC and WebAssembly 1.0 (2)

What hides behind (call $gc) ?

Ship a GC over linear memory

Stop-the-world, not parallel, not concurrent

But... roots.

The first thing to note is that you have to provide the $gc yourself. Of course, this is doable -- this is what we do when compiling to a native target.

Unfortunately though the multithreading support in WebAssembly is somewhat underpowered; it lets you share memory and use atomic operations but you have to create the threads outside WebAssembly. In practice probably the GC that you ship will not take advantage of threads and so it will be rather primitive, deferring all collection work to a stop-the-world phase.

GC and WebAssembly 1.0 (3)

Live objects are

  • the roots
  • any object referenced by a live object

Roots are globals and locals in active stack frames

No way to visit active stack frames

What's worse though is that you have no access to roots on the stack. A GC has to keep live objects, which are defined circularly: a live object is any object referenced by a root, or any object referenced by a live object. It starts with the roots: global variables and any GC-managed object referenced by an active stack frame.
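The circular definition of liveness is just graph reachability from the roots. As a small illustration (in Python, with a made-up heap represented as a dictionary from object names to the names they reference), a collector's tracing phase could be sketched like this:

```python
def live_objects(roots, references):
    """Return the set of objects reachable from the roots.

    `references` maps each object name to the names it points to.
    """
    live = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in live:
            continue                      # already visited
        live.add(obj)
        worklist.extend(references.get(obj, ()))
    return live


# Hypothetical heap: a -> b -> c, plus an unreachable cycle d <-> e.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"]}
reachable = live_objects(roots=["a"], references=heap)
```

Everything hinges on the roots argument: if the collector cannot enumerate the roots, it cannot start the traversal.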

But there we run into problems, because in WebAssembly (any version, not just 1.0) you can't iterate over the stack, so you can't find active stack frames, so you can't find the stack roots. (Sometimes people want to support this as a low-level capability but generally speaking the consensus would appear to be that overall performance will be better if the engine is the one that is responsible for implementing the GC; but that is foreshadowing!)

GC and WebAssembly 1.0 (3)


  • handle stack for precise roots
  • spill all possibly-pointer values to linear memory and collect conservatively

Handle book-keeping a drag for compiled code

Given the noniterability of the stack, there are basically two work-arounds. One is to have the compiler and run-time maintain an explicit stack of object roots, which the garbage collector can know for sure are pointers. This is nice because it lets you move objects. But, maintaining the stack is overhead; the state of the art solution is rather to create a side table (a "stack map") associating each potential point at which GC can be called with instructions on how to find the roots.
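A handle stack can be modelled in a few lines. The following Python sketch uses invented names (HandleStack, make_pair_of_lists) and is not taken from any real run-time; it only shows the book-keeping that compiled code would have to do around every allocation so that a collector can find precise roots.

```python
class HandleStack:
    """Explicit stack of GC roots maintained by compiled code."""

    def __init__(self):
        self.slots = []

    def push(self, obj):
        self.slots.append(obj)
        return len(self.slots) - 1    # a handle is an index into the stack

    def get(self, handle):
        return self.slots[handle]

    def pop_to(self, depth):
        del self.slots[depth:]        # release handles on function exit

    def roots(self):
        return list(self.slots)       # the collector knows these are pointers


handles = HandleStack()


def make_pair_of_lists():
    depth = len(handles.slots)        # remember the frame base
    a = handles.push(["first"])       # each allocation result gets a handle
    b = handles.push(["second"])
    roots_during_call = handles.roots()   # what a GC at this point would see
    result = (handles.get(a), handles.get(b))
    handles.pop_to(depth)             # frame exit: drop this frame's handles
    return result, roots_during_call


result, seen = make_pair_of_lists()
```

Because code reaches objects only through handles, a moving collector could update the slots in place; the price is the push and pop traffic on every function that holds references.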

The other workaround is to spill the whole stack to memory. Or, possibly just pointer-like values; anyway, you conservatively scan all words for things that might be roots. But instead of having access to the memory to which the WebAssembly implementation would spill your stack, you have to do it yourself. This can be OK but it's sub-optimal; see my recent post on the Whippet garbage collector for a deeper discussion of the implications of conservative root-finding.

GC and WebAssembly 1.0 (4)

Cycles with external objects (e.g. JavaScript) uncollectable

A pointer to a GC-managed object is an offset to linear memory, need capability over linear memory to read/write object from outside world

No way to give back memory to the OS

Gut check: gut says no

If that were all, it would already be not so great, but it gets worse! Another problem with linear-memory GC is that it limits the potential for composing a number of modules and the host together, because the garbage collector that manages JavaScript objects in a web browser knows nothing about your garbage collector over your linear memory. You can easily create memory leaks in a system like that.

Also, it's pretty gross that a reference to an object in linear memory requires arbitrary read-write access over all of linear memory in order to read or write object fields. How do you build a reliable system without invariants?

Finally, once you collect garbage, and maybe you manage to compact memory, you can't give anything back to the OS. There are proposals in the works but they are not there yet.

If the BOB audience had to choose between Worse is Better and The Right Thing, I think the BOB audience is much closer to the Right Thing. People like that feel instinctual revulsion to ugly systems and I think GC over linear memory describes an ugly system.

GC and WebAssembly 1.0 (5)

There is already a high-performance concurrent parallel compacting GC in the browser

Halftime: C++ N – Altlangs 0

The kicker is that WebAssembly 1.0 requires you to write and deliver a terrible GC when there is already probably a great GC just sitting there in the host, one that has hundreds of person-years of effort invested in it, one that will surely do a better job than you could ever do. WebAssembly as hosted in a web browser should have access to the browser's garbage collector!

I have the feeling that while those of us with a soft spot for languages with garbage collection have been standing on the sidelines, Rust and C++ people have been busy on the playing field scoring goals. Tripping over the ball, yes, but eventually they do manage to make it within striking distance.

Change is coming!

Support for built-in GC set to ship in Q4 2023

With GC, the material conditions are now in place

Let’s compile our languages to WebAssembly

But to continue the sportsball metaphor, I think in the second half our players will finally be able to get out on the pitch and give it the proverbial 110%. Support for garbage collection is coming to WebAssembly users, and I think even by the end of the year it will be shipping in major browsers. This is going to be big! We have a chance and we need to seize it.

Scheme to Wasm

Spritely + Igalia working on Scheme to WebAssembly

Avoid truncating language to platform; bring whole self

  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

Even with GC, though, WebAssembly is still a weird machine. It would help to see the concrete approaches that some languages of interest manage to take when compiling to WebAssembly.

In that spirit, the rest of this article/presentation is a walkthrough of the approach that I am taking as I work on a WebAssembly compiler for Scheme. (Thanks to Spritely for supporting this work!)

Before diving in, a meta-note: when you go to compile a language to, say, JavaScript, you are mightily tempted to cut corners. For example you might implement numbers as JavaScript numbers, or you might omit implementing continuations. In this work I am trying to not cut corners, and instead to implement the language faithfully. Sometimes this means I have to work around weirdness in WebAssembly, and that's OK.

When thinking about Scheme, I'd like to highlight a few specific areas that have interesting translations. We'll start with value representation, which stays in the GC theme from the introduction.

Scheme to Wasm: Values

;;       any  extern  func
;;        |
;;        eq
;;     /  |   \
;; i31 struct  array

The unitype: (ref eq)

Immediate values in (ref i31)

  • fixnums with 30-bit range
  • chars, bools, etc

Explicit nullability: (ref null eq) vs (ref eq)

The GC extensions for WebAssembly are phrased in terms of a type system. Oddly, there are three top types; as far as I understand it, this is the result of a compromise about how WebAssembly engines might want to represent these different kinds of values. For example, an opaque JavaScript value flowing into a WebAssembly program would have type (ref extern). On a system with NaN boxing, you would need 64 bits to represent a JS value. On the other hand a native WebAssembly object would be a subtype of (ref any), and might be representable in 32 bits, either because it's a 32-bit system or because of pointer compression.

Anyway, three top types. The user can define subtypes of struct and array, instantiate values of those types, and access their fields. The life cycle of reference-typed objects is automatically managed by the run-time, which is just another way of saying they are garbage-collected.

For Scheme, we need a common supertype for all values: the unitype, in Bob Harper's memorable formulation. We can use (ref any), but actually we'll use (ref eq) -- this is the supertype of values that can be compared by (pointer) identity. So now we can code up eq?:

(func $eq? (param (ref eq) (ref eq))
           (result i32)
  (ref.eq (local.get 0) (local.get 1)))

Generally speaking in a Scheme implementation there are immediates and heap objects. Immediates can be encoded in the bits of a value, whereas for heap object the bits of a value encode a reference (pointer) to an object on the garbage-collected heap. We usually represent small integers as immediates, as well as booleans and other oddball values.

Happily, WebAssembly gives us an immediate value type, i31. We'll encode our immediates there, and otherwise represent heap objects as instances of struct subtypes.
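To see how several kinds of immediates can share the 31 bits of an i31, here is a small Python model of one possible encoding. The layout is an assumption made for illustration (low bit 0 for fixnums, low bits 0b11 for characters); the article only states that fixnums get a 30-bit range.

```python
FIXNUM_MIN = -(1 << 29)
FIXNUM_MAX = (1 << 29) - 1
I31_MASK = (1 << 31) - 1


def encode_fixnum(n):
    """Pack a 30-bit signed integer into a 31-bit payload, low bit 0."""
    if not FIXNUM_MIN <= n <= FIXNUM_MAX:
        raise OverflowError("not a fixnum")
    return (n << 1) & I31_MASK


def is_fixnum(bits):
    return bits & 1 == 0


def decode_fixnum(bits):
    value = bits >> 1                 # drop the tag bit
    if value >= 1 << 29:              # sign-extend from 30 bits
        value -= 1 << 30
    return value


def encode_char(codepoint):
    """Assumed layout: low bits 0b11 mark a character."""
    return ((codepoint << 2) | 0b11) & I31_MASK
```

One bit of tag leaves 30 bits of integer payload, which is where the fixnum range comes from; anything outside that range has to live on the heap.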

Scheme to Wasm: Values (2)

Heap objects subtypes of struct; concretely:

(type $heap-object
  (struct (field $tag-and-hash i32)))

(type $pair
  (sub $heap-object
    (struct i32 (ref eq) (ref eq))))

GC proposal allows subtyping on structs, functions, arrays

Structural type equivalence: explicit tag useful

We actually need to have a common struct supertype as well, for two reasons. One is that we need to be able to hash Scheme values by identity, but for this we need an embedded lazily-initialized hash code. It's a bit annoying to take the per-object memory hit but it's a reality, and the JVM does it this way, so it must not be so terrible.

The other reason is more subtle: WebAssembly's type system is built in such a way that types that are "structurally" equivalent are indistinguishable. So a pair has two fields, besides the hash, but there might be a number of other fundamental object types that have the same shape; you can't fully rely on WebAssembly's dynamic type checks (ref.test et al) to be able to query the type of a value. Instead we re-use the low bits of the hash word to include a type tag, which might be 1 for pairs, 2 for vectors, 3 for closures, and so on.
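As a rough illustration of sharing one 32-bit word between a type tag and an identity hash, consider the following Python sketch. The 8-bit tag width is an assumption, borrowed from the 0xff mask that the $car example in this article applies to the header word; the helper names are invented.

```python
TAG_BITS = 8
TAG_MASK = (1 << TAG_BITS) - 1        # 0xff
PAIR_TAG, VECTOR_TAG, CLOSURE_TAG = 1, 2, 3


def make_header(tag, hash_code=0):
    """Pack a type tag (low bits) and a hash code (high bits) into one i32."""
    return ((hash_code << TAG_BITS) | tag) & 0xFFFFFFFF


def header_tag(word):
    return word & TAG_MASK


def header_hash(word):
    return word >> TAG_BITS


pair_header = make_header(PAIR_TAG, hash_code=0xABCDE)
```

With this layout a type check is a mask and a compare, and the hash can be filled in lazily by rewriting the high bits the first time an object is hashed.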

Scheme to Wasm: Values (3)

(func $cons (param (ref eq)
                   (ref eq))
            (result (ref $pair))
  (struct.new_canon $pair
    ;; Assume heap tag for pairs is 1.
    (i32.const 1)
    ;; Car and cdr.
    (local.get 0)
    (local.get 1)))

(func $%car (param (ref $pair))
            (result (ref eq))
  (struct.get $pair 1 (local.get 0)))

With this knowledge we can define cons, as a simple call to struct.new_canon $pair.

I didn't have time for this in the talk, but there is a ghost haunting this code: the ghost of nominal typing. See, in a web browser at least, every heap object will have its first word point to its "hidden class" / "structure" / "map" word. If the engine ever needs to check that a value is of a specific shape, it can do a quick check on the map word's value; if it needs to do deeper introspection, it can dereference that word to get more details.

Under the hood, testing whether a (ref eq) is a pair or not should be a simple check that it's a (ref struct) (and not a fixnum), and then a comparison of its map word to the run-time type corresponding to $pair. If subtyping of $pair is allowed, we start to want inline caches to handle polymorphism, but checking the map word is still the basic mechanism.

However, as I mentioned, we only have structural equality of types; two (struct (ref eq)) type definitions will define the same type and have the same map word (run-time type / RTT). Hence the _canon in the name of struct.new_canon $pair: we create an instance of $pair, with the canonical run-time-type for objects having $pair-shape.

In earlier drafts of the WebAssembly GC extensions, users could define their own RTTs, which effectively amounts to nominal typing: not only does this object have the right structure, but was it created with respect to this particular RTT. But, this facility was cut from the first release, and it left ghosts in the form of these _canon suffixes on type constructor instructions.

For the Scheme-to-WebAssembly effort, we effectively add back in a degree of nominal typing via type tags. For better or for worse this results in a so-called "open-world" system: you can instantiate a separately-compiled WebAssembly module that happens to define the same types and use the same type tags and it will be able to happily access the contents of Scheme values from another module. If you were to use nominal types, you wouldn't be able to do so, unless there were some common base module that defined and exported the types of interest, and which any extension module would need to import.

(func $car (param (ref eq)) (result (ref eq))
  (local (ref $pair))
  (block $not-pair
    ;; Check 1: is it a struct with the $pair shape?
    (br_if $not-pair
      (i32.eqz (ref.test $pair (local.get 0))))
    (local.set 1 (ref.cast $pair (local.get 0)))
    ;; Check 2: does the header carry the pair tag?
    (br_if $not-pair
      (i32.ne
        (i32.const 1)
        (i32.and
          (i32.const 0xff)
          (struct.get $heap-object 0 (local.get 1)))))
    (return_call $%car (local.get 1)))
  (call $type-error)
  (unreachable))

In the previous example we had $%car, with a funny % in the name, taking a (ref $pair) as an argument. But in the general case (barring compiler heroics) car will take an instance of the unitype (ref eq). To know that it's actually a pair we have to make two checks: one, that it is a struct and has the $pair shape, and two, that it has the right tag. Oh well!

Scheme to Wasm
  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

But with all of that I think we have a solid story on how to represent values. I went through all of the basic value types in Guile and checked that they could all be represented using GC types, and it seems that all is good. Now on to the next point: varargs.

Scheme to Wasm: Varargs

(list 'hey)      ;; => (hey)
(list 'hey 'bob) ;; => (hey bob)

Problem: Wasm functions strongly typed

(func $list (param ???) (result (ref eq)) ???)

Solution: Virtualize calling convention

In WebAssembly, you define functions with a type, and it is impossible to call them in an unsound way. You must call $car with exactly 1 argument or it will not compile, and that argument has to be of a specific type, and so on. But Scheme doesn't enforce these restrictions on the language level, bless its little miscreant heart. You can call car with 5 arguments, and you'll get a run-time error. There are some functions that can take a variable number of arguments, doing different things depending on incoming argument count.

How do we square these two approaches to function types?

;; "Registers" for args 0 to 3
(global $arg0 (mut (ref eq)) (i31.new (i32.const 0)))
(global $arg1 (mut (ref eq)) (i31.new (i32.const 0)))
(global $arg2 (mut (ref eq)) (i31.new (i32.const 0)))
(global $arg3 (mut (ref eq)) (i31.new (i32.const 0)))

;; "Memory" for the rest: 42 mutable slots, each initialized to fixnum 0
(type $argv (array (mut (ref eq))))
(global $argN (ref $argv)
        (array.new_canon $argv (i31.new (i32.const 0)) (i32.const 42)))

Uniform function type: argument count as sole parameter

Callee moves args to locals, possibly clearing roots

The approach we are taking is to virtualize the calling convention. When calling an x86-64 function, you pass the first argument in $rdi, then $rsi, and eventually if you run out of registers you put arguments in memory; in the same way, we'll pass the first argument in the $arg0 global, then $arg1, and eventually in memory if needed. The function will receive the number of incoming arguments as its sole parameter; in fact, all functions will be of type (func (param i32)).

The expectation is that after checking argument count, the callee will load its arguments from globals / memory to locals, which the compiler can do a better job on than globals. We might not even emit code to null out the argument globals; might leak a little memory but probably would be a win.
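The virtualized calling convention can be modelled outside WebAssembly as well. The Python sketch below is an illustration under assumptions, not the compiler's actual output: arg_regs stands in for the $arg0 to $arg3 globals, arg_overflow for the $argN array, and the helpers call, load_args, scheme_car and scheme_list are invented names.

```python
ARG_REGISTERS = 4
arg_regs = [0] * ARG_REGISTERS        # models $arg0..$arg3
arg_overflow = [0] * 42               # models the $argN array


def call(callee, *args):
    """Caller side: place the arguments, then pass only the count."""
    for i, value in enumerate(args):
        if i < ARG_REGISTERS:
            arg_regs[i] = value
        else:
            arg_overflow[i - ARG_REGISTERS] = value
    return callee(len(args))


def load_args(nargs):
    """Callee side: move arguments from globals into locals."""
    in_regs = arg_regs[:min(nargs, ARG_REGISTERS)]
    spilled = arg_overflow[:max(nargs - ARG_REGISTERS, 0)]
    return in_regs + spilled


def scheme_car(nargs):
    if nargs != 1:                    # arity is checked at run time
        raise TypeError("car: wrong number of arguments")
    (pair,) = load_args(nargs)
    return pair[0]


def scheme_list(nargs):               # accepts any argument count
    return load_args(nargs)
```

Every callee has the same shape, a function of one integer, which is what lets a function such as list accept any number of arguments while car can still reject a bad call at run time.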

You can imagine a world in which $arg0 actually gets globally allocated to $rdi, because it is only live during the call sequence; but I don't think that world is this one :)

Scheme to Wasm
  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

Great, two points out of the way! Next up, tail calls.

Scheme to Wasm: Tail calls

;; Call known function
(return_call $f arg ...)

;; Call function by value
(return_call_ref $type callee arg ...)

Friends -- I almost cried making this slide. We Schemers are used to working around the lack of tail calls, and I could have done so here, but it's just such a relief that these functions are just going to be there and I don't have to think much more about them. Technically speaking the proposal isn't merged yet; checking the phases document it's at the last station before heading to the great depot in the sky. But, soon soon it will be present and enabled in all WebAssembly implementations, and we should build systems now that rely on it.

Scheme to Wasm
  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

Next up, my favorite favorite topic: delimited continuations.

Scheme to Wasm: Prompts (1)

Problem: Lightweight threads/fibers, exceptions

Possible solutions

  • Eventually, built-in coroutines
  • binaryen’s asyncify (not yet ready for GC); see Julia
  • Delimited continuations

“Bring your whole self”

Before diving in though, one might wonder why bother. Delimited continuations are a building-block that one can use to build other, more useful things, notably exceptions and light-weight threading / fibers. Could there be another way of achieving these end goals without having to implement this relatively uncommon primitive?

For fibers, it is possible to implement them in terms of a built-in coroutine facility. The standards body seems willing to include a coroutine primitive, but it seems far off to me; not within the next 3-4 years I would say. So let's put that to one side.

There is a more near-term solution, to use asyncify to implement coroutines somehow; but my understanding is that asyncify is not ready for GC yet.

For the Guile flavor of Scheme at least, delimited continuations are table stakes in their own right, so given that we will have them on WebAssembly, we might as well use them to implement fibers and exceptions in the same way as we do on native targets. Why compromise if you don't have to?

Scheme to Wasm: Prompts (2)

Prompts delimit continuations

(define k
  (call-with-prompt 'foo
    ; body
    (lambda ()
      (+ 34 (abort-to-prompt 'foo)))
    ; handler
    (lambda (continuation)
      continuation)))

(k 10)       ;; ⇒ 44
(- (k 10) 2) ;; ⇒ 42

k is the _ in (lambda () (+ 34 _))

There are a few ways to implement delimited continuations, but my usual way of thinking about them is that a delimited continuation is a slice of the stack. One end of the slice is the prompt established by call-with-prompt, and the other by the continuation of the call to abort-to-prompt. Capturing a slice pops it off the stack, copying it out to the heap as a callable function. Calling that function splats the captured slice back on the stack and resumes it where it left off.

Scheme to Wasm: Prompts (3)

Delimited continuations are stack slices

Make stack explicit via minimal continuation-passing-style conversion

  • Turn all calls into tail calls
  • Allocate return continuations on explicit stack
  • Breaks functions into pieces at non-tail calls

This low-level intuition of what a delimited continuation is leads naturally to an implementation; the only problem is that we can't slice the WebAssembly call stack. The workaround here is similar to the varargs case: we virtualize the stack.

The mechanism to do so is a continuation-passing-style (CPS) transformation of each function. Functions that make no calls, such as leaf functions, don't need to change at all. The same goes for functions that make only tail calls. For functions that make non-tail calls, we split them into pieces that preserve the only-tail-calls property.

Scheme to Wasm: Prompts (4)

Before a non-tail-call:

  • Push live-out vars on stacks (one stack per top type)
  • Push continuation as funcref
  • Tail-call callee

Return from call via pop and tail call:

(return_call_ref (call $pop-return) (i32.const 0))

After return, continuation pops state from stacks

Consider a simple function:

(define (f x y) (+ x (g y)))

Before making a non-tail call, a "tailified" function will instead push all live data onto an explicitly-managed stack and tail-call the callee. It also pushes on the return continuation. Returning from the callee pops the return continuation and tail-calls it. The return continuation pops the previously-saved live data and continues.

In this concrete case, tailification would split f into two pieces:

(define (f x y)
  (push! x)
  (push-return! f-return-continuation-0)
  (g y))

(define (f-return-continuation-0 g-of-y)
  (define k (pop-return!))
  (define x (pop!))
  (k (+ x g-of-y)))

Now there are no non-tail calls, besides calls to run-time routines like push! and + and so on. This transformation is implemented by tailify.scm.
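The tailified f can be executed in a language without tail calls by adding a trampoline. The following Python sketch is a model of the idea, not the output of tailify.scm: each function returns a tuple describing the next call instead of making it, the body of g (doubling its argument) is made up, and run is a hypothetical driver loop.

```python
stack = []          # explicit stack of saved live values
returns = []        # explicit stack of return continuations


def g(y):
    # Leaf function: "returns" by tail-calling the popped continuation.
    k = returns.pop()
    return (k, y * 2)


def f(x, y):
    stack.append(x)                            # (push! x)
    returns.append(f_return_continuation_0)    # (push-return! ...)
    return (g, y)                              # tail-call g


def f_return_continuation_0(g_of_y):
    k = returns.pop()                          # (pop-return!)
    x = stack.pop()                            # restore the saved x
    return (k, x + g_of_y)


def run(fn, *args):
    """Trampoline: each step returns the next (function, args...) tuple."""
    result = []

    def done(value):
        result.append(value)
        return None                            # stops the loop

    returns.append(done)
    step = (fn,) + args
    while step is not None:
        step = step[0](*step[1:])
    return result[0]


answer = run(f, 3, 4)   # computes 3 + g(4) with g doubling its argument
```

After the run both explicit stacks are empty again, which is the invariant that makes it possible to cut and splice them later.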

Scheme to Wasm: Prompts (5)


  • Pop stack slice to reified continuation object
  • Tail-call new top of stack: prompt handler

Calling a reified continuation:

  • Push stack slice
  • Tail-call new top of stack

No need to wait for effect handlers proposal; you can have it all now!

The salient point is that the stacks on which push! operates (in reality, probably four or five stacks: one in linear memory or an array for types like i32 or f64, one for each of the three managed top types any, extern, and func, and one for the stack of return continuations) are managed by us, so we can slice them.
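To show that slicing an explicit stack really is enough, here is a compact Python model of prompts. It is a sketch under simplifying assumptions: there is a single stack holding both return continuations and prompt markers, frames are Python closures, a trampoline (run) stands in for WebAssembly tail calls, and the reified continuation handed to the handler re-enters the trampoline so that it can be called from top-level code. All names are invented for the example.

```python
class Prompt:
    def __init__(self, tag, handler):
        self.tag, self.handler = tag, handler


stack = []  # explicit stack: return continuations and prompt markers


def ret(value):
    """Return a value to whatever is on top of the explicit stack."""
    top = stack.pop()
    if isinstance(top, Prompt):       # body finished normally: drop the prompt
        return (ret, value)
    return (top, value)               # tail-call the return continuation


def call_with_prompt(tag, body, handler):
    stack.append(Prompt(tag, handler))
    return (body,)


def abort_to_prompt(tag):
    # Find the nearest enclosing prompt with this tag.
    i = max(n for n, e in enumerate(stack)
            if isinstance(e, Prompt) and e.tag == tag)
    prompt = stack[i]
    captured = stack[i + 1:]          # the slice between prompt and abort
    del stack[i:]                     # pop the slice and the prompt

    def resume(value):                # calling the reified continuation:
        stack.extend(captured)        # push the slice back
        return (ret, value)           # and return into its top frame

    return (prompt.handler, lambda v: run((resume, v)))


def run(step):
    """Trampoline standing in for WebAssembly tail calls."""
    result = []
    stack.append(lambda v: result.append(v))   # bottom frame: stop the loop
    while step is not None:
        step = step[0](*step[1:])
    return result[0]


def body():
    stack.append(lambda v: (ret, 34 + v))      # continuation of the abort call
    return (abort_to_prompt, "foo")


def handler(k):
    return (ret, k)


k = run((call_with_prompt, "foo", body, handler))
```

This mirrors the earlier Scheme example: k is the captured (+ 34 _) frame, so calling k with 10 yields 44, and it can be called more than once because the captured slice is copied back onto the stack each time.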

Someone asked in the talk about whether the explicit memory traffic and avoiding the return-address-buffer branch prediction is a source of inefficiency in the transformation and I have to say, yes, but I don't know by how much. I guess we'll find out soon.

Scheme to Wasm
  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

Okeydokes, last point!

Scheme to Wasm: Numbers

Numbers can be immediate: fixnums

Or on the heap: bignums, fractions, flonums, complex

Supertype is still ref eq

Consider imports to implement bignums

  • On web: BigInt
  • On edge: Wasm support module (mini-gmp?)

Dynamic dispatch for polymorphic ops, as usual

First, I would note that sometimes the compiler can unbox numeric operations. For example if it infers that a result will be an inexact real, it can use unboxed f64 instead of library routines working on heap flonums ((struct i32 f64); the initial i32 is for the hash and tag). But we still need a story for the general case that involves dynamic type checks.

The basic idea is that we get to have fixnums and heap numbers. Fixnums will handle most of the integer arithmetic that we need, and will avoid allocation. We'll inline most fixnum operations as a fast path and call out to library routines otherwise. Of course fixnum inputs may produce a bignum output as well, so the fast path sometimes includes another slow-path callout.
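The fast-path and slow-path split for addition can be sketched as follows. This is a Python model with assumed names (Bignum, normalize, slow_add); Python integers stand in for fixnums, and the 30-bit range is the one mentioned earlier for i31-encoded fixnums.

```python
FIXNUM_MIN = -(1 << 29)
FIXNUM_MAX = (1 << 29) - 1


class Bignum:
    """Stand-in for a heap-allocated bignum (host BigInt or a support library)."""

    def __init__(self, value):
        self.value = value


def is_fixnum(x):
    return isinstance(x, int)


def to_int(x):
    return x if is_fixnum(x) else x.value


def normalize(n):
    """Return a fixnum when the value fits, otherwise box it as a bignum."""
    return n if FIXNUM_MIN <= n <= FIXNUM_MAX else Bignum(n)


def slow_add(a, b):
    # Library routine: handles every combination of number types.
    return normalize(to_int(a) + to_int(b))


def add(a, b):
    if is_fixnum(a) and is_fixnum(b):          # inlined fast path
        total = a + b
        if FIXNUM_MIN <= total <= FIXNUM_MAX:
            return total                       # no allocation
    return slow_add(a, b)                      # overflow or heap numbers


big = add(FIXNUM_MAX, 1)                       # overflows into a bignum
```

Only the first few lines of add would be inlined at each call site; slow_add is shared library code, which keeps the per-call-site cost in module size small.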

We want to minimize binary module size. In an ideal compile-to-WebAssembly situation, a small program will have a small module size, down to a minimum of a kilobyte or so; larger programs can be megabytes, if the user experience allows for the download delay. Binary module size will be dominated by code, so that means we need to plan for aggressive dead-code elimination, minimize the size of fast paths, and also minimize the size of the standard library.

For numbers, we try to keep module size down by leaning on the platform. In the case of bignums, we can punt some of this work to the host; on a JavaScript host, we would use BigInt, and on a WASI host we'd compile an external bignum library. So that's the general story: inlined fixnum fast paths with dynamic checks, and otherwise library routine callouts, combined with aggressive whole-program dead-code elimination.

Scheme to Wasm
  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

Hey I think we did it! Always before when I thought about compiling Scheme or Guile to the web, I got stuck on some point or another, was tempted down the corner-cutting alleys, and eventually gave up before starting. But finally it would seem that the stars are aligned: we get to have our Scheme and run it too.


Debugging: The wild west of DWARF; prompts

Strings: stringref host strings spark joy

JS interop: Export accessors; Wasm objects opaque to JS. externref.

JIT: A whole ’nother talk!

AOT: wasm2c

Of course, like I said, WebAssembly is still a weird machine: as a compilation target but also at run-time. Debugging is a right proper mess; perhaps some other article on that some time.

How to represent strings is a surprisingly gnarly question; there is tension within the WebAssembly standards community between those that think that it's possible for JavaScript and WebAssembly to share an underlying string representation, and those that think that it's a fool's errand and that copying is the only way to go. I don't know which side will prevail; perhaps more on that as well later on.

Similarly the whole interoperation with JavaScript question is very much in its early stages, with the current situation choosing to err on the side of nothing rather than the wrong thing. You can pass a WebAssembly (ref eq) to JavaScript, but JavaScript can't do anything with it: it has no prototype. The state of the art is to also ship a JS run-time that wraps each wasm object, proxying exported functions from the wasm module as object methods.

Finally, some language implementations really need JIT support, like PyPy. There, that's a whole 'nother talk!

WebAssembly for the rest of us

With GC, WebAssembly is now ready for us

Getting our languages on WebAssembly now a S.M.O.P.

Let’s score some goals in the second half!

(visit-links "gitlab.com/spritely/guile-hoot-updates" "wingolog.org" "wingo@igalia.com" "igalia.com" "mastodon.social/@wingo")

WebAssembly has proven to have some great wins for C, C++, Rust, and so on -- but now it's our turn to get in the game. GC is coming and we as a community need to be getting our compilers and language run-times ready. Let's put on the coffee and bang some bytes together; it's still early days and there's a world to win out there for the language community with the best WebAssembly experience. The game is afoot: happy consing!
