FLOSS Project Planets

Robin Wilson: Easily hiding items from the legend in matplotlib

Planet Python - Tue, 2019-10-08 06:32

When producing some graphs for a client recently, I wanted to hide some labels from a legend in matplotlib. I started investigating complex arguments to the plt.legend function, but it turned out that there was a really simple way to do it…

If you start your label for a plot item with an underscore (_) then that item will be hidden from the legend.

For example:

import matplotlib.pyplot as plt
import numpy as np

plt.plot(np.random.rand(20), label='Random 1')
plt.plot(np.random.rand(20), label='Random 2')
plt.plot(np.random.rand(20), label='_Hidden label')
plt.legend()

produces a plot like this:

You can see that the third line is hidden from the legend – just because we started its label with an underscore.

I found this particularly useful when I wanted to plot a load of lines in the same colour to show all the data for something, and then highlight a few lines that meant specific things. For example:

for i in range(20):
    plt.plot(np.random.rand(20), label='_Hidden', color='gray', alpha=0.3)
plt.plot(np.random.rand(20), label='Important Line 1')
plt.plot(np.random.rand(20), label='Important Line 2')
plt.legend()

My next step was to do this when plotting from pandas. In this case I had a dataframe that had a column for each line I wanted to plot in the ‘background’, and then a separate dataframe with each of the ‘special’ lines to highlight.

This code will create a couple of example dataframes:

import pandas as pd

df = pd.DataFrame()
for i in range(20):
    df[f'Data{i}'] = np.random.rand(20)

special = pd.Series(data=np.random.rand(20))

Plotting this produces a legend with all the individual lines showing:

df.plot(color='gray', alpha=0.3)

However, just by changing the column names to start with an underscore you can hide all the entries in the legend. In this example, I actually set one of the columns to a name without an underscore, so that column can be used as a label to represent all of these lines:

cols = ["_" + col for col in df.columns]
cols[0] = 'All other data'
df.columns = cols

Plotting again using exactly the same command as above gives us this – along with some warnings saying that a load of legend items are going to be ignored (in case we accidentally had pandas columns starting with _).

Putting it all together, we can plot both dataframes, with a sensible legend:

ax = df.plot(color='gray', alpha=0.3)
special.plot(ax=ax, label='Special data')
plt.legend()

Advert: I do freelance data science work – please see here for more details.

Categories: FLOSS Project Planets

Codementor: How I access Microsoft SharePoint in my Python scripts

Planet Python - Tue, 2019-10-08 04:29
Python & SharePoint might seem like a odd combination, but you can get useful information out of SharePoint by using a simple Python script.
Categories: FLOSS Project Planets

ADCI Solutions: Drupal Global Training Day #9

Planet Drupal - Tue, 2019-10-08 04:19

 

It’s a pleasure for us to bring GTD into focus in our city. Last weekend ADCI Solutions showed how entertaining building a website on the Drupal platform can be. It was “Star Wars Global Training Day #9”, where only the quickest teams got awards! Check out how it was.

Categories: FLOSS Project Planets

S. Lott: Spreadsheet Regrets

Planet Python - Tue, 2019-10-08 04:00
I can't emphasize this enough.

Some people, when confronted with a problem, think
“I know, I'll use a spreadsheet.”   Now they have two problems.

(This was originally about regular expressions. And AWK. See http://regex.info/blog/2006-09-15/247)

Fiction writer F. L. Stevens got a list of literary agents from AAR Online. This became a spreadsheet driving queries for representation. After a bunch of rejections, another query against AAR Online provided a second list of agents.

Apple's Numbers product will readily translate the AAR Online HTML table into a usable spreadsheet table. But after initial success the spreadsheet as tool of choice collapses into a pile of rubble. The spreadsheet data model is hopelessly ineffective for the problem domain.

What is the problem domain?

There are two user stories:
  1. Author needs to deduplicate agents and agencies. It's considered poor form to badger agents with repeated queries for the same title. It's also bad form to query two agents at the same agency. You have to get rejected by one before contacting the other. 
  2. Author needs to track activities at the Agent and Agency level to optimize querying. This mostly involves sending queries and tracking rejections. Ideally, an agent acceptance should lead to notification to other agents that the manuscript is being withdrawn. This is so rare as to not require much automation.
Agents come and go. Periodically, an agent will be closed to queries for some period of time, and then reopen. Their interests vary with the whims of the marketplace they're trying to serve. Traditional fiction publishing is quite complex; agents are the gatekeepers.
To an extent, we can decompose the processing like this. 
1. Sourcing. There are several sources: AAR Online and Agent Query are two big sources. These sites have usable query engines and the HTML can be scraped to get a list of currently active agents with a uniform representation. This is elegant Python and Beautiful Soup. 
2. Deduplication. Agency and Agent deduplication is central. Query results may involve state changes to an agent (open to queries, interested in new genres). Query results may involve simple duplicates, which have to be discarded to avoid repeated queries. It's a huge pain when attempted with a spreadsheet. The simplistic string equality test for name matching is defeated by whitespace variations, for example. This is elegant Python, however (see the sketch after this list). 
3. Agent web site checks. These have to be done manually. Agency web pages are often art projects, larded up with javascript that produces elegant rolling animations of books, authors, agents, background art, and text. These sites aren't really set up to help authors. It's impossible to automate a check to confirm the source query results. This has to be done manually: F. L. is required to click and update status. 
4. State Changes. Queries and Rejections are the important state changes. Open and Closed to queries is also part of the state that needs to be tracked. Additionally, there's a multiple agent per agency check that makes this more complex. The state changes are painful to track in a simple spreadsheet-like data structure: a rejection by one agent can free up another agent at the same agency. This multi-row state change is simply horrible to deal with.
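
As promised in the deduplication step above, here is a minimal sketch of name normalization in Python. The function names and the (agency, agent) pair representation are illustrative assumptions, not F. L. Stevens' actual data model:

import re

def normalize_name(name):
    # Collapse runs of whitespace, trim, and casefold so that
    # "Jane  Doe " and "jane doe" compare equal.
    return re.sub(r"\s+", " ", name).strip().casefold()

def deduplicate(agents):
    # agents: an iterable of (agency_name, agent_name) pairs.
    seen = set()
    unique = []
    for agency, agent in agents:
        key = (normalize_name(agency), normalize_name(agent))
        if key not in seen:
            seen.add(key)
            unique.append((agency, agent))
    return unique

This is exactly the kind of multi-column identity logic that a spreadsheet's string equality cannot express cleanly.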

Bonus confusion! Time-to-Live rules: a query over 60 days old is more-or-less a de facto rejection. This means that periodic scans of the data are required to close a query to one agent in an agency, freeing up subsequent agents in the same agency.
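
A rough sketch of that Time-to-Live rule in Python, assuming each query record carries the date it was sent (the function and field names are hypothetical):

from datetime import date, timedelta

QUERY_TTL = timedelta(days=60)

def is_de_facto_rejection(query_sent, today=None):
    # A query with no response after 60 days counts as a rejection,
    # freeing up the next agent at the same agency.
    today = today or date.today()
    return today - query_sent > QUERY_TTL

A periodic scan would apply this predicate to every open query and release the corresponding agency slots.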
Manuscript Wish Lists (MSWLs) are a source for agents actively searching for manuscripts. This is more-or-less a Twitter query. Using the various aggregating web sites seems slightly easier than using Twitter directly. However, additional Twitter lookups are required to locate agent details, so this is interesting web-scraping.

Of course F. L. Stevens has a legacy spreadsheet with at least four "similar" (but not really identical) tabs filled with agencies, agents, and query status.
I don't have an implementation to share -- yet. I'm working on it slowly.

I think it will be an interesting tutorial in cleaning up semi-structured data.
Categories: FLOSS Project Planets

Jonathan McDowell: Ada Lovelace Day: 5 Amazing Women in Tech

Planet Debian - Tue, 2019-10-08 03:00

It’s Ada Lovelace day and I’ve been lax in previous years about celebrating some of the talented women in technology I know or follow on the interwebs. So, to make up for it, here are 5 amazing technologists.

Allison Randal

I was initially aware of Allison through her work on Perl, was vaguely aware of the fact she was working on Ubuntu, briefly overlapped with her at HPE (and thought it was impressive HP were hiring such a high calibre of Free Software folk) when she was working on OpenStack, and have had the pleasure of meeting her in person due to the fact we both work on Debian. In the continuing theme of being able to do all things tech she’s currently studying a PhD at Cambridge (the real one), and has already written a fascinating paper about the security misconceptions around virtual machines and containers. She’s also been doing things with home automation, properly, with local speech recognition rather than relying on any external assistant service (I will, eventually, find the time to follow her advice and try this out for myself).

Alyssa Rosenzweig

Graphics are one of the many things I just can’t do. I’m not artistic and I’m in awe of anyone who is capable of wrangling bits to make computers do graphical magic. People who can reverse engineer graphics hardware that would otherwise only be supported by icky binary blobs impress me even more. Alyssa is such a person, working on the Panfrost driver for ARM’s Mali Midgard + Bifrost GPUs. The lack of a Free driver stack for this hardware is a real problem for the ARM ecosystem and she has been tirelessly working to bring this to many ARM based platforms. I was delighted when I saw one of my favourite Free Software consultancies, Collabora, had given her an internship over the summer. (Selfishly I’m hoping it means the Gemini PDA will eventually be able to run an upstream kernel with accelerated graphics.)

Angie McKeown

The first time I saw Angie talk it was about the user experience of Virtual Reality, and how it had an entirely different set of guidelines to conventional user interfaces. In particular the premise of not trying to shock or surprise the user while they’re in what can be a very immersive environment. Obvious once someone explains it to you! Turns out she was also involved in the early days of custom PC builds and internet cafes in Northern Ireland, and has interesting stories to tell. These days she’s concentrating on cyber security - I’ve her to thank for convincing me to persevere with Ghidra - having looked at Bluetooth security as part of her Masters. She’s also deeply aware of the implications of the GDPR and has done some interesting work on thinking about how it affects the computer gaming industry - both from the perspective of the author, and the player.

Claire Wilgar

I’m not particularly fond of modern web design. That’s unfair of me, but web designers seem happy to load megabytes of Javascript from all over the internet just to display the most basic of holding pages. Indeed it seems that such things now require all the includes rather than being simply a matter of HTML, CSS and some graphics, all from the same server. Claire talked at Women Techmakers Belfast about moving away from all of this bloat and back to a minimalistic approach with improved performance, responsiveness and usability, without sacrificing functionality or presentation. She said all the things I want to say to web designers, but from a position of authority, being a front end developer as her day job. It’s great to see someone passionate about front-end development who wants to do things the right way, and talks about it in a way that even people without direct experience of the technologies involved (like me) can understand and appreciate.

Karen Sandler

There aren’t enough people out there who understand law and technology well. Karen is one of the few I’ve encountered who do, and not only that, but really, really gets Free software and the impact of the four freedoms on users in a way many pure technologists do not. She’s had a successful legal career that’s transitioned into being the general counsel for the Software Freedom Law Center, been the executive director of GNOME and is now the executive director of the Software Freedom Conservancy. As someone who likes to think he knows a little bit about law and technology I found Karen’s wealth of knowledge and eloquence slightly intimidating the first time I saw her speak (I think at some event in San Francisco), but I’ve subsequently (gratefully) discovered she has an incredible amount of patience (and ability) when trying to explain the nuances of free software legal issues.

Categories: FLOSS Project Planets

PreviousNext: Announcing Skpr: Control on Command

Planet Drupal - Mon, 2019-10-07 21:03

Skpr - pronounced Skipper - is a cloud hosting platform specifically designed to maximise the productivity of development teams by giving them full control right from the command line.

by Nick Schuch / 8 October 2019

During our consulting engagements with large organisations, we recognised a clear trend: they were moving away from narrow, single-site hosting services and building bespoke platforms on top of Kubernetes to support their multi-site, multi-technology initiatives.

Back in 2016 we had this exact need for hosting our entire portfolio of sites. Throughout this journey we found that providing developers with a simple Command Line Interface (CLI) has led to huge improvements in our team's efficiency and the overall quality of our products.

So, today we’re announcing the public launch of our hosting platform, Skpr. The platform for teams who want a simple command line tool, backed by a range of industry-leading services and supported by our own team of experts.

Why Skpr is different

Many hosting platforms provide a web interface where deployments can be dragged-and-dropped between environments.

While these solutions are more effective for non-developers, they fall short in integration and extendability within the workflow of the developers actually doing the job. Having a Command Line Interface (CLI) means that not only do we provide the same level of control, we provide the flexibility to extend those workflows. 

  • Scripts - Having a CLI means that Skpr can integrate into existing automation, along with CI tools such as CircleCI.
  • Documentation - Complex tasks carried out via a GUI are very difficult to document. CLIs mean you spend less time describing a user interface and more time documenting the actual process.
Control on Command

With a few commands, developers have the control to package, deploy, configure and monitor their services right from the command line.

And while we want to provide a platform that's powerful, reliable and secure, we're passionate about making it easy to use as well.

To find out more, visit skpr.io.

Tagged Skpr, Cloud Hosting, Drupal Hosting
Categories: FLOSS Project Planets

Wingware News: Wing Python IDE 7.1.2 - October 8, 2019

Planet Python - Mon, 2019-10-07 21:00

Wing 7.1.2 adds a How-To for using Wing with Docker, allows disabling code warnings from the tooltip displayed over the editor, adds support for macOS 10.15 (Catalina), supports code folding in JSON files, adds optional word wrapping for output in the Testing tool, and fixes about 25 minor usability issues.


Download Wing 7.1.2 Now: Wing Pro | Wing Personal | Wing 101 | Compare Products


Some Highlights of Wing 7.1
Support for Python 3.8

Wing 7.1 supports editing, testing, and debugging code written for Python 3.8, so you can take advantage of assignment expressions and other improvements introduced in this new version of Python.

Improved Code Warnings

Wing 7.1 adds unused symbol warnings for imports, variables, and arguments found in Python code. This release also improves code warnings configuration, making it easier to disable unwanted warnings.

Cosmetic Improvements

Wing 7.1 improves the auto-completer, project tool, and code browser with redesigned icons that make use of Wing's icon color configuration. This release also improves text display on some Linux systems, supports Dark Mode on macOS, and improves display of Python code and icons found in documentation.

And More

Wing 7.1 also adds a How-To for using Wing with Docker, the ability to disable code warnings from tooltips on the editor, support for macOS 10.15 (Catalina), code folding in JSON files, word wrapping for output in the Testing tool, support for Windows 10 native OpenSSH installations for remote development, and many minor improvements. This release drops support for macOS 10.11. System requirements remain unchanged on Windows and Linux.

For details see the change log.

For a complete list of new features in Wing 7, see What's New in Wing 7.


Try Wing 7.1 Now!

Wing 7.1 is an exciting new step for Wingware's Python IDE product line. Find out how Wing 7.1 can turbocharge your Python development by trying it today.

Downloads: Wing Pro | Wing Personal | Wing 101 | Compare Products

See Upgrading for details on upgrading from Wing 6 and earlier, and Migrating from Older Versions for a list of compatibility notes.

Categories: FLOSS Project Planets

Podcast.__init__: Network Automation At Enterprise Scale With Python

Planet Python - Mon, 2019-10-07 20:45
Summary

Designing and maintaining enterprise networks and the associated hardware is a complex and time consuming task. Network automation tools allow network engineers to codify their workflows and make them repeatable. In this episode Antoine Fourmy describes his work on eNMS and how it can be used to automate enterprise grade networks. He explains how his background in telecom networking led him to build an open source platform for network engineers, how it is architected, and how you can use it for creating your own workflows. This is definitely worth listening to as a way to gain some appreciation for all of the work that goes on behind the scenes to make the internet possible.

Announcements
  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host as usual is Tobias Macey and today I’m interviewing Antoine Fourmy about eNMS, an enterprise-grade vendor-agnostic network automation platform.
Interview
  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what eNMS is?
  • What was your motivation for creating it?
  • Who are the target users of eNMS and how much background knowledge of network management is required to be effective with it?
  • What are some of the alternative tools that exist in this space and why might a network operator choose to use eNMS in their place?
  • What are some of the most challenging aspects of network creation and maintenance and how does eNMS assist with them?
  • What are some of the mundane and/or error-prone tasks that can be replaced or automated with eNMS?
  • What are some of the additional features that come into play for more complex networking tasks?
  • Can you describe the system architecture of eNMS and how it has evolved since you first began working on it?
  • eNMS is an impressive project that looks to have a substantial amount of polish. How large is the overall community of users and contributors?
    • For someone who wants to get involved in contributing to eNMS what are some of the types of skills and background that would be helpful?
  • What are some of the most innovative/unexpected ways that you have seen eNMS used?
  • When is eNMS the wrong choice?
  • What do you have planned for the future of the project?
Keep In Touch

Picks

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Categories: FLOSS Project Planets

Palantir: HonorHealth

Planet Drupal - Mon, 2019-10-07 20:24
honorhealth.com

Making healthcare simple and personal with a consumer-centric approach

HonorHealth is well-known as a community health system drawing from a strong legacy of care for those in their community for more than 100 years in Phoenix, AZ. HonorHealth provides critical, life-saving services; they focus their energy on providing first-in-class medical services, training, and humanitarian assistance. HonorHealth needed a content management platform to help them communicate not only their services, but also their innovations in treatments, technology, and clinical research in a way that is comprehensible and effective for their audiences.

The Challenge

The previous honorhealth.com was a Drupal 7 site that offered inconsistent search interfaces, a splintered “make an appointment” experience, and an overall menu and content structure that was based on internal organizational needs. With a renewed focus on their customers, HonorHealth wanted to build a new website that would improve the crucial “find care” experience for their users.

More specifically, they wanted to make it seamless for patients to:

  • Find and receive the right care
  • Get access to their health information
  • Manage their care easily
  • Maintain their health or cope with a condition

HonorHealth engaged Palantir.net to create an easy-to-administer, modernized Drupal 8 platform. The new consumer-centric honorhealth.com:

  • Builds and fosters trust through design and good usability
  • Makes it easy for users to find care and locations
  • Features a site structure driven by actual user priorities
  • Is WCAG 2.0 AA compliant
How We Did It

With an eye on the future, Palantir focused work on creating a platform that will be easily extendable as HonorHealth expands the services they provide to their community over time. Palantir knew it was essential to create a strong platform in order to enable HonorHealth to integrate more technologies in the future as part of their continuum of care.

Collaborative Discovery Phase

The project began with an in-person workshop that joined together the Palantir team with a large group of client stakeholders (15+), each with varied levels of project context, technical knowledge, and ongoing involvement in the overall project. As Palantir interviewed the organizational stakeholders, it became clear that putting patients and prospective patients first and foremost was going to be critical to the success of the project. Palantir worked with their partners at HonorHealth to make sure the team was answering every challenge through the lens of “does this first serve patients and prospective patients?”

Inception focused heavily on user testing of HonorHealth’s existing personas, which were validated through abbreviated top task research in order to get data on the specific priorities of the primary audiences. The results of this work were incorporated into the page designs and informed the Information Architecture (IA) hypothesis, which was tested further using a tree test. The tree test helped us determine which tasks users could easily complete on the site and which tasks were especially difficult to complete.

Defining Features

Palantir used a series of “sketch sessions”–workshops where the team and client stakeholders collaboratively define key user flows and experiences. The outcome is low-fidelity, annotated wireframes that can initiate conversations around the build and move the project forward.

Both the UX and development team at Palantir participated in these sessions, which enabled them to overlap design work with development work much longer, creating a more integrated approach to developing the HonorHealth site. This innovative approach enabled Palantir to show the site, with real content in it, much sooner than traditionally possible, which meant HonorHealth could provide incremental feedback on the site. This process of migrating data early and evolving it in-place also made it easier to showcase data integrations as they lived alongside CMS-controlled content.

Building a Valuable IA

Using data from the top tasks research, Palantir developed a framework of “most valuable experiences” for HonorHealth’s audiences. The site’s information architecture and hierarchy, calls-to-action, navigation, and layout are all anchored around these experiences – always surfacing the information users need in the most contextually relevant places and laying the groundwork for personalization opportunities in the future.

The most notable highlights of the new site include:

  • A consolidated appointment flow (universal button, main landing page) that helps users decide how to proceed
  • A straightforward, consistent search experience for finding doctors, locations, clinical trials, articles and events
Flexible Components

With the previous site, content design elements were created mostly inline on individual pages by a web manager with HTML knowledge. This limited who on the team could make changes to the site and slowed down the process of updating the site with critical content. With the new honorhealth.com, the editorial team has a set of dynamic, flexible, field-based content components that enables content editors to craft compelling narrative content pages, without needing to know HTML.

Modern Design

The new honorhealth.com cleverly incorporates their existing brand elements while modernizing the design. Simple and straightforward, the updated design is easy to use and connects and resonates with key audiences.

Integrating With Key Systems

One of the overall goals when embarking on this redesign was to create an experience that mimicked the streamlined usability of an ecommerce site. HonorHealth wanted to create a place where patients could easily make choices about their care and schedule appointments with minimal effort. In order to provide this experience, it was imperative that the new platform could play well with HonorHealth’s external services and created a firm foundation to integrate more self-service tools in the future.

In particular, Palantir integrated with three services as part of the build: SymphonyRM, Docscores, and Clockwise.

  • SymphonyRM offers a rich set of data served by a dynamic API. HonorHealth leverages SymphonyRM’s Provider Operations services to hold its practice location and physician data. Palantir worked closely with Symphony to help define the data structure. Through this work, HonorHealth was able to reduce the number of steps and people required to maintain their provider and location data. By leveraging the strategy work done and the technical consultation of Palantir’s Technical Architecture Owner, HonorHealth was able to keep focused on the most valuable content to their users throughout all of their integrated systems.
  • Docscores provides a platform for gathering high-quality ratings and review data on healthcare practitioners and locations. Palantir integrated this data directly with the physician and location data provided from SymphonyRM to create a research and discovery tool for HonorHealth users. On the new HonorHealth website, users can find physicians near a specific location and read about other patients’ experiences with them.
  • Clockwise provides real-time wait estimates for people looking for Urgent Care services in their area.

Each of these individual “under the hood” integrations doesn’t represent a significant shift for website users, but when all of these changes are coupled with the intense focus on putting the user experience of the site first, the result speaks for itself: a beautiful website that works well and empowers people to engage in their ongoing healthcare in meaningful ways.

At risk of sounding trite and cliche, the term ‘best in class’ legitimately does apply.

Jake Kelly

Web Specialist

Key Results

Each and every page of the new Drupal 8 site fits neatly in a hierarchy of user-centric top-tasks. Each page is outcome- or action-oriented and centers on the question: “What are our users trying to accomplish when they come to this page?” The new site surfaces valuable, decision-influencing information like Emergency Department wait times and easy appointment request forms in key places to help users get the services they need and want, quickly and efficiently.

The new HonorHealth site has taken a strong step forward in effectiveness for not only patients, but the internal team as well. By taking a strategic approach to how their data is managed across all of their vendors and integrations, Palantir was able to find efficiencies and make that process easier, so that HonorHealth can focus on what they do best – providing top-notch care to the communities that they serve.

Building a modernized Drupal 8 site structure driven by user priorities
Categories: FLOSS Project Planets

GNU Guix: Guix Reduces Bootstrap Seed by 50%

GNU Planet! - Mon, 2019-10-07 20:00

We are delighted to announce that the first reduction by 50% of the Guix bootstrap binaries has now been officially released!

This is a very important step because the ~250MB seed of binary code was practically non-auditable, which makes it hard to establish what source code produced them.

Every unauditable binary also leaves us vulnerable to compiler backdoors as described by Ken Thompson in the 1984 paper Reflections on Trusting Trust and beautifully explained by Carl Dong in his Bitcoin Build System Security talk.

It is therefore equally important that we continue towards our final goal: A Full Source bootstrap; removing all unauditable binary seeds.

Guix’ Rigorous Regular Bootstrap

GNU Guix takes a rigorous approach to bootstrapping. Bootstrapping in this context refers to how the distribution gets built from nothing.

The GNU system is primarily made of C code, with glibc at its core. The GNU build system itself assumes the availability of a Bourne shell and command-line tools provided by the Core GNU Utilities and Awk, Grep, Make, Sed, and some others.

The build environment of a package is a container that contains nothing but the package’s declared inputs. To be able to build anything at all in this container, Guix needs pre-built binaries of Guile, GCC, Binutils, glibc, and the other tools mentioned above.

So there is an obvious chicken-and-egg problem: How does the first package get built? How does the first compiler get compiled?

gcc-final
    ^
    |
cross-gcc-boot
    ^
    |
gcc-boot0 (5.5.0)
    ^
    |
binutils-boot0, libstdc++-boot0
    ^
    |
diffutils-boot0, findutils-boot0, file-boot0
    ^
    |
make-boot0
    ^
    |
    *
bootstrap-binutils, bootstrap-gcc, bootstrap-glibc (~130MB)
bootstrap-bash, bootstrap-coreutils&co, bootstrap-guile (~120MB)

full graph

The answer to this starts with bootstrap binaries. The first package that gets built with these bootstrap binaries is make, next are diffutils, findutils, and file. Eventually a gcc-final is built: the compiler used to build regular packages.

i686/x86_64 Reduced Binary Seed bootstrap

The Guix development branch we just merged introduces a reduced binary seed bootstrap for x86_64 and i686, where the bottom of the dependency graph looks like this:

gcc-mesboot (4.9.4)
    ^
    |
glibc-mesboot (2.16.0)
    ^
    |
gcc-mesboot1 (4.7.4)
    ^
    |
binutils-mesboot (2.20.1a)
    ^
    |
gcc-mesboot0 (2.95.3)
    ^
    |
glibc-mesboot0 (2.2.5)
    ^
    |
gcc-core-mesboot (2.95.3)
    ^
    |
make-mesboot0, diffutils-mesboot, binutils-mesboot0 (2.20.1a)
    ^
    |
tcc-boot
    ^
    |
tcc-boot0
    ^
    |
mes-boot
    ^
    |
    *
bootstrap-mescc-tools, bootstrap-mes (~10MB)
bootstrap-bash, bootstrap-coreutils&co, bootstrap-guile (~120MB)

full graph

The new Reduced Binary Seed bootstrap removes Binutils, GCC, and glibc and replaces them by GNU Mes and MesCC Tools. This reduces the trusted binary seed by ~120MB - half of it!

As a user, it means your package manager has a formal description of how to build all your applications, in a reproducible way, starting from nothing but this ~120MB seed. It means you can rebuild any of those software artifacts locally without trusting a single binary provider.

For comparison, traditional distros often have an informally specified bootstrap story, usually relying on much bigger binary seeds. We estimate those seeds to weigh in at ~550MB (the size of debootstrap --arch=i386 --include=build-essential,dpkg-dev,debhelper,gcc,libc6-dev,make,texinfo bullseye ./bullseye-chroot http://deb.debian.org/debian, with bullseye-chroot/var/cache/apt/archives removed) in the case of Debian—ignoring cycles that show up higher in the graph.

These bootstrap binaries can now be re-created by doing

guix build bootstrap-binaries

Work started three years ago with a simple LISP-1.5 interpreter.

A year later, Mes 0.5 had become a tiny Scheme interpreter written in a simple subset of C that came with a simple C compiler in Scheme. And yes, these were mutually self-hosting.

The next step was to find a path towards compiling Guix’s default GCC (5.5.0). Sadly, bootstrapping GCC compilers has become increasingly difficult over the years. We looked at GCC 1.42: not easy to bootstrap (100,000 LOC), and it depends on Bison. Reluctantly, we started looking at non-GNU alternatives (8cc, pcc, cc500) but finally settled on TinyCC. TinyCC (TCC) can compile GCC (4.7.4), which is currently the most recent release of GCC that can be built without a C++ compiler.

Another year later, Mes 0.13 has grown its own tiny C library and compiles a heavily patched and simplified TCC. This looked very promising, and we suggested that TinyCC help our bootstrapping effort by moving towards a simplified C subset. Instead we were encouraged to make MesCC a full-blown C99-compliant compiler. That felt like a setback, but it gave us the prospect of removing TCC from the bootstrap later on. Using Nyacc, the amazing parser framework with a C99 parser by Matt Wette, has even made that a feasible prospect.

It took only half a year to mature into Mes 0.19 so that building TinyCC (25,000 LOC) now only takes ~8min instead of the initial 5h.

With a bootstrapped TCC we tried building some versions of GCC (1.4, 2.6.3, 2.95.3, 3.0, 3.1, 3.2.3, 3.4.0, 3.4.6, 4.1.1, 4.1.2) to try to build some versions of glibc (1.06.4, 1.09.1, 2.0.1, 2.1.3, 2.3, 2.3.6, 2.2.5, 2.12.2, 2.15, 2.16.0, 2.28) using slightly fewer versions of Binutils (1.9, 2.5.1, 2.5.2, 2.6, 2.7, 2.10.1, 2.14, 2.14a, 2.15a, 2.17a, 2.18a, 2.20.1a, 2.23.2, 2.25, 2.28). There were many interesting dependencies, tradeoffs, and patchings of generated Autotools outputs, especially when you are using your own tiny C library and headers.

Typically, newer versions of the tool chain fix all kinds of bugs in the build system and C code compliance, which is great. However, simultaneously new features are introduced or dependencies are added that are not necessary for bootstrapping, increasing the bootstrap hurdle. Sometimes, newer tools are more strict or old configure scripts do not recognise newer tool versions.

Also, can you spot the triplets of tool versions that combine into integral versions of the tool chain? ;-)

Scheme-only Bootstrap

Our next target will be another reduction by ~50%; the Scheme-only bootstrap will replace the Bash, Coreutils, etc. binaries by Gash and Gash Core Utils.

Gash is a work-in-progress implementation of a POSIX shell in Scheme that is already capable enough to interpret Autoconf-generated configure scripts. It can run on Guile but it is designed to also run on Mes, meaning that we can use it early on during bootstrap.

We are excited that the NLnet Foundation is now sponsoring this work!

Creating a GNU Tool Chain Bootstrap Story

The Reduced Binary Seed bootstrap starts by building ancient GNU software, notably GCC (2.95.3) and glibc (2.2.5).

This amazing achievement is mirrored only by its terrible clumsiness. Is this really how we want to secure the bootstrap of our GNU system?

gcc-mesboot (4.6.4)
    ^
    |
tcc-boot
    ^
    |
mes-boot
    ^
    |
    *

Maybe if we could go straight from TinyCC to GCC (4.6.4), we would no longer need to depend on an ancient GNU tool chain and would have a somewhat more modern and more maintainable bootstrap path.

Now that we have shown it can be done, we think it is time for GNU tool chain developers to step in and help create a better version of our tool chain bootstrap story.

Towards a Full Source Bootstrap

We expect many interesting challenges before we approach this lofty target.

The stage0 project by Jeremiah Orians starts everything from ~512 bytes; virtually nothing. Have a look at this incredible project if you haven’t already done so.

Jeremiah is also leading the Mes-M2 effort that is about bootstrapping Mes from stage0. The Mes Scheme interpreter is being rewritten in an even more simple subset of C, without preprocessor macros even. That C-like language is called M2-Planet, after its transpiler.

About Bootstrappable Builds and GNU Mes

Software is bootstrappable when it does not depend on a binary seed that cannot be built from source. Software that is not bootstrappable - even if it is free software - is a serious security risk for a variety of reasons. The Bootstrappable Builds project aims to reduce the number and size of binary seeds to a bare minimum.

GNU Mes is closely related to the Bootstrappable Builds project. Mes aims to create an entirely source-based bootstrapping path for the Guix System and other interested GNU/Linux distributions. The goal is to start from a minimal, easily inspectable binary (which should be readable as source) and bootstrap into something close to R6RS Scheme.

Currently, Mes consists of a mutual self-hosting scheme interpreter and C compiler. It also implements a C library. Mes, the scheme interpreter, is written in about 5,000 lines of code of simple C. MesCC, the C compiler, is written in scheme. Together, Mes and MesCC can compile a lightly patched TinyCC that is self-hosting. Using this TinyCC and the Mes C library, it is possible to bootstrap the entire Guix System for i686-linux and x86_64-linux.

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

Categories: FLOSS Project Planets

Linux App Summit 2019 schedule is out!

Planet KDE - Mon, 2019-10-07 18:43

We published the Linux App Summit 2019 schedule last week.

We have a bunch of interesting talks (sadly we had to leave out almost 40 almost-as-interesting talks; we had lots of awesome submissions!) ranging from flatpak and snaps to how product management is good for Free Software projects, with some thought-provoking talks thrown in to make us think about the future of our platform.

I am going and you should totally come too!

The attendance is free but please register at https://events.linuxappsummit.org/


Categories: FLOSS Project Planets

Dataquest: Tutorial: Getting Music Data with the Last.fm API using Python

Planet Python - Mon, 2019-10-07 16:12

APIs allow us to make requests from servers to retrieve data. APIs are useful for many things, but one is to be able to create a unique dataset for a data science project. In this tutorial, we’re going to learn some advanced techniques for working with the Last.fm API. In our beginner Python API tutorial, […]

The post Tutorial: Getting Music Data with the Last.fm API using Python appeared first on Dataquest.

Categories: FLOSS Project Planets

TEN7 Blog's Drupal Posts: Case Study: Lutheran Social Service of Minnesota

Planet Drupal - Mon, 2019-10-07 15:14
Becoming Their Go-To Drupal Experts

Lutheran Social Service of Minnesota (LSS) is one of the state’s largest private nonprofit social service organizations, providing a vast array of services in all 87 Minnesota counties for children and families, older adults and people with disabilities.

Categories: FLOSS Project Planets

Jacob Rockowitz: Acquia, Automattic, and Microsoft should lobby governments to fix accessibility issues in Open Source

Planet Drupal - Mon, 2019-10-07 13:53

Recently, I nudged governments to get more involved in fixing accessibility issues in Open Source projects. Getting governments to do anything can feel like a monumental challenge. Maybe we need to build better alliances and then collectively lobby the governments to change how they approach Open Source.

Approach

Dries Buytaert, the founder of Drupal, recently published a blog post titled, "Balancing Makers and Takers to sustain and scale Open Source," and while reading it I wondered, “Are we approaching the problem of sustainability too much as developers? Should we step back and look at the challenge of sustainability from a business and political perspective?”

Is changing an Open Source project's license going to change how other organizations contribute to Open Source? Changing the licensing is a different approach. The recent "Open-source licensing war" felt like a few individual companies trying to make a significant shift in Open Source without a unified front. If Open Source companies are going to take on Amazon, they will have to do it together by building alliances.

Alliances

The definition of alliance sounds very much like what happens in Open Source communities.

Political alliances (a.k.a. parties) are what powers most governments. The scale of some open source projects has required better governance. In Dries' blog post, he spends time exploring how organizations use Open Source (a.k.a. Takers) without helping to build the software or community (a.k.a. Makers). His post ends with three valuable suggestions that are focused on appealing to organizations and rethinking how the Open Source…

Categories: FLOSS Project Planets

GNU Guix: Joint statement on the GNU Project

GNU Planet! - Mon, 2019-10-07 12:15

We, the undersigned GNU maintainers and developers, owe a debt of gratitude to Richard Stallman for his decades of important work in the free software movement. Stallman tirelessly emphasized the importance of computer user freedom and laid the foundation for his vision to become a reality by starting the development of the GNU operating system. For that we are truly grateful.

Yet, we must also acknowledge that Stallman’s behavior over the years has undermined a core value of the GNU project: the empowerment of all computer users. GNU is not fulfilling its mission when the behavior of its leader alienates a large part of those we want to reach out to.

We believe that Richard Stallman cannot represent all of GNU. We think it is now time for GNU maintainers to collectively decide about the organization of the project. The GNU Project we want to build is one that everyone can trust to defend their freedom.

  1. Ludovic Courtès (GNU Guix, GNU Guile)

  2. Ricardo Wurmus (GNU Guix, GNU GWL)

  3. Matt Lee (GNU Social)

  4. Andreas Enge (GNU MPC)

  5. Samuel Thibault (GNU Hurd, GNU libc)

  6. Carlos O'Donell (GNU libc)

  7. Andy Wingo (GNU Guile)

  8. Jordi Gutiérrez Hermoso (GNU Octave)

  9. Mark Wielaard (GNU Classpath)

  10. Ian Lance Taylor (GCC, GNU Binutils)

  11. Werner Koch (GnuPG)

  12. Daiki Ueno (GNU gettext, GNU libiconv, GNU libunistring)

  13. Christopher Lemmer Webber (GNU MediaGoblin)

  14. Jan Nieuwenhuizen (GNU Mes, GNU LilyPond)

  15. John Wiegley (GNU Emacs)

  16. Tom Tromey (GCC, GDB)

  17. Jeff Law (GCC, Binutils — not signing on behalf of the GCC Steering Committee)

  18. Han-Wen Nienhuys (GNU LilyPond)

  19. Joshua Gay (GNU and Free Software Speaker)

  20. Ian Jackson (GNU adns, GNU userv)

Categories: FLOSS Project Planets

Codementor: Choosing Python for Web Development: Top 16 Pros and Cons

Planet Python - Mon, 2019-10-07 11:44
Did you know that Python was named after Monty Python? One of the world’s most popular coding languages (https://stackoverflow.blog/2017/09/06/incredible-growth-python/), Python was first...
Categories: FLOSS Project Planets

Real Python: Building a Python C Extension Module

Planet Python - Mon, 2019-10-07 10:00

There are several ways in which you can extend the functionality of Python. One of these is to write your Python module in C or C++. This process can lead to improved performance and better access to C library functions and system calls. In this tutorial, you’ll discover how to use the Python API to write Python C extension modules.

You’ll learn how to:

  • Invoke C functions from within Python
  • Pass arguments from Python to C and parse them accordingly
  • Raise exceptions from C code and create custom Python exceptions in C
  • Define global constants in C and make them accessible in Python
  • Test, package, and distribute your Python C extension module

Free Bonus: Click here to get access to a chapter from Python Tricks: The Book that shows you Python's best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.

Extending Your Python Program

One of the lesser-known yet incredibly powerful features of Python is its ability to call functions and libraries defined in compiled languages such as C or C++. This allows you to extend the capabilities of your program beyond what Python’s built-in features have to offer.

There are many languages you could choose from to extend the functionality of Python. So, why should you use C? Here are a few reasons why you might decide to build a Python C extension module:

  1. To implement new built-in object types: It’s possible to write a Python class in C, and then instantiate and extend that class from Python itself. There can be many reasons for doing this, but more often than not, performance is primarily what drives developers to turn to C. Such a situation is rare, but it’s good to know the extent to which Python can be extended.

  2. To call C library functions and system calls: Many programming languages provide interfaces to the most commonly used system calls. Still, there may be other lesser-used system calls that are only accessible through C. The os module in Python is one example.

This is not an exhaustive list, but it gives you the gist of what can be done when extending Python using C or any other language.

To write Python modules in C, you’ll need to use the Python API, which defines the various functions, macros, and variables that allow the Python interpreter to call your C code. All of these tools and more are collectively bundled in the Python.h header file.

Writing a Python Interface in C

In this tutorial, you’ll write a small wrapper for a C library function, which you’ll then invoke from within Python. Implementing a wrapper yourself will give you a better idea about when and how to use C to extend your Python module.

Understanding fputs()

fputs() is the C library function that you’ll be wrapping:

int fputs(const char *, FILE *)

This function takes two arguments:

  1. const char * is an array of characters.
  2. FILE * is a file stream pointer.

fputs() writes the character array to the file specified by the file stream and returns a non-negative value. If the operation is successful, then this value will denote the number of bytes written to the file. If there’s an error, then it returns EOF. You can read more about this C library function and its other variants in the manual page entry.

Writing the C Function for fputs()

This is a basic C program that uses fputs() to write a string to a file stream:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    FILE *fp = fopen("write.txt", "w");
    fputs("Real Python!", fp);
    fclose(fp);
    return 1;
}

This snippet of code can be summarized as follows:

  1. Open the file write.txt.
  2. Write the string "Real Python!" to the file.

Note: The C code in this article should build on most systems. It has been tested on GCC without using any special flags.

In the following section, you’ll write a wrapper for this C function.

Wrapping fputs()

It might seem a little weird to see the full code before an explanation of how it works. However, taking a moment to inspect the final product will supplement your understanding in the following sections. The code block below shows the final wrapped version of your C code:

 1 static PyObject *method_fputs(PyObject *self, PyObject *args) {
 2     char *str, *filename = NULL;
 3     int bytes_copied = -1;
 4
 5     /* Parse arguments */
 6     if(!PyArg_ParseTuple(args, "ss", &str, &filename)) {
 7         return NULL;
 8     }
 9
10     FILE *fp = fopen(filename, "w");
11     bytes_copied = fputs(str, fp);
12     fclose(fp);
13
14     return PyLong_FromLong(bytes_copied);
15 }

This code snippet references three object structures:

  1. PyObject
  2. PyArg_ParseTuple()
  3. PyLong_FromLong()

These are used for data type definition for the Python language. You’ll go through each of them now.

PyObject

PyObject is an object structure that you use to define object types for Python. All Python objects share a small number of fields that are defined using the PyObject structure. All other object types are extensions of this type.

PyObject tells the Python interpreter to treat a pointer to an object as an object. For instance, setting the return type of the above function as PyObject defines the common fields that are required by the Python interpreter in order to recognize this as a valid Python type.

Take another look at the first few lines of your C code:

1 static PyObject *method_fputs(PyObject *self, PyObject *args) {
2     char *str, *filename = NULL;
3     int bytes_copied = -1;
4
5     /* Snip */

In line 2, you declare the argument types you wish to receive from your Python code:

  1. char *str is the string you want to write to the file stream.
  2. char *filename is the name of the file to write to.
PyArg_ParseTuple()

PyArg_ParseTuple() parses the arguments you’ll receive from your Python program into local variables:

 1 static PyObject *method_fputs(PyObject *self, PyObject *args) {
 2     char *str, *filename = NULL;
 3     int bytes_copied = -1;
 4
 5     /* Parse arguments */
 6     if(!PyArg_ParseTuple(args, "ss", &str, &filename)) {
 7         return NULL;
 8     }
 9
10     /* Snip */

If you look at line 6, then you’ll see that PyArg_ParseTuple() takes the following arguments:

  • args are of type PyObject.

  • "ss" is the format specifier that specifies the data type of the arguments to parse. (You can check out the official documentation for a complete reference.)

  • &str and &filename are pointers to local variables to which the parsed values will be assigned.

PyArg_ParseTuple() evaluates to false on failure. If it fails, then the function will return NULL and not proceed any further.

fputs()

As you’ve seen before, fputs() takes two arguments, one of which is the FILE * object. Since you can’t parse a Python textIOwrapper object using the Python API in C, you’ll have to use a workaround:

 1 static PyObject *method_fputs(PyObject *self, PyObject *args) {
 2     char *str, *filename = NULL;
 3     int bytes_copied = -1;
 4
 5     /* Parse arguments */
 6     if(!PyArg_ParseTuple(args, "ss", &str, &filename)) {
 7         return NULL;
 8     }
 9
10     FILE *fp = fopen(filename, "w");
11     bytes_copied = fputs(str, fp);
12     fclose(fp);
13
14     return PyLong_FromLong(bytes_copied);
15 }

Here’s a breakdown of what this code does:

  • In line 10, you’re passing the name of the file that you’ll use to create a FILE * object and pass it on to the function.
  • In line 11, you call fputs() with the following arguments:
    • str is the string you want to write to the file.
    • fp is the FILE * object you defined in line 10.

You then store the return value of fputs() in bytes_copied. This integer variable will be returned to the fputs() invocation within the Python interpreter.

PyLong_FromLong(bytes_copied)

PyLong_FromLong() returns a PyLongObject, which represents an integer object in Python. You can find it at the very end of your C code:

 1 static PyObject *method_fputs(PyObject *self, PyObject *args) {
 2     char *str, *filename = NULL;
 3     int bytes_copied = -1;
 4
 5     /* Parse arguments */
 6     if(!PyArg_ParseTuple(args, "ss", &str, &filename)) {
 7         return NULL;
 8     }
 9
10     FILE *fp = fopen(filename, "w");
11     bytes_copied = fputs(str, fp);
12     fclose(fp);
13
14     return PyLong_FromLong(bytes_copied);
15 }

Line 14 generates a PyLongObject for bytes_copied, the variable to be returned when the function is invoked in Python. You must return a PyObject* from your Python C extension module back to the Python interpreter.

Writing the Init Function

You’ve written the code that makes up the core functionality of your Python C extension module. However, there are still a few extra functions that are necessary to get your module up and running. You’ll need to write definitions of your module and the methods it contains, like so:

static PyMethodDef FputsMethods[] = {
    {"fputs", method_fputs, METH_VARARGS, "Python interface for fputs C library function"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef fputsmodule = {
    PyModuleDef_HEAD_INIT,
    "fputs",
    "Python interface for the fputs C library function",
    -1,
    FputsMethods
};

These functions include meta information about your module that will be used by the Python interpreter. Let’s go through each of the structs above to see how they work.

PyMethodDef

In order to call the methods defined in your module, you’ll need to tell the Python interpreter about them first. To do this, you can use PyMethodDef. This is a structure with 4 members representing a single method in your module.

Ideally, there will be more than one method in your Python C extension module that you want to be callable from the Python interpreter. This is why you need to define an array of PyMethodDef structs:

static PyMethodDef FputsMethods[] = {
    {"fputs", method_fputs, METH_VARARGS, "Python interface for fputs C library function"},
    {NULL, NULL, 0, NULL}
};

Each individual member of the struct holds the following info:

  • "fputs" is the name the user would write to invoke this particular function.

  • method_fputs is the name of the C function to invoke.

  • METH_VARARGS is a flag that tells the interpreter that the function will accept two arguments of type PyObject*:

    1. self is the module object.
    2. args is a tuple containing the actual arguments to your function. As explained previously, these arguments are unpacked using PyArg_ParseTuple().

  • The final string is a value to represent the method docstring.
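If your module ever grows beyond fputs(), each additional function simply gets its own entry before the sentinel. In the sketch below, method_fgets is a made-up second function, included only to show the shape of a larger table:

static PyMethodDef FputsMethods[] = {
    {"fputs", method_fputs, METH_VARARGS, "Python interface for fputs C library function"},
    /* Hypothetical second entry; method_fgets doesn't exist in this tutorial */
    {"fgets", method_fgets, METH_VARARGS, "Hypothetical interface for fgets"},
    {NULL, NULL, 0, NULL}  /* Sentinel marking the end of the table */
};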
PyModuleDef

Just as PyMethodDef holds information about the methods in your Python C extension module, the PyModuleDef struct holds information about your module itself. It is not an array of structures, but rather a single structure that’s used for module definition:

static struct PyModuleDef fputsmodule = {
    PyModuleDef_HEAD_INIT,
    "fputs",
    "Python interface for the fputs C library function",
    -1,
    FputsMethods
};

There are a total of 9 members in this struct, but not all of them are required. In the code block above, you initialize the following five:

  1. PyModuleDef_HEAD_INIT initializes the m_base member, which is of type PyModuleDef_Base. The documentation advises always setting it to just this one value.

  2. "fputs" is the name of your Python C extension module.

  3. The string is the value that represents your module docstring. You can use NULL to have no docstring, or you can specify a docstring by passing a const char * as shown in the snippet above. You can also use PyDoc_STRVAR() to define a docstring for your module.

  4. -1 is the amount of memory needed to store your module’s state. This member is of type Py_ssize_t. It matters when your module is used in multiple sub-interpreters, and it can have the following values:

    • A negative value indicates that this module doesn’t have support for sub-interpreters.
    • A non-negative value enables the re-initialization of your module. It also specifies the memory requirement of your module to be allocated on each sub-interpreter session.

  5. FputsMethods is the reference to your method table. This is the array of PyMethodDef structs you defined earlier.

For more information, check out the official Python documentation on PyModuleDef.

PyMODINIT_FUNC

Now that you’ve defined your Python C extension module and method structures, it’s time to put them to use. When a Python program imports your module for the first time, it will call PyInit_fputs():

PyMODINIT_FUNC PyInit_fputs(void) {
    return PyModule_Create(&fputsmodule);
}

PyMODINIT_FUNC does 3 things implicitly when stated as the function return type:

  1. It implicitly sets the return type of the function as PyObject*.
  2. It declares any special linkages.
  3. It declares the function as extern "C". If you’re using C++, this tells the C++ compiler not to do name mangling on the symbols.

PyModule_Create() will return a new module object of type PyObject *. For the argument, you’ll pass the address of the method structure that you’ve already defined previously, fputsmodule.

Note: In Python 3, your init function must return a PyObject * type. However, if you’re using Python 2, then PyMODINIT_FUNC declares the function return type as void.
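PyModule_Create() can fail and return NULL, for example when memory allocation fails. The tutorial's one-liner ignores this, which is usually fine, but here's a slightly more defensive sketch of the same init function:

PyMODINIT_FUNC PyInit_fputs(void) {
    PyObject *module = PyModule_Create(&fputsmodule);
    if (module == NULL) {
        /* PyModule_Create() has already set an exception for us */
        return NULL;
    }
    return module;
}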

Putting It All Together

Now that you’ve written the necessary parts of your Python C extension module, let’s take a step back to see how it all fits together. The following diagram shows the components of your module and how they interact with the Python interpreter:

When you import your Python C extension module, PyInit_fputs() is the first method to be invoked. However, before a reference is returned to the Python interpreter, the function makes a subsequent call to PyModule_Create(). This will initialize the structures PyModuleDef and PyMethodDef, which hold meta information about your module. It makes sense to have them ready since you’ll make use of them in your init function.

Once this is complete, a reference to the module object is finally returned to the Python interpreter. The following diagram shows the internal flow of your module:

The module object returned by PyModule_Create() has a reference to the module structure PyModuleDef, which in turn has a reference to the method table PyMethodDef. When you call a method defined in your Python C extension module, the Python interpreter uses the module object and all of the references it carries to execute the specific method. (While this isn’t exactly how the Python interpreter handles things under the hood, it’ll give you an idea of how it works.)

Similarly, you can access various other methods and properties of your module, such as the module docstring or the method docstring. These are defined inside their respective structures.

Now you have an idea of what happens when you call fputs() from the Python interpreter. The interpreter uses your module object as well as the module and method references to invoke the method. Finally, let’s take a look at how the interpreter handles the actual execution of your Python C extension module:

fputs Function Flow" />fputs Function Flow" />fputs Function Flow" />fputs Function Flow"/>

Once method_fputs() is invoked, the program executes the following steps:

  1. Parse the arguments you passed from the Python interpreter with PyArg_ParseTuple()
  2. Pass these arguments to fputs(), the C library function that forms the crux of your module
  3. Use PyLong_FromLong to return the value from fputs()

To see these same steps in code, take a look at method_fputs() again:

 1 static PyObject *method_fputs(PyObject *self, PyObject *args) {
 2     char *str, *filename = NULL;
 3     int bytes_copied = -1;
 4
 5     /* Parse arguments */
 6     if(!PyArg_ParseTuple(args, "ss", &str, &filename)) {
 7         return NULL;
 8     }
 9
10     FILE *fp = fopen(filename, "w");
11     bytes_copied = fputs(str, fp);
12     fclose(fp);
13
14     return PyLong_FromLong(bytes_copied);
15 }

To recap, your method will parse the arguments passed to your module, send them on to fputs(), and return the results.

Packaging Your Python C Extension Module

Before you can import your new module, you first need to build it. You can do this by using the Python package distutils.

You’ll need a file called setup.py to install your application. For this tutorial, you’ll be focusing on the part specific to the Python C extension module. For a full primer, check out How to Publish an Open-Source Python Package to PyPI.

A minimal setup.py file for your module should look like this:

from distutils.core import setup, Extension

def main():
    setup(name="fputs",
          version="1.0.0",
          description="Python interface for the fputs C library function",
          author="<your name>",
          author_email="your_email@gmail.com",
          ext_modules=[Extension("fputs", ["fputsmodule.c"])])

if __name__ == "__main__":
    main()

The code block above shows the standard arguments that are passed to setup(). Take a closer look at the ext_modules keyword argument. This takes a list of objects of the Extension class. An object of the Extension class describes a single C or C++ extension module in a setup script. Here, you pass two arguments to its constructor, namely:

  • name is the name of the module.
  • ["fputsmodule.c"] is a list of paths to files with the source code, relative to the setup script.
Building Your Module

Now that you have your setup.py file, you can use it to build your Python C extension module. It’s strongly advised that you use a virtual environment to avoid conflicts with your Python environment.

Navigate to the directory containing setup.py and run the following command:

$ python3 setup.py install

This command will compile and install your Python C extension module in the current directory. If there are any errors or warnings, they’ll surface during this step. Make sure you fix these before you try to import your module.

Depending on your platform, the Python interpreter may default to clang for compiling the C code (on macOS, for example). If you want to use gcc or any other C compiler for the job, then you need to set the CC environment variable accordingly, either inside the setup script or directly on the command line. For instance, you can tell the Python interpreter to use gcc to compile and build your module this way:

$ CC=gcc python3 setup.py install

However, if clang isn’t available, the build falls back to the compiler your Python installation was configured with, such as gcc.

Running Your Module

Now that everything is in place, it’s time to see your module in action! Once it’s successfully built, fire up the interpreter to test run your Python C extension module:

>>> import fputs
>>> fputs.__doc__
'Python interface for the fputs C library function'
>>> fputs.__name__
'fputs'
>>> # Write to an empty file named `write.txt`
>>> fputs.fputs("Real Python!", "write.txt")
13
>>> with open("write.txt", "r") as f:
...     print(f.read())
...
Real Python!

Your function performs as expected! You pass a string "Real Python!" and a file to write this string to, write.txt. The call to fputs() returns the number of bytes written to the file. You can verify this by printing the contents of the file.

Also recall how you passed certain arguments to the PyModuleDef and PyMethodDef structures. You can see from this output that Python has used these structures to assign things like the function name and docstring.

With that, you have a basic version of your module ready, but there’s a lot more that you can do! You can improve your module by adding things like custom exceptions and constants.

Raising Exceptions

Python exceptions are very different from C++ exceptions. If you want to raise Python exceptions from your C extension module, then you can use the Python API to do so. Some of the functions provided by the Python API for exception raising are as follows:

  • PyErr_SetString(PyObject *type, const char *message): takes two arguments: a PyObject * type argument specifying the type of exception, and a custom message to display to the user.

  • PyErr_Format(PyObject *type, const char *format, ...): takes a PyObject * type argument specifying the type of exception, plus a printf-style format string (and its values) for building the message to display to the user.

  • PyErr_SetObject(PyObject *type, PyObject *value): takes two arguments, both of type PyObject *: the first specifies the type of exception, and the second sets an arbitrary Python object as the exception value.

You can use any of these to raise an exception. However, which to use and when depends entirely on your requirements. The Python API has all the standard exceptions pre-defined as PyObject types.

Raising Exceptions From C Code

While you can’t raise exceptions in C, the Python API will allow you to raise exceptions from your Python C extension module. Let’s test this functionality by adding PyErr_SetString() to your code. This will raise an exception whenever the length of the string to be written is less than 10 characters:

static PyObject *method_fputs(PyObject *self, PyObject *args) {
    char *str, *filename = NULL;
    int bytes_copied = -1;

    /* Parse arguments */
    if(!PyArg_ParseTuple(args, "ss", &str, &filename)) {
        return NULL;
    }

    if (strlen(str) < 10) {
        PyErr_SetString(PyExc_ValueError, "String length must be greater than 10");
        return NULL;
    }

    FILE *fp = fopen(filename, "w");
    bytes_copied = fputs(str, fp);
    fclose(fp);

    return PyLong_FromLong(bytes_copied);
}

Here, you check the length of the input string immediately after you parse the arguments and before you call fputs(). If the string passed by the user is shorter than 10 characters, then your program will raise a ValueError with a custom message. The program execution stops as soon as the exception occurs.

Note how method_fputs() returns NULL after raising the exception. This is because whenever you raise an exception using PyErr_*(), the call sets Python’s internal error indicator for you. The calling function doesn’t need to set it again; it only has to signal that an error occurred by returning a failure value, usually NULL or -1. (This should also explain why method_fputs() returns NULL when PyArg_ParseTuple() fails.)
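The same return-NULL pattern applies to errors from the C library itself. The tutorial's code doesn't check whether fopen() succeeds, but as a hedged sketch, you could translate a failed fopen() into a Python OSError built from errno before going any further:

/* Sketch of an extra check inside method_fputs(); not in the tutorial's code */
FILE *fp = fopen(filename, "w");
if (fp == NULL) {
    /* Raises OSError from errno, including the offending filename */
    PyErr_SetFromErrnoWithFilename(PyExc_OSError, filename);
    return NULL;
}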

Raising Custom Exceptions

You can also raise custom exceptions in your Python C extension module. However, things are a bit different. Previously, in PyMODINIT_FUNC, you were simply returning the instance returned by PyModule_Create and calling it a day. But for your custom exception to be accessible by the user of your module, you need to add your custom exception to your module instance before you return it:

static PyObject *StringTooShortError = NULL;

PyMODINIT_FUNC PyInit_fputs(void) {
    /* Assign module value */
    PyObject *module = PyModule_Create(&fputsmodule);

    /* Initialize new exception object */
    StringTooShortError = PyErr_NewException("fputs.StringTooShortError", NULL, NULL);

    /* Add exception object to your module */
    PyModule_AddObject(module, "StringTooShortError", StringTooShortError);

    return module;
}

As before, you start off by creating a module object. Then you create a new exception object using PyErr_NewException. This takes a string of the form module.classname as the name of the exception class that you wish to create. Choose something descriptive to make it easier for the user to interpret what has actually gone wrong.

Next, you add this to your module object using PyModule_AddObject. This takes your module object, the name of the new object being added, and the custom exception object itself as arguments. Finally, you return your module object.
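One subtlety worth knowing: PyModule_AddObject() steals a reference to the object on success, but not on failure. The version above keeps things simple; a more defensive sketch, following the pattern in CPython's extending documentation, holds an extra reference and cleans up if the call fails:

/* Defensive variant of the add-object step; optional for this tutorial */
Py_XINCREF(StringTooShortError);
if (PyModule_AddObject(module, "StringTooShortError", StringTooShortError) < 0) {
    Py_XDECREF(StringTooShortError);   /* Drop the extra reference */
    Py_CLEAR(StringTooShortError);     /* Drop the original and reset to NULL */
    Py_DECREF(module);
    return NULL;
}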

Now that you’ve defined a custom exception for your module to raise, you need to update method_fputs() so that it raises the appropriate exception:

static PyObject *method_fputs(PyObject *self, PyObject *args) {
    char *str, *filename = NULL;
    int bytes_copied = -1;

    /* Parse arguments */
    if(!PyArg_ParseTuple(args, "ss", &str, &filename)) {
        return NULL;
    }

    if (strlen(str) < 10) {
        /* Raising custom exception */
        PyErr_SetString(StringTooShortError, "String length must be greater than 10");
        return NULL;
    }

    FILE *fp = fopen(filename, "w");
    bytes_copied = fputs(str, fp);
    fclose(fp);

    return PyLong_FromLong(bytes_copied);
}

After building the module with the new changes, you can test that your custom exception is working as expected by trying to write a string that is less than 10 characters in length:

>>> import fputs
>>> # Custom exception
>>> fputs.fputs("RP!", "write.txt")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
fputs.StringTooShortError: String length must be greater than 10

When you try to write a string with fewer than 10 characters, your custom exception is raised with a message explaining what went wrong.

Defining Constants

There are cases where you’ll want to use or define constants in your Python C extension module. This is quite similar to how you defined custom exceptions in the previous section. You can define a new constant and add it to your module instance using PyModule_AddIntConstant():

PyMODINIT_FUNC PyInit_fputs(void) {
    /* Assign module value */
    PyObject *module = PyModule_Create(&fputsmodule);

    /* Add int constant by name */
    PyModule_AddIntConstant(module, "FPUTS_FLAG", 64);

    /* Define int macro */
    #define FPUTS_MACRO 256

    /* Add macro to module */
    PyModule_AddIntMacro(module, FPUTS_MACRO);

    return module;
}

This Python API function takes the following arguments:

  • The instance of your module
  • The name of the constant
  • The value of the constant

You can do the same for macros using PyModule_AddIntMacro():

PyMODINIT_FUNC PyInit_fputs(void) {
    /* Assign module value */
    PyObject *module = PyModule_Create(&fputsmodule);

    /* Add int constant by name */
    PyModule_AddIntConstant(module, "FPUTS_FLAG", 64);

    /* Define int macro */
    #define FPUTS_MACRO 256

    /* Add macro to module */
    PyModule_AddIntMacro(module, FPUTS_MACRO);

    return module;
}

This function takes the following arguments:

  • The instance of your module
  • The name of the macro that has already been defined

Note: If you want to add string constants or macros to your module, then you can use PyModule_AddStringConstant() and PyModule_AddStringMacro(), respectively.
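As a quick sketch, with FPUTS_VERSION and FPUTS_GREETING as made-up names for illustration, the string variants follow the same shape inside PyInit_fputs():

/* Hypothetical string constant; name and value are illustrative */
PyModule_AddStringConstant(module, "FPUTS_VERSION", "1.0.0");

/* Hypothetical string macro, mirroring PyModule_AddIntMacro() */
#define FPUTS_GREETING "Real Python!"
PyModule_AddStringMacro(module, FPUTS_GREETING);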

Open up the Python interpreter to see if your constants and macros are working as expected:

>>> import fputs
>>> # Constants
>>> fputs.FPUTS_FLAG
64
>>> fputs.FPUTS_MACRO
256

Here, you can see that the constants are accessible from within the Python interpreter.

Testing Your Module

You can test your Python C extension module just as you would any other Python module. This can be demonstrated by writing a small test function for pytest:

import fputs

def test_copy_data():
    content_to_copy = "Real Python!"
    bytes_copied = fputs.fputs(content_to_copy, 'test_write.txt')

    with open('test_write.txt', 'r') as f:
        content_copied = f.read()

    assert content_copied == content_to_copy

In the test script above, you use fputs.fputs() to write the string "Real Python!" to an empty file named test_write.txt. Then, you read in the contents of this file and use an assert statement to compare it to what you had originally written.

You can run this test suite to make sure your module is working as expected:

$ pytest -q test_fputs.py
.                                                            [100%]
1 passed in 0.03 seconds

For a more in-depth introduction, check out Getting Started With Testing in Python.

Considering Alternatives

In this tutorial, you’ve built an interface for a C library function to understand how to write Python C extension modules. However, there are times when all you need to do is invoke some system calls or a few C library functions, and you want to avoid the overhead of writing two different languages. In these cases, you can use Python libraries such as ctypes or cffi.

These are Foreign Function libraries for Python that provide access to C library functions and data types. Though the community itself is divided as to which library is best, both have their benefits and drawbacks. In other words, either would make a good choice for any given project, but there are a few things to keep in mind when you need to decide between the two:

  • The ctypes library comes included in the Python standard library, which is very important if you want to avoid external dependencies. It lets you call functions in shared C libraries from pure Python, without writing any C code.

  • The cffi library is not yet included in the standard library. This might be a dealbreaker for your particular project. In general, it’s more Pythonic in nature, but it doesn’t handle preprocessing for you.

For more information on these libraries, check out Extending Python With C Libraries and the “ctypes” Module and Interfacing Python and C: The CFFI Module.

Note: Apart from ctypes and cffi, there are various other tools available. For instance, you can also use SWIG and Boost.Python.

Conclusion

In this tutorial, you’ve learned how to write a Python interface in the C programming language using the Python API. You wrote a Python wrapper for the fputs() C library function. You also added custom exceptions and constants to your module before building and testing it.

The Python API provides a host of features for writing complex Python interfaces in the C programming language. At the same time, libraries such as cffi or ctypes can lower the amount of overhead involved in writing Python C extension modules. Make sure you weigh all the factors before making a decision!


Categories: FLOSS Project Planets

Catalin George Festila: Python 3.7.4 : Example with subprocess - part 001.

Planet Python - Mon, 2019-10-07 09:16
This is a simple example with the Python 3 subprocess package. The source code is easy to follow. The execute_proceess_with_communicate() function runs the ls command with sudo user permissions:

import os
import sys
import string
import subprocess
import codecs

inp = ''
cmd = 'ls'
password = ''

def execute_proceess_with_communicate(inp):
    """Return a list of hops from traceroute command.""
Categories: FLOSS Project Planets

Ned Batchelder: Sponsor me on GitHub?

Planet Python - Mon, 2019-10-07 08:52

tl;dr: You can sponsor me on GitHub, but I’m not sure why you would.

In May, GitHub launched GitHub Sponsors, a feature on their site for people to support each other financially. It’s still in beta, but now I’m in the program, so you can sponsor me if you want.

I’m very interested in the question of how the creators of open source software can benefit more from what they create, considering how much value others get from it.

To be honest, I’m not sure GitHub Sponsors is going to make a big difference. It’s another form of what I’ve called an internet tip jar: it focuses on one person giving another person money. Don’t get me wrong: I’m all for enabling interpersonal connections of all sorts. But I don’t think that will scale to improve the situation meaningfully.

I think a significant shift will only come with a change in how businesses give back to open source, since they are the major beneficiaries. See my post about Tidelift and “Corporations and open source, why and how” for more about this.

I’m participating in GitHub Sponsors because I want to try every possible avenue. Since it’s on GitHub, it will get more attention than most tip jars, so maybe it will work out differently. Participating is a good way for me to understand it.

GitHub lets me define tiers of sponsorship, with different incentives, similar to Kickstarter. I don’t know what will motivate people, and I don’t have existing incentives at my fingertips to offer, so I’ve just created three generic tiers ($3, $10, $30 per month). If GitHub Sponsors appeals to you, let me know what I could do with a tier that might attract other people.

The question mark in the title is not because I’m making a request of you. It’s because I’m uncertain whether and why people will become sponsors through GitHub Sponsors. We’ll see what happens.

Categories: FLOSS Project Planets

Akademy 2019 in Milan

Planet KDE - Mon, 2019-10-07 06:00

Last month I attended KDE’s annual gathering Akademy, which took place at the University of Bicocca in Milan, Italy. Never before had I been to an Akademy where I was interested in so many workshops and discussions that I hardly wrote any actual code.

It’s important to stay healthy during a conference – a bunch of KDE developers taking a swim in Lake Como

I arrived quite late on Friday evening, just in time before the kitchen closed at the welcome event. As usual, Saturday and Sunday were packed with presentations: after Lydia welcomed us at the conference, Lars Knoll, CTO of The Qt Company, gave a presentation on their plans for Qt 6. We were glad to hear that Qt is moving towards using CMake as build system which is what KDE has been using for over a decade now. Another point that got us excited was that they’re aiming to provide a unified styling engine for Qt Widgets and Qt Quick. Currently, we try to fill that gap with our qqc2-desktop-style Framework which works pretty well but also has its shortcomings since it just uses QStyle for painting widgets to a texture and none of QtQuick’s hardware-accelerated capabilities.

My talk (slides) this year was a quick rundown of Plasma’s new notification system. I showed some of its features, such as do not disturb mode, explained new APIs for application developers to use, and gave an outlook on what’s planned in the future. One of the ideas on the crazy side was to have Plasma Browser Integration help reduce duplicate notifications synced through KDE Connect. Often I have a dedicated app on my phone but just use the web version on my computer. What if, when a notification is synced from my phone, KDE Connect could ask my browser whether I have the respective website open and then filter out the notification, since you probably got one from the browser already anyway?

Organized like a pro

There were so many interesting BoF sessions this year that I had to actually schedule where to go well in advance. Starting off the week early in the morning was a planning session on KDE Frameworks 6, where we came up with a giant work board of things to do. See David’s email for all details. Later that day I attended sessions on Snapcraft (we’re “all about the apps”, after all), openQA, GitLab, and KDE neon.

Tuesday morning I scheduled a BoF on notifications. It was a brainstorming session on how to improve notifications both for application developers and end users. The main focus was how to make the history more useful, what ways there are for applications to manage their notifications properly anytime, no matter if it is currently shown in a popup or two pages down the history. Also, we did some planning for a KNotificationV2 class with fewer dependencies, first-class QML bindings, and proper platform backends for Android, Windows, macOS, etc. See the meeting notes for more information.

The Plasma BoF afterwards was mainly about Wayland (screen rotation, window thumbnails, virtual keyboard improvements, stability), theming, System Settings reorganization, and Plasma Mobile. Following the successful Plasma 5.12 LTS and based on distro feedback we decided that Plasma 5.18 will be another LTS release. Check out the meeting notes, too.

Traditionally, Wednesday afternoon is when Akademy attendees venture out to explore the area they all traveled to. This year’s day trip went north to Varenna near Lake Como, where we hiked up to Vezio Castle and took the ferry to the other side of the lake. Of course, with sunny weather and beautiful landscape around, we just had to take a swim in the lake.

Gorgeous view from Castello di Vezio

On Thursday morning there was a session on how to write custom KItinerary extractors. I’m a huge fan of KDE Itinerary and in the few hours between discussions I actually had to write some code, I moved forward my secret master plan to augment Plasma Browser Integration with Itinerary and structured data extraction capabilities. Stay tuned for a follow-up blog post on that. :)

Carl already worked on automatically extracting all Plasmoid configuration keys for the new sysadmin documentation page

In the afternoon we scheduled a two hour track for everything enterprise. The big topic right now is KIO Fuse which will finally allow non-KDE applications to seamlessly access network shares and other remote locations. We were also glad to hear our friends in Munich are quite happy with Plasma 5.12 LTS. The second hour was mostly spent on touching up our sysadmin documentation and we decided to migrate it to a Sphinx page, similar to our new HIG page. The idea is to use as much auto-generated content from actual config files and source code as possible to keep it from getting outdated that easily. Again, there’s some notes with more details.

Thanks to everyone involved in making this event possible! It has been a great week of discussion and planning with many new faces that I’m looking forward to seeing again in the future.

Categories: FLOSS Project Planets
