Feeds

OSI at PyCon US: engaging with AI practitioners and developers as we reach OSAID’s first release candidate

Open Source Initiative - Wed, 2024-05-29 08:00

As part of the Open Source AI Definition roadshow, and as we approach the first release candidate of the draft, the Open Source Initiative (OSI) participated in PyCon US 2024, the annual gathering of the Python community. This opportunity was important because PyCon US brings together AI practitioners and developers alike, and their input on what constitutes Open Source AI is of utmost value. The OSI organized a workshop and had a community booth there.

OSAID Workshop: compiling a FAQ to make the definition clear and easy to use

The OSI has embarked on a co-design process with multiple stakeholders to arrive at the Open Source AI Definition (OSAID). This process has been led by Mer Joyce, the co-design expert and facilitator, and Stefano Maffulli, the executive director of the OSI.

At the workshop organized at PyCon US, Mer provided an overview of the co-design process so far, summarized below.

The first step of the co-design process was to identify the freedoms needed for Open Source AI. After various online and in-person activities and discussions, including five workshops across the world, the community identified four freedoms:

  1. To Use the system for any purpose and without having to ask for permission.
  2. To Study how the system works and inspect its components.
  3. To Modify the system for any purpose, including to change its output.
  4. To Share the system for others to use with or without modifications, for any purpose.

The next step was to form four working groups to initially analyze four AI systems. To achieve better representation, special attention was given to diversity, equity and inclusion. Over 50% of the working group participants are people of color, 30% are Black, 75% were born outside the US, and 25% are women, trans and nonbinary.

These working groups discussed and voted on which AI system components should be required to satisfy the four freedoms for AI. The components we adopted are described in the Model Openness Framework developed by the Linux Foundation.

The votes were compiled based on the mean total votes per component (μ). Components that received more than 2μ votes were marked as required, and those that received between 1.5μ and 2μ were marked likely required. Components that received between 0.5μ and μ were marked likely not required, and those with fewer than 0.5μ were marked not required.
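As a rough sketch of how that thresholding works in practice, here is a short Python illustration; the component names and vote counts below are invented for the example and are not the working groups' actual figures:

from statistics import mean

# Hypothetical vote counts per component (illustrative only)
votes = {
    "training code": 28,
    "model parameters": 31,
    "training dataset": 9,
    "supporting tools": 4,
}
mu = mean(votes.values())  # mean total votes per component (μ)

def classify(v, mu):
    # Thresholds as described in the paragraph above
    if v > 2 * mu:
        return "required"
    if 1.5 * mu <= v <= 2 * mu:
        return "likely required"
    if 0.5 * mu <= v <= mu:
        return "likely not required"
    if v < 0.5 * mu:
        return "not required"
    return "not covered by the described thresholds"

for component, v in votes.items():
    print(f"{component}: {classify(v, mu)}")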

The working groups evaluated legal frameworks and legal documents for each component. Finally, each working group published a recommendation report. The end result is the OSAID with a comprehensive definition checklist encompassing a total of 17 components. More working groups are being formed to evaluate how well other AI systems align with the definition.

OSAID multi-stakeholder process: from component list to a definition checklist

After providing an overview of the co-design process, Mer went on to organize an exercise with the participants to compile a FAQ.

The questions raised at the workshop revolved around the following topics:

  • End user comprehension: How and why are AI systems different from Open Source software? As end users, why should they care whether an AI system is open?
  • Datasets: Why is data itself not required? Should Open Source AI datasets be required to prove copyright compliance? How can one audit these systems for bias without the data? What do data provenance and data labeling entail?
  • Models: How can proper attribution of model parameters be enforced? What is the ownership/attribution of model parameters which were trained by one author and then “fine-tuned” by another?
  • Code: Can projects that include only source code (no data info or model weights) still use a regular Open Source license (MIT, Apache, etc.)?
  • Governance: For a specific AI, who determines whether the information provided about the training, dataset, process, etc. is “sufficient” and how?
  • Adoption of the OSAID: What are incentives for people/companies to adopt this standard?
  • Legal weight: Is the OSAID supposed to have legal weight?

These questions and answers raised at the workshop will be important for enhancing the existing FAQ, which will be made available along with the OSAID.

OSAID workshop: a collection of post-its with questions raised by participants.

Community Booth: gathering feedback on the “Unlock the OSAID” visualization

At the community booth, the OSI held two activities to draw in participants interested in Open Source AI. The first activity was a quiz developed by Ariel Jolo, program coordinator at the OSI, to assess participants’ knowledge of Python and AI/ML. Once we had an understanding of their skills, we went on to the second and main activity: gathering feedback on the OSAID using a novel way to visualize how different AI systems match the current draft definition, as described below.

Making it easy for different stakeholders to visualize whether or not an AI system matches the OSAID is a challenge, especially because there are so many components involved. This is where the visualization concept we named “Unlock the OSAID” came in. 

The OSI keyhole is a well-recognized logo that represents the source code that unlocks the freedoms to use, study, modify, and share software. With the Unlock the OSAID, we played on that same idea, but now for AI systems. We displayed three keyholes representing the three domains these 17 components fall within: code, model and data information.

Here is the image representing the “code keyhole” with the required components to unlock the OSAID:

On the inner ring we have the required components to unlock the OSAID, while on the outer ring we have optional components. The required code components are: libraries and tools; inference; training, validation and testing; data pre-processing. The optional components are: inference for benchmark; evaluation code.

To fully unlock the OSAID, an AI system must have all the required components for code, model and data information. To better understand how the “Unlock the OSAID” visualization works, let’s look at two hypothetical AI systems: example 1 and example 2.

Let’s start looking at example 1 (in red) and see if this system unlocks the OSAID for code:

Example 1 only provides inference code, so the key (in red) doesn’t “fit” the code keyhole (in green).

Now let’s look at example 2 (in blue):

Example 2 provides all required components (and more), so the key (in blue) fits the code keyhole (in green). Therefore, example 2 unlocks the OSAID for code. For example 2 to be considered Open Source AI, it would also have to unlock the OSAID for model and data information: 

We received good feedback from participants about the “Unlock the OSAID” visualization. Once participants grasped the concept of the keyholes and which components were required or optional, it was easy to identify whether an AI system unlocks the OSAID: they could visually see whether the keys fit the keyholes. If all keys fit, then that AI system adheres to the OSAID.
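In code terms, the check the visualization encodes is essentially a set-containment test per keyhole. Here is a minimal Python sketch; the required code components come from the list above, while the model and data information entries are placeholders rather than the draft's actual checklist:

# Required components per keyhole (code list from above; model and data
# information entries are placeholders, not the draft's actual checklist)
REQUIRED = {
    "code": {
        "libraries and tools",
        "inference",
        "training, validation and testing",
        "data pre-processing",
    },
    "model": {"model placeholder component"},
    "data information": {"data information placeholder component"},
}

def unlocks_osaid(provided):
    # An AI system unlocks the OSAID only if every keyhole's required set is covered
    return all(required <= provided.get(keyhole, set())
               for keyhole, required in REQUIRED.items())

# Like example 1 above: inference code only, so the code key doesn't fit
example_1 = {"code": {"inference"}}
print(unlocks_osaid(example_1))  # False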

Final thoughts: engaging with the community and promoting Open Source principles

For me, the highlight of PyCon US was the opportunity to finally meet members of the OSI and the Python community in person, both new and old acquaintances. I had good conversations with Deb Nicholson (Python Software Foundation), Hannah Aubry (Fastly), Ana Hevesi (Uploop), Tom “spot” Callaway (AWS), Julia Ferraioli (AWS), Tony Kipkemboi (Streamlit), Michael Winser (Alpha-Omega), Jason C. MacDonald (OWASP), Cheuk Ting Ho (CMD Limes), Kamile Demir (Adobe), Mariatta Wijaya (PSF), Loren Clary (PSF) and Miaolai Zhou (AWS). I also interacted with many folks from the following communities: Python Brazil, Python en Español, PyLadies and Black Python Devs. It was also great to bump into legends like Seth Larson (PSF), Peter Wang (Anaconda) and Guido van Rossum.

I loved all the keynotes, particularly the one from Sumana Harihareswara about how she has improved the Python Software Foundation’s infrastructure, and the one from Simon Willison about how we can all benefit from Open Source AI.

We also had a special dinner hosted by Stefano to celebrate this milestone of the OSAID, with Stefano, Mer and me overlooking Pittsburgh.

Overall, our participation at PyCon US was a success. We shared the work OSI has been doing toward the first release candidate of the Open Source AI Definition, and we did it in an entertaining and engaging way, with plenty of connection throughout.

Photo credits: Ana Hevesi, Mer Joyce, and Nick Vidal

Categories: FLOSS Research

Talk Python to Me: #464: Seeing code flows and generating tests with Kolo

Planet Python - Wed, 2024-05-29 04:00
Do you want to look inside your Django request? How about all of your requests in development and see where they overlap? If that sounds useful, you should check out Kolo. It's a pretty incredible extension for your editor (VS Code at the moment, more editors to come most likely). We have Wilhelm Klopp on to tell us all about it.

Episode sponsors

  • Sentry Error Monitoring, Code TALKPYTHON: https://talkpython.fm/sentry
  • Talk Python Courses: https://talkpython.fm/training

Links from the show

  • Wil on Twitter: https://twitter.com/wilhelmklopp
  • Kolo: https://kolo.app
  • Kolo's info repo: https://github.com/kolofordjango/kolo
  • Kolo Playground: https://play.kolo.app/
  • Generating tests with Kolo: https://blog.kolo.app/tests-no-joy.html
  • Watch this episode on YouTube: https://www.youtube.com/watch?v=NV6IfmrDY44
  • Episode transcripts: https://talkpython.fm/episodes/transcript/464/seeing-code-flows-and-generating-tests-with-kolo

--- Stay in touch with us ---
  • Subscribe to us on YouTube: https://talkpython.fm/youtube
  • Follow Talk Python on Mastodon: https://fosstodon.org/web/@talkpython
  • Follow Michael on Mastodon: https://fosstodon.org/web/@mkennedy
Categories: FLOSS Project Planets

Drupal Starshot blog: Announcing Drupal Starshot sessions

Planet Drupal - Wed, 2024-05-29 03:12

A few weeks ago at DrupalCon Portland, I announced Drupal Starshot, a project to create the new default download of Drupal. Built on Drupal Core, Drupal Starshot will include popular features from the contributed project ecosystem. It focuses on delivering a great user experience right out of the box. Drupal Starshot builds on recent initiatives like Recipes, Project Browser, and Automatic Updates to elevate Drupal to new heights.

The response has been incredible! Hundreds of people have pledged their support on the Drupal Starshot page, and many more have asked how to get involved. Over the past few weeks, we have been planning and preparing, so I'm excited to share some next steps!

We're launching a series of sessions to get everyone up to speed and involved. These will be held as interactive Zoom calls, and the recordings will be shared publicly for everyone to watch at their convenience.

The main goal of these Zoom sessions is to help you get involved in each area. We'll cover details not included in my keynote, update you on our progress, and give you practical advice on where and how you can contribute.

We've scheduled six sessions, and we invite everyone to attend. The first one will be this Friday, on participation, funding, and governance! You can find the latest schedule online at https://www.drupal.org/starshot#sessions and the core calendar in the sidebar of the Drupal core news page.

We look forward to seeing you there and working together to make Drupal Starshot a success!
 

Categories: FLOSS Project Planets


simonbaese - blog: Drupal: Asynchronously send emails with Symfony Mailer Queue

Planet Drupal - Wed, 2024-05-29 02:24
Recently, we built a queue worker to send emails asynchronously, meeting a client's unique requirements to ensure email delivery. There is a lot of movement in the Drupal contribution space to innovate on the mailer. Traditionally, Drupal uses a plain PHP mailer to deliver transactional emails such as sign-up confirmation or password reset instructions. Nowadays, many websites rely on the contribution module Drupal Symfony Mailer to use the framework mailer by Symfony and leverage the flexible setup of mailer policies, transport, and HTML theming with templates. What needs to be added to the mix are easy-to-implement ways to send emails asynchronously. Today, we announce the first stable release of the new contribution module Symfony Mailer Queue.
Categories: FLOSS Project Planets

Opt Green: KDE Eco's New Sustainable Software Project

Planet KDE - Tue, 2024-05-28 20:00

Inspired by the successes of the "Blauer Engel Für FOSS" (BE4FOSS) project and KDE's ongoing Sustainable Software goal, KDE Eco has begun a new initiative: "Opt Green: Sustainable Software For Sustainable Hardware" (German: Nachhaltige Software Für Nachhaltige Hardware, or NS4NH).

Opt Green: Sustainable Software For Sustainable Hardware

By design, Free Software guarantees transparency and user autonomy. This gives you, the user, control of your hardware by removing unnecessary vendor dependencies. With Free Software, you're able to use your devices how you want, for as long as you want. There's no bloatware and you can block unwanted data use and ads from driving up energy demands and slowing down your device—while shutting the door to uninvited snooping in your private life as well. With software made for your needs and not the vendors', you can choose applications designed for the hardware you already own. Say goodbye to premature hardware obsolescence: lean, efficient Free Software runs on devices which are decades old!

Independent and sustainable Free Software is good for the users, and good for the environment.

Figure : The "Think Global, Act Local" campaign urged people to consider global health while taking action in their local communities. This new project urges people to do the same, but with computing. (Image from Karanjot Singh published under a CC-BY-4.0 license.)

Over the next two years, the "Opt Green" initiative will bring what KDE Eco has been doing for sustainable software directly to end users. A particular target group for the project is those whose consumer behavior is driven by principles related to the environment, and not just price or convenience: the "eco-consumers".

Through online and offline campaigns as well as installation workshops, we will demonstrate the power of Free Software to drive down resource and energy consumption, and keep devices in use for the lifespan of the hardware, not the software.

Our motto: The most environmentally-friendly device is the one you already own.

The topic of software-driven sustainability is relevant for all Free Software applications and developers. We'd love to have you join us and become partners in combatting the issue of software-driven environmental harm. Check out the project's Invent repository or the contact page to get involved today!

Software's Environmental Harm

On 14 October 2025, the end of support for Windows 10 is estimated to turn 240 million computers that are ineligible for the upgrade to Windows 11 into e-waste. Moreover, macOS support for Intel-based Apple computers—the last of which were sold in 2020—is predicted to end (at the earliest) one year later in 2026, rendering even more millions upon millions of functioning devices obsolete. When users have no control over the software they rely on, they are left at a security risk when software support ends ... unless, of course, they purchase a new computer. (By comparison, consider that only in 2022 did Linus Torvalds first suggest ending Linux kernel support for Intel 486 processors from 1989. That's 33 years of support!)

Vendors frequently require buying a new device to support software updates. All too often, this is driven by economic imperatives rather than technological requirements. Moreover, while new hardware has become more and more powerful, new software offering similar or identical functionality has frequently become less efficient and more energy-intensive, which has rendered older, less powerful devices useless.

Already in 2015 Achim Steiner, former Executive Director of the UN Environment Programme (UNEP), warned of the "tsunami of electronic waste rolling out over the world".

In 2016, 44.7 million tonnes of e-waste were generated, estimated to be equivalent to 4500 Eiffel Towers. If you were to stack those Eiffel Towers on top of each other, the result would be 17 times higher than Mount Everest.

By 2017, United Nations University determined e-waste to be the fastest growing waste stream in the world.

In 2022, the amount of e-waste reached 59.4 million tonnes, a 33% increase since 2016.

The flow of e-waste continues to rise today.

Figure : In 2016, 44.7 million metric tons of e-waste was generated. This is estimated to be equivalent to 4,500 Eiffel Towers, which, when stacked, is 17 times higher than Mount Everest. (Image from KDE published under a CC-BY-SA-4.0 license. Design by Anita Sengupta.)

Software is a frequently unacknowledged yet significant factor for sustainability. Software determines a hardware's energy consumption and minimum system requirements. It determines how long a device can remain safely in use. With software running on everyday devices, from coffee machines to smartphones, from trains to drones, the role of software in keeping functioning hardware in use and out of the landfill grows more critical every day.

For consumers, the environmental harm may be out-of-sight and out-of-mind. Yet the environment is registering its effects, from the CO2 pumped into the atmosphere to the landfills that receive our discarded devices at their end of life, and the air, soil, and waters around them—not to mention the people and animals.

Figure : A young man is pictured burning electrical wires to recover copper at Agbogbloshie, Ghana, as another metal scrap worker arrives with more wires to be burned. A 2018 article in the "International Journal of Cancer" reports a correlation between proximity to e-waste burn sites and childhood lymphoma. (Image by Muntaka Chasant, published under a CC-BY-SA-4.0 license.)

It is particularly devastating when you consider the environmental and social harm caused by e-waste, especially when e-waste is generated earlier than necessary because of premature obsolescence. The production and transportation of a device accounts for 50–80+% of its carbon footprint over its lifecycle. A German Environment Report estimates you’d need to use a computer for over 30 years before efficiency gains in newly-produced devices justify their purchase.

Furthermore, the extraction of rare earth metals in production consumes copious amounts of energy and takes place under miserable social conditions, often in the Global South. For disposal, devices are typically returned to the Global South for end-of-life treatment, where they pollute the environment as toxic waste and cause enormous damage to workers' health or even death.

Figure : Apple's carbon footprint. From Apple (2019), "Environmental Responsibility Report: 2019 Progress Report, covering fiscal year 2018". (Image from KDE published under a CC-BY-SA-4.0 license. Design by Anita Sengupta.)

Giving Consumers What They Want

Globally, interest in environmental harm and sustainable goods rose steadily from 2015 to 2021. In Europe, a 2020 Eurobarometer poll found that 50% of consumers cite performance issues and non-functioning software as two reasons they purchase new devices, and 8 in 10 consumers believe manufacturers should be required to make it easier to repair digital devices.

Free Software already gives consumers what they want, but most don't know it yet. Transparency in code makes lightweight, highly performant software possible, even on much older devices, while user autonomy ensures the right to repair when applications stop functioning.

Figure : KDE’s popular multi-platform PDF reader and universal document viewer Okular was awarded the Blue Angel ecolabel in 2022. (Image from KDE published under a CC-BY-4.0 license.)

In fact, the Blue Angel criteria for desktop software are at the forefront in recognizing the critical role of transparency and user autonomy in sustainable software design. From 2021-2023, the KDE Eco project "Blauer Engel Für FOSS" (BE4FOSS) had the goal of collecting and spreading information about the Blue Angel ecolabel among developer communities. In 2022, KDE’s popular PDF and universal document reader Okular became the first ever Blue Angel eco-certified software! The BE4FOSS project culminated with the KDE Eco handbook "Applying The Blue Angel Criteria To Free Software", which you can read here. KDE's Sustainable Software goal has continued this work by developing emulation tools like KdeEcoTest and Selenium AT-SPI to measure software's energy consumption in KDE's KEcoLab.

Now we want to take what we have achieved and bring it directly to eco-consumers.

Through educational campaigns and workshops, the "Opt Green" project aims to combat e-waste by keeping hardware in use with Free Software. Although the problem of software-driven e-waste is relevant for an increasing number of digital devices, the focus will be on desktop PCs, laptops, and, when possible, smartphones and tablets. We are planning to set up info-stands at fair-trade, organic, and artisanal markets as well as sustainability festivals such as the Umweltfestival in Berlin. We will distribute leaflets to consumers, and demo vendor-abandoned devices which are not only usable, but also a joy to use thanks to the tireless work of inspiring Free Software communities. Installation workshops will give users the know-how to keep their devices in use for as long as they want.

Consumers don’t need a new computer to get secure, cutting-edge software; they just need the right software. Free Software already gives consumers what they want today, and we will be working hard to make sure they know that.

Ready To Join Us?

Consumers want sustainable and repairable digital devices. We believe that providing users the software to keep devices in use and out of the landfill will drive demand for Free Software products and enable long-term hardware use.

Do you want to join us in this movement to combat e-waste with Free Software? See our contact info to get involved.

We need volunteers like you to bring the "Opt Green" campaign to towns and cities around the world. We need volunteers like you to design engaging guides and beautiful materials for global distribution. We need volunteers like you to report on the project in magazines and newspapers. Let's work together to bring sustainable software to your community!

Maybe you are interested in contributing to the development of measurement tools like KdeEcoTest and Selenium AT-SPI or improving KEcoLab automation? Or using such tools to measure your software application's energy consumption? Let's collaborate to make energy transparency a part of Free Software development today!

Or maybe you actively contribute to a Free Software project that will keep hardware in use for longer. Please be in touch! We want to promote the amazing work you do directly with consumers.

Additional ideas are more than welcome. Part of the project will be figuring out what works, and engagement by people like you will make this project a success. We would love to have you join us. Learn more: https://eco.kde.org/get-involved/

Funding Notice

The NS4NH project is funded by the Federal Environment Agency and the Federal Ministry for the Environment, Nature Conservation, Nuclear Safety and Consumer Protection (BMUV). The funds are made available by resolution of the German Bundestag.

The publisher is responsible for the content of this publication.

Categories: FLOSS Project Planets

Python Morsels: Equality versus identity in Python

Planet Python - Tue, 2024-05-28 19:52

Equality checks whether two objects represent the same value. Identity checks whether two variables point to the same object.

Table of contents

  1. The equality operator in Python
  2. The is operator in Python
  3. How equality and identity work differently
  4. Inequality and non-identity operators
  5. Where are identity checks used?
  6. Equality vs. Identity

The equality operator in Python

Let's say we have two variables that point to two lists:

>>> a = [2, 1, 3]
>>> b = [2, 1, 3, 4]

When we use the == operator to check whether these lists are equal, we'll see that they are not equal:

>>> a == b
False

These lists don't have the same values right now, so they're not equal.

Let's update the first list so that these two lists do have equivalent values:

>>> a.append(4)
>>> a
[2, 1, 3, 4]

If we use == again, we'll see that these lists are equal now:

>>> a == b
True

Python's == operator checks for equality. Two objects are equal if they represent the same data.

The is operator in Python

Python also has an is …

Read the full article: https://www.pythonmorsels.com/equality-vs-identity/
Categories: FLOSS Project Planets

Russell Coker: Creating a Micro Users’ Group

Planet Debian - Tue, 2024-05-28 18:08

Fosdem had a great lecture Building an Open Source Community One Friend at a Time [1]. I recommend that everyone who is involved in the FOSS community watches this lecture to get some ideas.

For some time I’ve been periodically inviting a few friends to visit for lunch, chat about Linux, maybe do some coding, and watch some anime between coding. It seems that I have accidentally created a micro users’ group.

LUGs were really big in the mid to late 90s and still quite vibrant in the early 2000’s. But they seem to have decreased in popularity even before Covid19 and since Covid19 a lot of people have stopped attending large meetings to avoid health risks. I think that a large part of the decline of users’ groups has been due to the success of YouTube. Being able to choose from thousands of hours of lectures about computers on YouTube is a disincentive to spending the time and effort needed to attend a meeting with content that’s probably not your first choice of topic. Attending a formal meeting where someone you don’t know has arranged a lecture might not have a topic that’s really interesting to you. Having lunch with a couple of friends and watching a YouTube video that one of your friends assures you is really good is something more people will find interesting.

In recent times homeschooling [2] has become more widely known. The same factors that allow learning about computers at home also make homeschooling easier. The difference between the traditional LUG model of having everyone meet at a fixed time for a lecture and a micro LUG of a small group of people having an informal meeting is similar to the difference between traditional schools and homeschooling.

I encourage everyone to create their own micro LUG. All you have to do is choose a suitable time and place and invite some people who are interested. Have a BBQ in a park if the weather is good, meet at a cafe or restaurant, or invite people to visit you for lunch on a weekend.

Related posts:

  1. Creating a Micro Conference The TEDxVolcano The TED conference franchise has been extended to...
  2. BLUG This weekend I went to the Ballarat install-fest, mini-conf, and...
  3. Recruiting at a LUG Meeting I’m at the main meeting of Linux Users of Victoria...
Categories: FLOSS Project Planets

Trey Hunner: PyCon 2024 Reflection

Planet Python - Tue, 2024-05-28 16:00

I traveled back home from PyCon US 2024 last week. This is my reflection on my time at PyCon.

Attempting to eat vegan

Since 2020, I’ve been gradually eating more plant-based and a few months ago I decided to take PyCon as an opportunity to attempt exclusively vegan eating outside my own home. As I noted on Mastodon, it was a challenge and I failed every day at least once but I found the experience worthwhile. Our food system is very dairy-oriented.

Staying hydrated and fed

One of the first things I did before heading to the convention center was walk to Target and buy snacks and drinks. When at PyCon, I prefer to spend 30 minutes and $20 to have a backup plan for last minute hydration and calories (even if not the greatest calories). I never quite know when I might sleep through breakfast, find lunch lacking, or wish I’d eaten more dinner.

A tutorial, an orientation, a lightning talk, and open spaces

My responsibilities at PyCon this year included teaching a tutorial and helping run the Newcomer’s Orientation with Kojo and Sumana.

Yngve and Marie offered to act as teaching assistants during my tutorial and I was very grateful for their help! Rodrigo and Krishna also offered to TA just before my tutorial started and I was extra grateful to have even more help than I’d expected. The attendees were mostly better prepared than I expected they would be, which was also great. It’s always great to spend less time on setup and more time exploring Python together.

The newcomer’s orientation the next day went well. We kept it fairly brief and were able to address about 10 minutes of audience questions before the opening reception started.

Once my PyCon responsibilities were complete, I invented a few more (light) responsibilities for myself. 😅 I signed up to give a lightning talk on how to give a lightning talk. They slotted it as the first talk of the first lightning talk session on Friday night. I kept this talk pretty much the same as the one I presented at DjangoCon 2016. I could have made the transitions fancier, but I decided to embrace the idea of simplicity with the hope that audience members might think “look if that first speaker can give such a simple and succinct presentation, maybe I can too.”

On Saturday I ran an open space on Python Learning. Some of you showed up because you’re on my mailing list or you’re paying Python Morsels subscribers. Many folks showed up because the topic was interesting, either as a learner or as a teacher. I really enjoyed the round-table-style conversation we had.

I also ran a Cabo Card game open space during lunch on Sunday on the 4th floor rooftop. Cabo is my usual conference ice breaker game and I played it at least a few nights in The Westin lobby as well.

Seeing conference friends, old and new

For me, PyCon is largely about having conversations. The talks and tutorials are great for starting me thinking about an idea. The hallway track, open spaces, and meals are great for continuing conversations about those ideas (or other ideas).

My first morning in Pittsburgh, I chatted with Naomi Ceder and Reuven Lerner. I’m glad I ran into them before the conference kicked off because (as often happens at PyCon) I only very briefly saw either of them during the rest of PyCon!

After my tutorial that afternoon, I did dinner with Marie, Yngve, and Rodrigo at Rosewater Mediterranean (good vegan options, assuming you enjoy falafel and various sauces). As sometimes happens at PyCon, another PyCon attendee, Sachin, joined our table because we noticed him eating on his own at a table near us and invited him to join us.

On Saturday, Melanie, David, Jay, and I had a sort of mini San Diego Python study group reunion dinner before inviting folks to join us for Cabo and Knucklebones one night. The 4 of us originally met each other (along with Carol and other wonderful Python folks) at the San Diego Python study group about 10 years ago.

I had some wonderful conversations about ways to improve the Python documentation over dinner (at Nicky’s Thai) on Sunday night with so many docs-concerned folks who I highly respect. I’m really excited that Python has the documentation editorial board and I’m hopeful that that board, with the help of many others community members, will usher in big improvements to the documentation in the coming years.

I also met a number of Internet acquaintances IRL for the first time at PyCon. I met Tereza and Jessica, who I know from our work in the PSF Code of Conduct workgroup. I met Steve Lott, who I originally knew as a prolific question-answerer. I also met Hugo, a CPython core dev, the Python 3.14 & 3.15 release manager, and a social media user (which is how I’ve primarily interacted with him because the Internet is occasionally lovely). I was also very excited to meet many Python Morsels members as well as folks who know me through my weekly Python tips newsletter.

I was grateful to chat with Hynek and Al about creating talks, YouTube videos, and other online content. I also enjoyed chatting with Glyph a bit about our experiences consulting and training and (in hindsight) wished I’d planned an open space for either consultants or trainers, both of which have been held at PyCon before but it just takes someone to stick it on the open space board.

Many folks I only saw very briefly (I said a quick hi and bye to Andrew over lunch during the sprints) and some I didn’t see at all (Frank was at PyCon but we never ran into each other). Some I essentially saw through playing a few rounds of Cabo (Thomas and Ethan among many others). We also ran into at least 4 other PyCon attendees in the airport on Tuesday afternoon, including Bob and Julian, who it’s always a pleasure to see.

A Mastodon-oriented PyCon

On Thursday night I had the feeling that the number of Mastodon posts I saw on the #PyConUS hashtag was greater than the number of Twitter posts. I (very unscientifically) counted up the number of posts I was seeing on each and found that my perception was correct: Mastodon seemed to slightly overtake Twitter at PyCon this year.

Over dinner on Wednesday, I tried to convince Marie, Yngve, and Rodrigo to get Mastodon accounts just to follow the hashtag during PyCon. I succeeded: Marie and Yngve and Rodrigo!

Mastodon will never be the social media platform. Its decentralized nature is too much of a barrier for many folks. However, it does seem to be used by enough somewhat nerdy Python folks to now be one of the most used social media platforms for PyCon posting.

The talks

I ended up spending little time in the talks during PyCon. This wasn’t on purpose. I just happened to attend many open spaces, take personal breaks, and end up in hallway conversations often. I did see many of the lightning talks live, as well as Jay, Simon, and Sumana’s keynotes (all of them were exceptional) and the opening and closing remarks. I also watched a few talks from my hotel room while taking breaks.

While I’m often a bit light on my talk load at PyCon, I do recommend folks attend a good handful of live talks during PyCon, as Jon and others recommend. I wish I had seen more talks live. I also wish I had attended a few open spaces that I missed.

At any one time, I know that I’m always missing about 90% of what’s scheduled during PyCon (if you include the talks and the open spaces). That’s assuming I don’t ditch the conference entirely for a few hours and walk across a bridge or ride a funicular (neither of which I did, as I stuck around the venue the whole time this year). I am glad I saw, did, and talked about everything I did, but there’s always something I wish I’d seen/done!

The sprints

Thanks to the documentation dinner, I had a couple documentation-related ideas in mind on the first day of sprints. But I’m also really excited about the new Python REPL coming in Python 3.13 (in case you can’t tell from how much I talk about it), so I sprinted on that instead. Łukasz assigned me the task of researching keyboard shortcuts that the new REPL is missing (compared to the current one on Linux and Mac) so I spent some time researching that. I got to see the REPL running on Anthony’s laptop on Windows and I am so excited that Windows support will be included before 3.13.0 lands! 🎉

Partly inspired by Carol Willing’s PyCon preview message, I also thanked Pablo, Łukasz, and Lysandros in-person for all their work on the new Python REPL. 🤗

Until next year

I’ll be keynoting at PyOhio this year.

Besides PyOhio, I’m not sure whether I’ll make it to another conference until PyCon US next year. I’d love to attend all of them, but I do have work and personal goals that need accomplishing too!

I hope to see you at PyCon US 2025! In the meantime, if you’re wishing we’d exchanged contact details or met in-person, please feel free to stay in touch through Mastodon, LinkedIn, my weekly emails, YouTube, or Twitter.

Categories: FLOSS Project Planets

PyCoder’s Weekly: Issue #631 (May 28, 2024)

Planet Python - Tue, 2024-05-28 15:30

#631 – MAY 28, 2024
View in Browser »

Building a Python GUI Application With Tkinter

In this video course, you’ll learn the basics of GUI programming with Tkinter, the de facto Python GUI framework. Master GUI programming concepts such as widgets, geometry managers, and event handlers. Then, put it all together by building two applications: a temperature converter and a text editor.
REAL PYTHON course

pyastgrep and Custom Linting

This article from the developer of pyastgrep introduces you to the tool which can now be used as a library. The post talks about how to use it and what kind of linting it does best.
LUKE PLANT

Upgrade Python Versions Without the Pain

Stop wasting 30% of your team’s sprint on maintaining legacy codebases. Automatically migrate and keep up-to-date on Python versions, so that you can focus on being productive while staying secure, without the risk of breaking changes - Get a code assessment today →
ACTIVESTATE sponsor

What’s New in Django 5.1

Django 5.1 has gone alpha so the list of features targeting this release has more or less solidified. This article introduces you to what is coming in Django 5.1.
JEFF TRIPLETT

Quiz: How to Create Pivot Tables With Pandas

This quiz is designed to push your knowledge of pivot tables a little bit further. You won’t find all the answers by reading the tutorial, so you’ll need to do some investigating on your own. By finding all the answers, you’re sure to learn some other interesting things along the way.
REAL PYTHON

PEP 649 Re-targeted to 3.14

Python Enhancement Proposal 649: Deferred Evaluation Of Annotations Using Descriptors has been re-targeted to the Python 3.14 release
PYTHON.ORG

JupyterLab 4.2 and Notebook 7.2 Released

JUPYTER

Articles & Tutorials

Testing With Python: The Different Types of Tests

This is part 5 of a deep dive into writing automated tests, but also works well as an independent article. This post talks about the taxonomy of testing, like the differences between unit and integration tests, and how nobody can quite agree on a definition of either.
BITECODE

Python’s Built-in Exceptions: A Walkthrough With Examples

In this tutorial, you’ll get to know some of the most commonly used built-in exceptions in Python. You’ll learn when these exceptions can appear in your code and how to handle them. Finally, you’ll learn how to raise some of these exceptions in your code.
REAL PYTHON

Software Engineering Hiring and Firing

This article is a deep dive on the hiring and firing practices in the software field, and unlike most articles focuses on senior engineering roles. It isn’t a “first job” post, but a “how the decision process works” article.
ED CREWE

Enabling Async MongoDB Operations in Streamlit

Streamlit is a wonderful tool for building dashboards with its peculiar execution model, but using asyncio data sources with it can be a real pain. This article is about how you correctly use those two technologies together.
HANDMADESOFTWARE • Shared by Thorin Schiffer

EuroPython 2024 Announces Keynote Speakers

EuroPython happens in Prague July 8-14 and as the conference approaches more and more is happening. This posting from their May newsletter highlights the keynotes and other announcements.
EUROPYTHON

Writing Commit Messages

This guide admits to being “yet another”, but unlike most that are out there, spends less time discussing the cosmetic aspects of a good commit message and more time on the content.
SIMON TATHAM

PSF Announces 5-Year Sponsorship Commitment From Fastly

Python Software Foundation securing this sponsorship affects the entire Python ecosystem, most notably the security and reliability of the Python Package Index (PyPI).
SOCKET.DEV • Shared by Sarah Gooding

Untold Stories From 6 Years Working on Python Packaging

Sumana gave the closing keynote address at PyCon US this year and this posting shares all the links and references from the talk.
SUMANA HARIHARESWARA

The Python calendar Module: Create Calendars With Python

Learn to use the Python calendar module to create and customize calendars in plain text, HTML or directly in your terminal.
REAL PYTHON

TIL: Accessibility Resources #2

This post is a collection of accessibility resources mostly for web sites, but some tools can be used elsewhere as well.
SARAH ABDEREMANE

Projects & Code

PgQueuer: Python & PostgreSQL Job Queuing Library

GITHUB.COM/JANBJORGE

Tapyr: Shiny for Python Application Template

GITHUB.COM/APPSILON • Shared by Appsilon

Oven: Explore Python Packages

FMING.DEV

tkforge: Drag & Drop in Figma to Create a Python GUI

GITHUB.COM/AXORAX

tach: Enforce a Modular, Decoupled Package Architecture

GITHUB.COM/NEVER-OVER

Events

Weekly Real Python Office Hours Q&A (Virtual)

May 29, 2024
REALPYTHON.COM

SPb Python Drinkup

May 30, 2024
MEETUP.COM

Building Python Communities Yaounde

June 1 to June 3, 2024
NOKIDBEHIND.ORG

Django Girls Medellín

June 1 to June 2, 2024
DJANGOGIRLS.ORG

PyDelhi User Group Meetup

June 1, 2024
MEETUP.COM

Melbourne Python Users Group, Australia

June 3, 2024
J.MP

DjangoCon Europe 2024

June 5 to June 10, 2024
DJANGOCON.EU

PyCon Colombia 2024

June 7 to June 10, 2024
PYCON.CO

Happy Pythoning!
This was PyCoder’s Weekly Issue #631.
View in Browser »

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

Categories: FLOSS Project Planets

Ned Batchelder: One way to fix Python circular imports

Planet Python - Tue, 2024-05-28 13:46

In Python, a circular import is when two files each try to import the other, causing a failure when a module isn’t fully initialized. The best way to fix this situation is to organize your code in layers so that the importing relationships naturally flow in just one direction. But sometimes it works to simply change the style of import statement you use. I’ll show you.

Let’s say you have these files:

1  # one.py
2  from two import func_two
3
4  def func_one():
5      func_two()

1  # two.py
2  from one import func_one
3
4  def do_work():
5      func_one()
6
7  def func_two():
8      print("Hello, world!")

1  # main.py
2  from two import do_work
3  do_work()

If we run main.py, we get this:

% python main.py
Traceback (most recent call last):
  File "main.py", line 2, in <module>
    from two import do_work
  File "two.py", line 2, in <module>
    from one import func_one
  File "one.py", line 2, in <module>
    from two import func_two
ImportError: cannot import name 'func_two' from partially initialized
  module 'two' (most likely due to a circular import) (two.py)

When Python imports a module, it executes the file line by line. Every global in the file (top-level name including functions and classes) becomes an attribute on the module object being constructed. In two.py, we import from one.py at line 2. At that moment, the two module has been created, but it has no attributes yet because nothing has been defined yet. It will eventually have do_work and func_two, but we haven’t executed those def statements yet, so they don’t exist. Like a function call, when the import statement is run, it begins executing the imported file, and doesn’t come back to the current file until the import is done.

The import of one.py starts, and its line 2 tries to get a name from the two module. As we just said, the two module exists, but has no names defined yet. That gives us the error.

Instead of importing names from modules, we can import whole modules instead. All we do is change the form of the imports, and how we reference the functions from the imported modules, like this:

1  # one.py
2  import two              # was:  from two import func_two
3
4  def func_one():
5      two.func_two()      # was:  func_two()

1  # two.py
2  import one              # was:  from one import func_one
3
4  def do_work():
5      one.func_one()      # was:  func_one()
6
7  def func_two():
8      print("Hello, world!")

1  # main.py
2  from two import do_work
3  do_work()

Running the fixed code, we get this:

% python main.py
Hello, world!

It works because two.py imports one at line 2, and then one.py imports two at its line 2. That works just fine, because the two module exists. It’s still empty like it was before the fix, but now we aren’t trying to find a name in it during the import. Once all of the imports are done, the one and two modules both have all their names defined, and we can access them from inside our functions.

The key idea here is that “from two import func_two” tries to find func_two during the import, before it exists. Deferring the name lookup to the body of the function by using “import two” lets all of the modules get themselves fully initialized before we try to use them, avoiding the circular import error.
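A closely related approach (not covered in this post, but relying on the same deferral idea) is to move the import into the body of the function that needs it, so the lookup only happens when the function is called:

# two.py -- variant sketch: defer the import to call time
def do_work():
    from one import func_one   # runs only when do_work() is called
    func_one()

def func_two():
    print("Hello, world!")

With this variant, one.py can keep its original “from two import func_two”, because by the time do_work() runs from main.py, the two module is fully initialized.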

As I mentioned at the top, the best way to fix circular imports is to structure your code so that modules don’t have mutual dependencies like this. But that isn’t always easy, and this can buy you a little time to get your code working again.

Categories: FLOSS Project Planets

Evolving Web: Evolving Web Wins Pantheon Award for Social Impact

Planet Drupal - Tue, 2024-05-28 13:23

We are thrilled to announce that Evolving Web has been honored with the Social Impact Award in the Inaugural Pantheon Partner Awards for our work on the Planned Parenthood Direct website.

The winners were announced at the Pantheon Partner dinner, held during DrupalCon Portland on May 6, 2024. Congratulations to the other winners who took to the stage with us: 

  • Elevated Third – Partner of the Year Award
  • WebMD Ignite – Innovation Award
  • HoundER – Rookie of the Year Award
  • Forum One – Customer First Award
  • Danny Pfeiffer – Friends of Pantheon Partners Award

Pantheon’s Partner Awards recognize the outstanding contributions of digital agencies that drive positive change. We’re proud to be acknowledged for our role in the Planned Parenthood Direct project, which supports reproductive rights and enhances access to reproductive and sexual healthcare. Our work on the project demonstrates our commitment to creating impact through user-centric digital experiences.

A Mission-Driven Collaboration

In the U.S., reproductive and sexual health care services vary from state to state. Planned Parenthood Direct (PPD) aims to provide trusted care from anywhere by offering “on-the-go” services. We collaborated with PPD to build a secure, mobile-first website that informs users of available services in their state. The site also encourages users to download the PPD app, which they can use to order birth control.

Designing for Impact and Inclusion

Our team undertook the challenge of creating a highly informative, accessible website that appeals to a younger audience.

  • We created dedicated pages for each state, ensuring they’re easy for PPD to update and optimized for search engines.
  • We created a new visual brand identity that incorporates bold design principles for a youthful, reassuring, and non-stigmatizing user experience.
  • Our mobile-first approach ensured that the site meets the needs of an audience who prefer mobile devices.
  • We also followed accessibility best practices to ensure a user-friendly experience for all, including users with disabilities. 

Protecting Users with Exceptional Security

Security was a paramount concern, given the political climate surrounding reproductive rights. We ensured a highly secure online experience using a decoupled architecture with Next.js for the front-end and Drupal 10 for the back-end. Hosting on Pantheon added additional layers of security, including HTTPS certificates and DDoS protection.


Setting PPD Up For Success & Growth 

Our work on the Planned Parenthood Direct website included the development of 17 custom components and 14 content types in Layout Builder. This empowers PPD’s content editors to create flexible, engaging, and visually appealing layouts. The result is streamlined content creation and management, allowing PPD to maintain and grow their website effectively.

Outstanding Results & Continued Commitment

The new Planned Parenthood Direct website has been instrumental in continuing PPD’s mission to support human rights and ensure access to sexual and reproductive healthcare.

A big thank you to Pantheon for recognizing our efforts, and to Planned Parenthood Direct for trusting us with this important project. We’re honoured to have partnered with you both.

As we celebrate this award, we’re reminded of the importance of our work and the impact it has on communities. We look forward to future opportunities to make a difference.

Partner with us to turn your vision into a powerful digital experience that drives change. 

+ more awesome articles by Evolving Web
Categories: FLOSS Project Planets


Go Deh: Recreating the CVM algorithm for estimating distinct elements gives problems

Planet Python - Tue, 2024-05-28 12:15

 

Someone at work posted a link to this Quanta Magazine article. It describes a novel and seemingly straightforward way to estimate the number of distinct elements in a data stream.

Quanta describes the algorithm, and as an example gives "counting the number of distinct words in Hamlet".

Following Quanta

I looked at the description and decided to follow their text. They carefully described each round of the algorithm, which I coded up, and then I looked for the generalizations and implemented a loop over all items in the stream...

It did not work! I got silly numbers. I could download Hamlet, split it into words (around 32,000), do len(set(words)) to get the exact number of distinct words (around 7,000), then run it through the algorithm and get a stupid result with tens of digits for the estimated number of distinct words.
I re-checked my implementation of the Quanta-described algorithm and couldn't see any mistake, but I had originally noticed a link to the original paper. I did not follow it at first as original papers can be heavily into maths notation and I prefer reading algorithms described in code/pseudocode. 

I decided to take a look at the original.

The CVM Original Paper

I scanned the paper.

I read the paper.

I looked at Algorithm 1 as a probable candidate to decipher into Python, but the description was cryptic. Here's that description, taken from the paper:

AI To the rescue!?

I had a brainwave 💡: let's chuck it at two AIs and see what they do. I had Gemini and Copilot to hand and asked each to express Algorithm 1 as Python. Gemini did something, and Copilot eventually did something too, but only after I opened the page in Microsoft Edge.
There followed hours of me reading and cross-comparing between the algorithm and the AIs' code. If I did not understand where something came from, I would ask the generating AI; if I found an error I would first (and second, and...) try to get the AI to make a fix I suggested.

At this stage I was also trying to get a feel for how the AIs could help me (now way past what I thought the algorithm should be, just to see what it would take to get those AIs to cross the T's and dot the I's on a good solution).
Not a good use of time! I now know that asking for an update to one of the 20 to 30 lines of the Python function might fix that line, but unfix another line that had been fixed before. Code from the AIs does not have line numbers, making it difficult to state what needs changing, and where. They can suggest type hints and create the beginnings of docstrings, but, for example, one pulled out the wrong authors when naming the algorithm.
In line 1 of the algorithm, the initialisation of thresh is clearly shown, I thought, but both AIs had difficulty getting the Python right. Eventually I cut-and-pasted the text into each AI, where they confidently said "Of course...", made a change, and then I had to re-check for any other changes.

My Code

I first created this function:

import math
import random
from collections.abc import Collection, Iterable
from typing import Any


def F0_Estimator(stream: Collection[Any], epsilon: float, delta: float) -> float:
    """
    ...
    """
    p = 1
    X = set()
    m = len(stream)
    thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))

    for item in stream:
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            X = {x_item for x_item in X
                 if random.random() < 0.5}
            p /= 2
    return len(X) / p

I tested it with Hamlet data and it made OK estimates.
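By way of illustration, the call looked something like this (the epsilon and delta values here are just example choices, and words is the Hamlet word list from the sketch earlier):

# Illustrative usage; epsilon and delta values are example choices.
estimate = F0_Estimator(words, epsilon=0.1, delta=0.05)
print(f"estimated distinct words: {estimate:.0f}")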

Elated, I took a break.

Hacker News

The next evening I decided to do a search to see if anyone else was talking about the algorithm, and found a thread on Hacker News that was right up my street. People were discussing the same problems found in the Quanta article - and getting similarly ginormous answers. One of the original authors of the paper was making comments! And others had created code from the actual paper and said it was also easier to follow than the Quanta description.

The author mentioned that no less than Donald Knuth had taken an interest in their algorithm and had noted that the expression starting `X = ...`, four lines from the end, could, theoretically, make no change to X; the solution was to wrap the assignment in a while loop that only exits once len(X) < thresh.

Code update

I decided to add that change:

def F0_Estimator(stream: Collection[Any], epsilon: float, delta: float) -> float:
    """
    Estimates the number of distinct elements in the input stream.

    This function implements the CVM algorithm for the problem of
    estimating the number of distinct elements in a stream of data.

    The stream object must support an initial call to __len__

    Parameters:
    stream (Collection[Any]): The input stream as a collection of hashable
        items.
    epsilon (float): The desired relative error in the estimate. It must be in
        the range (0, 1).
    delta (float): The desired probability of the estimate being within the
        relative error. It must be in the range (0, 1).

    Returns:
    float: An estimate of the number of distinct elements in the input stream.
    """
    p = 1
    X = set()
    m = len(stream)
    thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))

    for item in stream:
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            while len(X) == thresh:  # Force a change
                X = {x_item for x_item in X
                     if random.random() < 0.5}  # Random, so could do nothing
            p /= 2
    return len(X) / p


thresh

In the code above, the variable thresh (threshold), named from Algorithm 1, corresponds to what the Quanta article describes as the maximum storage available for keeping items from the stream that have been seen before. You must know the length of the stream m, plus epsilon and delta, to calculate thresh.

If you were to take just the stream and thresh as the arguments, you could return both the estimate of the number of distinct items in the stream and a count of the total number of elements in the stream.
Epsilon could then be calculated afterwards from the numbers we now know, by rearranging the thresh formula to solve for epsilon.

def F0_Estimator2(stream: Iterable[Any],
                  thresh: int,
                  ) -> tuple[float, int]:
    """
    Estimates the number of distinct elements in the input stream.

    This function implements the CVM algorithm for the problem of
    estimating the number of distinct elements in a stream of data.

    The stream object does NOT have to support a call to __len__

    Parameters:
    stream (Iterable[Any]): The input stream as an iterable of hashable
        items.
    thresh (int): The max threshold of stream items used in the estimation.

    Returns:
    tuple[float, int]: An estimate of the number of distinct elements in the
        input stream, and the count of the number of items in stream.
    """
    p = 1
    X = set()
    m = 0  # Count of items in stream

    for item in stream:
        m += 1
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            while len(X) == thresh:  # Force a change
                X = {x_item for x_item in X
                     if random.random() < 0.5}  # Random, so could do nothing
            p /= 2

    return len(X) / p, m


def F0_epsilon(thresh: int,
               m: int,
               delta: float = 0.05,  # 0.05 is 95%
               ) -> float:
    """
    Calculate the relative error in the estimate from F0_Estimator2(...)

    Parameters:
    thresh (int): The thresh value used in the call TO F0_Estimator2.
    m (int): The count of items in the stream FROM F0_Estimator2.
    delta (float): The desired probability of the estimate being within the
        relative error. It must be in the range (0, 1) and is usually 0.05
        to 0.01, (95% to 99% certainty).

    Returns:
    float: The calculated relative error in the estimate
    """
    return math.sqrt(12 / thresh * math.log(8 * m / delta))

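As a quick sanity check (the numbers here are purely illustrative), F0_epsilon should roughly invert the thresh calculation used earlier:

# Purely illustrative numbers: recover epsilon from a thresh computed with it.
m, delta, eps = 32_000, 0.05, 0.1
thresh = math.ceil(12 / eps ** 2 * math.log(8 * m / delta))
print(F0_epsilon(thresh, m, delta))  # roughly 0.1 (slightly less, since thresh was rounded up)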
Testing

def stream_gen(k: int = 30_000, r: int = 7_000) -> list[int]:
    "Create a randomised list of k ints of up to r different values."
    return random.choices(range(r), k=k)


def stream_stats(s: list[Any]) -> tuple[int, int]:
    length, distinct = len(s), len(set(s))
    return length, distinct


# %%
print("CVM ALGORITHM ESTIMATION OF NUMBER OF UNIQUE VALUES IN A STREAM")

stream_size = 2**18
reps = 5
target_uniques = 1
while target_uniques < stream_size:
    the_stream = stream_gen(stream_size + 1, target_uniques)
    target_uniques *= 4
    size, unique = stream_stats(the_stream)

    print(f"\n  Actual:\n    {size = :_}, {unique = :_}\n  Estimations:")

    delta = 0.05
    threshhold = 2
    print(f"    All runs using {delta = :.2f} and with estimate averaged from {reps} runs:")
    while threshhold < size:
        estimate, esize = F0_Estimator2(the_stream.copy(), threshhold)
        estimate = sum([estimate] +
                       [F0_Estimator2(the_stream.copy(), threshhold)[0]
                        for _ in range(reps - 1)]) / reps
        estimate = int(estimate + 0.5)
        epsilon = F0_epsilon(threshhold, esize, delta)
        print(f"      With {threshhold = :7_} -> "
              f"{estimate = :_}, +/-{epsilon*100:.0f}%"
              + (f" {esize = :_}" if esize != size else ""))
        threshhold *= 8

The algorithm generates an estimate based on random sampling, so I run it multiple times for the same input and report the mean estimate from those runs.

Sample output

 

CVM ALGORITHM ESTIMATION OF NUMBER OF UNIQUE VALUES IN A STREAM
  Actual:
    size = 262_145, unique = 1
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 1, +/-1026%
      With threshhold =      16 -> estimate = 1, +/-363%
      With threshhold =     128 -> estimate = 1, +/-128%
      With threshhold =   1_024 -> estimate = 1, +/-45%
      With threshhold =   8_192 -> estimate = 1, +/-16%
      With threshhold =  65_536 -> estimate = 1, +/-6%

  Actual:
    ...

  Actual:
    size = 262_145, unique = 1_024
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 16_384, +/-1026%
      With threshhold =      16 -> estimate = 768, +/-363%
      With threshhold =     128 -> estimate = 1_101, +/-128%
      With threshhold =   1_024 -> estimate = 1_018, +/-45%
      With threshhold =   8_192 -> estimate = 1_024, +/-16%
      With threshhold =  65_536 -> estimate = 1_024, +/-6%

  Actual:
    size = 262_145, unique = 4_096
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 13_107, +/-1026%
      With threshhold =      16 -> estimate = 3_686, +/-363%
      With threshhold =     128 -> estimate = 3_814, +/-128%
      With threshhold =   1_024 -> estimate = 4_083, +/-45%
      With threshhold =   8_192 -> estimate = 4_096, +/-16%
      With threshhold =  65_536 -> estimate = 4_096, +/-6%

  Actual:
    size = 262_145, unique = 16_384
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 0, +/-1026%
      With threshhold =      16 -> estimate = 15_155, +/-363%
      With threshhold =     128 -> estimate = 16_179, +/-128%
      With threshhold =   1_024 -> estimate = 16_986, +/-45%
      With threshhold =   8_192 -> estimate = 16_211, +/-16%
      With threshhold =  65_536 -> estimate = 16_384, +/-6%

  Actual:
    size = 262_145, unique = 64_347
  Estimations:
    All runs using delta = 0.05 and with estimate averaged from 5 runs:
      With threshhold =       2 -> estimate = 26_214, +/-1026%
      With threshhold =      16 -> estimate = 73_728, +/-363%
      With threshhold =     128 -> estimate = 61_030, +/-128%
      With threshhold =   1_024 -> estimate = 64_422, +/-45%
      With threshhold =   8_192 -> estimate = 64_760, +/-16%
      With threshhold =  65_536 -> estimate = 64_347, +/-6%

 Looks good!

Wikipedia

Another day, and I decided to start writing this blog post. I searched again and found the Wikipedia article on what it calls the Count-distinct problem.

Looking through it, I found this incorrect description of the CVM algorithm:

The (or a?) problem with the Wikipedia entry is that it shows

p ← p/2

...within the while loop. You need an enclosing if |B| >= s around the while loop, with the assignment to p outside the while loop but inside this new if statement.
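In Python terms, using the Wikipedia names (B for the kept sample, s for the size limit, p for the sampling probability), my reading of the corrected step is something like this fragment:

# Sketch of the corrected step in Wikipedia's notation; B, s and p come from
# the surrounding pseudocode. The enclosing `if` is what the entry was missing.
if len(B) >= s:
    while len(B) >= s:  # keep halving until B actually shrinks
        B = {b for b in B if random.random() < 0.5}
    p /= 2  # halve p once, outside the while loop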

It's tough!

Both Quanta Magazine and whoever added the algorithm to Wikipedia got the algorithm wrong.

I've written around two hundred tasks on site Rosettacode.org for over a decade. Others had to read my description and create code in their chosen language to implement those tasks. I have learnt from the feedback I got on talk pages to hone that craft, but details matter. Examples matter. Constructive feedback matters.

END.

 

Categories: FLOSS Project Planets

Real Python: Efficient Iterations With Python Iterators and Iterables

Planet Python - Tue, 2024-05-28 10:00

Python’s iterators and iterables are two different but related tools that come in handy when you need to iterate over a data stream or container. Iterators power and control the iteration process, while iterables typically hold data that you want to iterate over one value at a time.

Iterators and iterables are fundamental components of Python programming, and you’ll have to deal with them in almost all your programs. Learning how they work and how to create them is key for you as a Python developer.

In this video course, you’ll learn how to:

  • Create iterators using the iterator protocol in Python
  • Understand the differences between iterators and iterables
  • Work with iterators and iterables in your Python code
  • Use generator functions and the yield statement to create generator iterators
  • Build your own iterables using different techniques, such as the iterable protocol
  • Use the asyncio module and the await and async keywords to create asynchronous iterators
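To make the first two bullets concrete, here is a tiny generic sketch of an iterator class and its generator equivalent (just an illustration, not material from the course):

# A minimal iterator: implements __iter__ and __next__ (the iterator protocol).
class CountDown:
    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


# The same iteration written as a generator function using yield.
def count_down(start: int):
    while start > 0:
        yield start
        start -= 1


print(list(CountDown(3)))   # [3, 2, 1]
print(list(count_down(3)))  # [3, 2, 1]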


Categories: FLOSS Project Planets

Python Software Foundation: Thinking about running for the Python Software Foundation Board of Directors? Let’s talk!

Planet Python - Tue, 2024-05-28 06:27

PSF Board elections are a chance for the community to choose representatives to help the PSF create a vision for and build the future of the Python community. This year there are 3 seats open on the PSF board. Check out who is currently on the PSF Board. (Débora Azevedo, Kwon-Han Bae, and Tania Allard are at the end of their current terms.)

Office Hours Details

This year, the PSF Board is running Office Hours so you can connect with current members to ask questions and learn more about what being a part of the Board entails. There will be two Office Hour sessions:

  • June 11th, 4 PM UTC
  • June 18th, 12 PM UTC

Make sure to check what time that is for you. We welcome you to join the PSF Discord and navigate to the #psf-elections channel to participate in Office Hours. The server is moderated by PSF Staff and locked between office hours sessions. If you’re new to Discord, check out some Discord Basics to help you get started.

Who runs for the Board?

People who care about the Python community, who want to see it flourish and grow, and also have a few hours a month to attend regular meetings, serve on committees, participate in conversations, and promote the Python community. Check out our Life as Python Software Foundation Director video to learn more about what being a part of the PSF Board entails. We also invite you to review our Annual Impact Report for 2023 to learn more about the PSF mission and what we do.

Nomination info

You can nominate yourself or someone else. We encourage you to reach out to people before you nominate them to ensure they are enthusiastic about the potential of joining the Board. Nominations open on Tuesday, June 11th, 2:00 PM UTC, so you have a few weeks to research the role and craft a nomination statement. The nomination period ends on June 25th, 2:00 PM UTC.

Categories: FLOSS Project Planets

Robin Wilson: How to install the Python triangle package on an Apple Silicon Mac

Planet Python - Tue, 2024-05-28 05:53

I was recently trying to set up RasterVision on my Apple Silicon Mac (specifically an M1 MacBook Pro, but I’m pretty sure this applies to any Apple Silicon Mac). It all went fine until it came time to install the triangle package, when I got an error. The error output is fairly long, but the key part is the end, shown here:

triangle/core.c:196:12: fatal error: 'longintrepr.h' file not found
#include "longintrepr.h"
         ^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

It took me quite a bit of searching to find the answer (Google just isn’t very good at giving relevant results these days), but actually it turns out to be very simple. The latest version of triangle on PyPI doesn’t work on Apple Silicon, but the code in the Github repository does work, so you can install directly from Github with this command:

pip install git+https://github.com/drufat/triangle.git

and it should all work fine.
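If you'd rather record this in a requirements file than run the command by hand, the equivalent direct-reference line should be something like the following (same repository URL, left unpinned here; add a commit pin if you need reproducibility):

triangle @ git+https://github.com/drufat/triangle.git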

Once you’ve done this, install rastervision again and it should recognise that the triangle package is already installed and not try to install it again.

Categories: FLOSS Project Planets

Call for Papers – Qt World Summit 2025 in Munich

Planet KDE - Tue, 2024-05-28 04:57

 

Qt World Summit is back and bigger than ever! We are looking for speakers, collaborators, and industry thought leaders to share their expertise and thoughts at the upcoming Qt World Summit on May 6-7th, 2025 in Munich, Germany. 

*Please note we are looking for live talks only. 

Categories: FLOSS Project Planets
