Feeds

Data School: How to prevent data leakage in pandas & scikit-learn ☔

Planet Python - Thu, 2024-04-25 10:51

Let's pretend you're working on a supervised Machine Learning problem using Python's scikit-learn library. Your training data is in a pandas DataFrame, and you discover missing values in a column that you were planning to use as a feature.

After considering your options, you decide to impute the missing values, which means that you're going to fill in the missing values with reasonable values.

How should you perform the imputation?

  • Option 1 is to fill in the missing values in pandas, and then pass the transformed data to scikit-learn.
  • Option 2 is to pass the original data to scikit-learn, and then perform all data transformations (including missing value imputation) within scikit-learn.

Option 1 will cause data leakage, whereas option 2 will prevent data leakage.

Here are questions you might be asking:

  • What is data leakage?
  • Why is data leakage problematic?
  • Why would data leakage result from missing value imputation in pandas?
  • How can I prevent data leakage when using pandas and scikit-learn?

Answers below! 👇

What is data leakage?

Data leakage occurs when you inadvertently include knowledge from testing data when training a Machine Learning model.

Why is data leakage problematic?

Data leakage is problematic because it will cause your model evaluation scores to be less reliable. This may lead you to make bad decisions when tuning hyperparameters, and it will lead you to overestimate how well your model will perform on new data.

It's hard to know whether data leakage will skew your evaluation scores by a negligible amount or a huge amount, so it's best to just avoid data leakage entirely.

Why would data leakage result from missing value imputation in pandas?

Your model evaluation procedure (such as cross-validation) is supposed to simulate the future, so that you can accurately estimate right now how well your model will perform on new data.

But if you impute missing values on your whole dataset in pandas and then pass your dataset to scikit-learn, your model evaluation procedure will no longer be an accurate simulation of reality. That's because the imputation values will be based on your entire dataset (meaning both the training portion and the testing portion), whereas the imputation values should just be based on the training portion.

In other words, imputation based on the entire dataset is like peeking into the future and then using what you learned from the future during model training, which is definitely not allowed.

How can we avoid this in pandas?

You might think that one way around this problem would be to split your dataset into training and testing sets and then impute missing values using pandas. (Specifically, you would need to learn the imputation value from the training set and then use it to fill in both the training and testing sets.)
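
Concretely, the difference looks something like this minimal sketch (using a made-up "age" column rather than any dataset from the course):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"age": [25, 30, None, 45, None, 60, 35, 50]})

# Leaky: the fill value is computed from the entire dataset,
# including rows that will later end up in the testing set.
leaky = df["age"].fillna(df["age"].mean())

# Leak-free for a single split: learn the fill value from the
# training rows only, then apply it to both portions.
train, test = train_test_split(df, test_size=0.25, random_state=0)
train_mean = train["age"].mean()
train_imputed = train["age"].fillna(train_mean)
test_imputed = test["age"].fillna(train_mean)

The second approach keeps the testing rows out of the statistic, but as the next paragraph explains, it becomes impractical once cross-validation enters the picture.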

That would work if you're only ever planning to use train/test split for model evaluation, but it would not work if you're planning to use cross-validation. That's because during 5-fold cross-validation (for example), the rows contained in the training set will change 5 times, and thus it's quite impractical to avoid data leakage if you use pandas for imputation while using cross-validation!

How else can data leakage arise?

So far, I've only mentioned data leakage in the context of missing value imputation. But there are other transformations that, if done in pandas on the full dataset, will also cause data leakage.

For example, feature scaling in pandas would lead to data leakage, and even one-hot encoding (or "dummy encoding") in pandas would lead to data leakage unless there&aposs a known, fixed set of categories.

More generally, any transformation which incorporates information about other rows when transforming a row will lead to data leakage if done in pandas.

How does scikit-learn prevent data leakage?

Now that you've learned how data transformations in pandas can cause data leakage, I'll briefly mention three ways in which scikit-learn prevents data leakage (a short code sketch follows the list):

  • First, scikit-learn transformers have separate fit and transform steps, which allow you to base your data transformations on the training set only, and then apply those transformations to both the training set and the testing set.
  • Second, the fit and predict methods of a Pipeline encapsulate all calls to fit_transform and transform so that they're called at the appropriate times.
  • Third, cross_val_score splits the data prior to performing data transformations, which ensures that the transformers only learn from the temporary training sets that are created during cross-validation.
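
Putting those three pieces together, here's a minimal sketch (with placeholder column names and toy data, not an excerpt from the course) of an imputation step inside a Pipeline evaluated with cross_val_score:

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

df = pd.DataFrame({
    "age": [25, 30, None, 45, None, 60, 35, 50],
    "fare": [7.0, 8.5, 9.1, None, 12.3, 30.0, 11.2, 25.0],
    "survived": [0, 1, 1, 0, 1, 1, 0, 1],
})
X = df[["age", "fare"]]
y = df["survived"]

# The imputer is fit on the training folds only within each
# cross-validation split, so nothing leaks from the test folds.
pipeline = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())
scores = cross_val_score(pipeline, X, y, cv=2)
print(scores.mean())
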
Conclusion

When working on a Machine Learning problem in Python, I recommend performing all of your data transformations in scikit-learn, rather than performing some of them in pandas and then passing the transformed data to scikit-learn.

Besides helping you to prevent data leakage, this enables you to tune the transformer and model hyperparameters simultaneously, which can lead to a better performing model!

One final note...

This post is an excerpt from my upcoming video course, Master Machine Learning with scikit-learn.

Join the waitlist below to get free lessons from the course and a special launch discount 👇

Categories: FLOSS Project Planets

The Drop Times: DrupalCollab: How big is the Drupal Community?

Planet Drupal - Thu, 2024-04-25 09:32
The Drop Times delves into the dynamics of the Drupal community with a detailed analysis of LinkedIn data, revealing the distribution and growth trends of Drupal professionals worldwide. This comprehensive study sheds light on regional concentrations and potential areas for community engagement.
Categories: FLOSS Project Planets

Lukas Märdian: Creating a Netplan enabled system through Debian-Installer

Planet Debian - Thu, 2024-04-25 06:19

With the work that has been done in the debian-installer/netcfg merge-proposal !9 it is possible to install a standard Debian system, using the normal Debian-Installer (d-i) mini.iso images, that will come pre-installed with Netplan and all network configuration structured in /etc/netplan/.

In this write-up I’d like to run you through a list of commands for experiencing the Netplan enabled installation process first-hand. For now, we’ll be using a custom ISO image, while waiting for the above-mentioned merge-proposal to be landed. Furthermore, as the Debian archive is going through major transitions, builds of the “unstable” branch of d-i don’t currently work. So I implemented a small backport, producing updated netcfg and netcfg-static for Bookworm, which can be used as localudebs/ during the d-i build.

Let’s start with preparing a working directory and installing the software dependencies for our virtualized Debian system:

$ mkdir d-i_bookworm && cd d-i_bookworm
$ apt install ovmf qemu-utils qemu-system-x86

Now let’s download the custom mini.iso, linux kernel image and initrd.gz containing the Netplan enablement changes, as mentioned above.

TODO: localudebs/

$ wget https://people.ubuntu.com/~slyon/d-i/bookworm/mini.iso
$ wget https://people.ubuntu.com/~slyon/d-i/bookworm/linux
$ wget https://people.ubuntu.com/~slyon/d-i/bookworm/initrd.gz

Next we’ll prepare a VM, by copying the EFI firmware files, preparing some persistent EFIVARs file, to boot from FS0:\EFI\debian\grubx64.efi, and create a virtual disk for our machine:

$ cp /usr/share/OVMF/OVMF_CODE_4M.fd .
$ cp /usr/share/OVMF/OVMF_VARS_4M.fd .
$ qemu-img create -f qcow2 ./data.qcow2 5G

Finally, let’s launch the installer using a custom preseed.cfg file, that will automatically install Netplan for us in the target system. A minimal preseed file could look like this:

# Install minimal Netplan generator binary
d-i preseed/late_command string in-target apt-get -y install netplan-generator

For this demo, we’re installing the full netplan.io package (incl. Python CLI), as the netplan-generator package was not yet split out as an independent binary in the Bookworm cycle. You can choose the preseed file from a set of different variants to test the different configurations:

We’re using the custom linux kernel and initrd.gz here to be able to pass the PRESEED_URL as a parameter to the kernel’s cmdline directly. Launching this VM should bring up the normal debian-installer in its netboot/gtk form:

$ export U=https://people.ubuntu.com/~slyon/d-i/bookworm/netplan-preseed+networkd.cfg
$ qemu-system-x86_64 \
    -M q35 -enable-kvm -cpu host -smp 4 -m 2G \
    -drive if=pflash,format=raw,unit=0,file=OVMF_CODE_4M.fd,readonly=on \
    -drive if=pflash,format=raw,unit=1,file=OVMF_VARS_4M.fd,readonly=off \
    -device qemu-xhci -device usb-kbd -device usb-mouse \
    -vga none -device virtio-gpu-pci \
    -net nic,model=virtio -net user \
    -kernel ./linux -initrd ./initrd.gz -append "url=$U" \
    -hda ./data.qcow2 -cdrom ./mini.iso;

Now you can click through the normal Debian-Installer process, using mostly default settings. Optionally, you could play around with the networking settings, to see how those get translated to /etc/netplan/ in the target system.

After you confirmed your partitioning changes, the base system gets installed. I suggest not to select any additional components, like desktop environments, to speed up the process.

During the final step of the installation (finish-install.d/55netcfg-copy-config) d-i will detect that Netplan was installed in the target system (due to the preseed file provided) and opt to write its network configuration to /etc/netplan/ instead of /etc/network/interfaces or /etc/NetworkManager/system-connections/.

Done! After the installation finished you can reboot into your virgin Debian Bookworm system.

To do that, quit the current Qemu process, by pressing Ctrl+C and make sure to copy over the EFIVARS.fd file that was written by grub during the installation, so Qemu can find the new system. Then reboot into the new system, not using the mini.iso image any more:

$ cp ./OVMF_VARS_4M.fd ./EFIVARS.fd
$ qemu-system-x86_64 \
    -M q35 -enable-kvm -cpu host -smp 4 -m 2G \
    -drive if=pflash,format=raw,unit=0,file=OVMF_CODE_4M.fd,readonly=on \
    -drive if=pflash,format=raw,unit=1,file=EFIVARS.fd,readonly=off \
    -device qemu-xhci -device usb-kbd -device usb-mouse \
    -vga none -device virtio-gpu-pci \
    -net nic,model=virtio -net user \
    -drive file=./data.qcow2,if=none,format=qcow2,id=disk0 \
    -device virtio-blk-pci,drive=disk0,bootindex=1 -serial mon:stdio

Finally, you can play around with your Netplan enabled Debian system! As you will find, /etc/network/interfaces exists but is empty; it could still be used (optionally/additionally). Netplan was configured in /etc/netplan/ according to the settings given during the d-i installation process.

In our case we also installed the Netplan CLI, so we can play around with some of its features, like netplan status:

Thank you for following along the Netplan enabled Debian installation process and happy hacking! If you want to learn more join the discussion at Salsa:installer-team/netcfg and find us at GitHub:netplan.

Categories: FLOSS Project Planets

Mixing C++ and Rust for Fun and Profit: Part 3

Planet KDE - Thu, 2024-04-25 03:00

In the two previous posts (Part 1 and Part 2), we looked at how to build bindings between C++ and Rust from scratch. However, while building a binding generator from scratch is fun, it’s not necessarily an efficient way to integrate Rust into your C++ project. Let’s look at some existing technologies for mixing C++ and Rust that you can easily deploy today.

bindgen

bindgen is an official tool of the Rust project that can create bindings around C headers. It can also wrap C++ headers, but there are limitations to its C++ support. For example, while you can wrap classes, they won’t have their constructors or destructors automatically called. You can read more about these limitations on the bindgen C++ support page. Another quirk of bindgen is that it only allows you to call C++ from Rust. If you want to go the other way around, you have to add cbindgen to generate C headers for your Rust code.

CXX

CXX is a more powerful framework for integrating C++ and Rust. It’s used in some well-known projects, such as Chromium. It does an excellent job at integrating C++ and Rust, but it is not an actual binding generator. Instead, all of your bindings have to be manually created. You can read the tutorial to learn more about how CXX works.

autocxx

Since CXX doesn’t generate bindings itself, if you want to use it in your project, you’ll need to find a generator that wraps C++ headers with CXX bindings. autocxx is a Google project that does just that, using bindgen to generate Rust bindings around C++ headers. However, it gets better—autocxx can also create C++ bindings for Rust functions.

CXX-Qt

While CXX is one of the best C++/Rust binding generators available, it fails to address Qt users. Since Qt depends so heavily on the moc to enable features like signals and slots, it’s almost impossible to use it with a general-purpose binding generator. That’s where CXX-Qt comes in. KDAB has created the CXX-Qt crate to allow you to integrate Rust into your C++/Qt application. It works by leveraging CXX to generate most of the bindings but then adds a Qt support layer. This allows you to easily use Rust on the backend of your Qt app, whether you’re using Qt Widgets or QML. CXX-Qt is available on Github and crates.io.

If you’re interested in integrating CXX-Qt into your C++ application, let us know. To learn more about CXX-Qt, you can check out this blog.

Other options

There are some other binding generators out there that aren’t necessarily going to work well for migrating your codebase, but you may want to read about them and keep an eye on them:

In addition, there are continuing efforts to improve C++/Rust interoperability. For example, Google recently announced that they are giving $1 million to the Rust Foundation to improve interoperability.

Conclusion

In the world of programming tools and frameworks, there is never a single solution that will work for everybody. However, CXX, CXX-Qt, and autocxx seem to be the best options for anyone who wants to port their C++ codebase to Rust. Even if you aren’t looking to completely remove C++ from your codebase, these binding generators may be a good option for you to promote memory safety in critical areas of your application.

Have you successfully integrated Rust in your C++ codebase with one of these tools? Have you used a different tool or perhaps a different programming language entirely? Leave a comment and let us know. Memory-safe programming languages like Rust are here to stay, and it’s always good to see programmers change with the times.

About KDAB

If you like this article and want to read similar material, consider subscribing via our RSS feed.

Subscribe to KDAB TV for similar informative short video content.

KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.

The post Mixing C++ and Rust for Fun and Profit: Part 3 appeared first on KDAB.

Categories: FLOSS Project Planets

Django Weblog: Livestream: Django Trends for 2024

Planet Python - Thu, 2024-04-25 02:10

Today at 3pm UTC – Discover the latest trends in the Django ecosystem, based on insights from 4,000 developers who participated in the Django Developers Survey. Join the livestream with Sarah Abdermane, a Django Software Foundation Board member, and Sarah Boyce, a Django Fellow, to reflect on insights from the Django community.

Register to join

Categories: FLOSS Project Planets

Russ Allbery: Review: Nation

Planet Debian - Thu, 2024-04-25 00:18

Review: Nation, by Terry Pratchett

Publisher: Harper
Copyright: 2008
Printing: 2009
ISBN: 0-06-143303-9
Format: Trade paperback
Pages: 369

Nation is a stand-alone young adult fantasy novel. It was published in the gap between Discworld novels Making Money and Unseen Academicals.

Nation starts with a plague. The Russian influenza has ravaged Britain, including the royal family. The next in line to the throne is off on a remote island and must be retrieved and crowned as soon as possible, or an obscure provision in Magna Carta will cause no end of trouble. The Cutty Wren is sent on this mission, carrying the Gentlemen of Last Resort.

Then comes the tsunami.

In the midst of fire raining from the sky and a wave like no one has ever seen, Captain Roberts tied himself to the wheel of the Sweet Judy and steered it as best he could, straight into an island. The sole survivor of the shipwreck: one Ermintrude Fanshaw, daughter of the governor of some British island possessions. Oh, and a parrot.

Mau was on the Boys' Island when the tsunami came, going through his rite of passage into manhood. He was to return to the Nation the next morning and receive his tattoos and his adult soul. He survived in a canoe. No one else in the Nation did.

Terry Pratchett considered Nation to be his best book. It is not his best book, at least in my opinion; it's firmly below the top tier of Discworld novels, let alone Night Watch. It is, however, an interesting and enjoyable book that tackles gods and religion with a sledgehammer rather than a knife.

It's also very, very dark and utterly depressing at the start, despite a few glimmers of Pratchett's humor. Mau is the main protagonist at first, and the book opens with everyone he cares about dying. This is the place where I thought Pratchett diverged the most from his Discworld style: in Discworld, I think most of that would have been off-screen, but here we follow Mau through the realization, the devastation, the disassociation, the burials at sea, the thoughts of suicide, and the complete upheaval of everything he thought he was or was about to become. I found the start of this book difficult to get through. The immediate transition into potentially tragic misunderstandings between Mau and Daphne (as Ermintrude names herself once there is no one to tell her not to) didn't help.

As I got farther into the book, though, I warmed to it. The best parts early on are Daphne's baffled but scientific attempts to understand Mau's culture and her place in it. More survivors arrive, and they start to assemble a community, anchored in large part by Mau's stubborn determination to do what's right even though he's lost all of his moorings. That community eventually re-establishes contact with the rest of the world and the opening plot about the British monarchy, but not before Daphne has been changed profoundly by being part of it.

I think Pratchett worked hard at keeping Mau's culture at the center of the story. It's notable that the community that reforms over the course of the book essentially follows the patterns of Mau's lost Nation and incorporates Daphne into it, rather than (as is so often the case) the other way around. The plot itself is fiercely anti-colonial in a way that mostly worked. Still, though, it's a quasi-Pacific-island culture written by a white British man, and I had some qualms.

Pratchett quite rightfully makes it clear in the afterword that this is an alternate world and Mau's culture is not a real Pacific island culture. However, that also means that its starkly gender-essentialist nature was a free choice, rather than one based on some specific culture, and I found that choice somewhat off-putting. The religious rituals are all gendered, the dwelling places are gendered, and one's entire life course in Mau's world seems based on binary classification as a man or a woman. Based on Pratchett's other books, I assume this was more an unfortunate default than a deliberate choice, but it's still a choice he could have avoided.

The end of this book wrestles directly with the relative worth of Mau's culture versus that of the British. I liked most of this, but the twists that Pratchett adds to avoid the colonialist results we saw in our world stumble partly into the trap of making Mau's culture valuable by British standards. (I'm being a bit vague here to avoid spoilers.) I think it is very hard to base this book on a different set of priorities and still bring the largely UK, US, and western European audience along, so I don't blame Pratchett for failing to do it, but I'm a bit sad that the world still revolved around a British axis.

This felt quite similar to Discworld to me in its overall sensibilities, but with the roles of moral philosophy and humor reversed. Discworld novels usually start with some larger-than-life characters and an absurd plot, and then the moral philosophy sneaks up behind you when you're not looking and hits you over the head. Nation starts with the moral philosophy: Mau wrestles with his gods and the problem of evil in a way that reminded me of Job, except with a far different pantheon and rather less tolerance for divine excuses on the part of the protagonist. It's the humor, instead, that sneaks up on you and makes you laugh when the plot is a bit too much. But the mix arrives at much the same place: the absurd hand-in-hand with the profound, and all seen from an angle that makes it a bit easier to understand.

I'm not sure I would recommend Nation as a good place to start with Pratchett. I felt like I benefited from having read a lot of Discworld to build up my willingness to trust where Pratchett was going. But it has the quality of writing of late Discworld without the (arguable) need to read 25 books to understand all of the backstory. Regardless, recommended, and you'll never hear Twinkle Twinkle Little Star in quite the same way again.

Rating: 8 out of 10

Categories: FLOSS Project Planets

Capellic: How We Broke up Complex Drupal Webforms to Improve the User Experience

Planet Drupal - Thu, 2024-04-25 00:00
Legal forms are often challenging to understand and fill out. Breaking them into manageable chunks was essential for end users and content editors.
Categories: FLOSS Project Planets

Berlin mega-sprint recap

Planet KDE - Wed, 2024-04-24 20:16

For the past 8 days I’ve been in Berlin for what is technically four sprints: first a two-day KDE e.V. Board of Directors sprint, and then right afterwards, the KDE Goals mega-sprint for the Eco, Accessibility, and Automation/Systematization Goals! Both were hosted in the offices of KDE Patron MBition, a great partner to KDE which uses our software both internally and in some Mercedes cars. Thanks a lot, MBition! It’s been quite a week, but a productive one. So I thought I’d share what we did.

If you’re a KDE e.V. member, you’ve already received an email recap about the Board sprint’s discussion topics and decisions. Overall the organization is healthy and in great shape. Something substantive I can share publicly is that we posed for this wicked sick picture:

Moving onto the combined Goals sprint: this was in fact the first sprint for any of the KDE Goals, and having all three represented in one room attracted a unique cross-section of people from the KDE Eco crowd, usability-interested folks, and deeply technical core KDE developers.

Officially I’m the Goal Champion of the Automation & Systematization Goal, and a number of folks attended to work on those topics, ranging from adding and fixing tests to creating a code quality dashboard. Expect more blog posts from other participants regarding what they worked on!

Speaking personally, I changed the bug janitor bot to direct Debian users on old Plasma versions to Debian’s own Bug tracker—as in fact the Debian folks advise their own users to do. I added an autotest to validate the change, but regrettably it caused a regression anyway, which I corrected quickly. Further investigation into the reason why this went uncaught revealed that all the autotests for the version-based bug janitor actions are faulty. I worked on fixing them but unfortunately have not met with success yet. Further efforts are needed.

In the process of doing this work, I also found that the bug janitor operates on a hardcoded list of Plasma components, which has of course drifted out of sync with reality since it was originally authored. This causes the bot to miss many bugs at the moment.

Fellow sprint participant Tracey volunteered to address this, so I helped get her set up with a development environment for the bug janitor bot so she can auto-generate the list from GitLab and sysadmin repo metadata. This is in progress and proceeding nicely.

I also proposed a merge request template for the plasma-workspace git repo, modeled on the one we currently use in Elisa. The idea is to encourage people to write better merge request descriptions, and also nudge people in the direction of adding autotests for their merge requests, or at least mentioning reviewers can test the changes. If this ends up successful, I have high hopes about rolling it out more broadly.

But I was also there for the other goals too! Joseph delivered a wonderful presentation about KDE Eco topics, which introduced my new favorite cartoon, and it got me thinking about efficiency and performance. For a while I’d been experiencing high CPU usage in KDE’s NeoChat app, and with all the NeoChat developers there in the room, I took the opportunity to prod them with my pointy stick. This isn’t the first time I’d mentioned the performance issue to them, but in the past no one could reproduce it and we had to drop the investigation. Well, this time I think everyone else was also thinking eco thoughts, and they really moved heaven and earth to try to reproduce it. Eventually James was able to, and it wasn’t long before the issue was put six feet under. The result: NeoChat’s background CPU usage is now 0% for people using Intel GPUs, down from 30%. A big win for eco and laptop battery life, which I’m already appreciating as I write this blog post in an airport disconnected from AC power.

To verify the CPU usage, just for laughs I added the Catwalk widget to my panel. It’s so adorable that I haven’t had the heart to remove it, and now I notice things using CPU time when they should be idle much more than I did before. More visibility for performance issues should over time add up to more fixes for them!

  • Good cat!
  • Bad cat!

Another interesting thing happens when you get a bunch of talented KDE contributors in a room: people can’t help but initiate discussions about pressing topics. The result was many conversations about release schedules, dependency policy, visual design vision, and product branding. One discussion very relevant to the sprint was the lack of systematicity in how we use units for spacing in the QtQuick-based UIs we build. This resulted in a proposal that’s already generating some spirited feedback.

All in all it was a happy and productive week, and after doing one of these I always feel super privileged to be able to work with as impressively talented and friendly a group of colleagues as this is:

Full disclosure: KDE e.V. paid for my travel and lodging expenses, but not my döner kebab expenses!

Categories: FLOSS Project Planets

KDE e.V. board meeting

Planet KDE - Wed, 2024-04-24 18:00

Last week was one of the regular KDE e.V. board meetings. We (the board of KDE e.V.) have a video call every week, but twice a year we try to get together and actually put in some full days, actually sit around and laugh, actually listen to Dad jokes and eat food and drink beer or Fritz cola together.

It’s a good addition to our usual workflow.

Berlin is just so much fun (for five days, anyway). I can hop on the train, walk along the Spree to a hotel, walk everywhere – Friday I did 15km – get amazing food for cheap – compared to the Netherlands anyway – and hang out with friends. Some of those friends are the board. The board is some of those friends.

The actual board work is not something to write about all that much. There were HR things and finances and projects and plans and .. you’ll see some of the results on the KDE e.V. website. We also played GeoGuesser and something with WikiData.

One of the HR things I should particularly mention: welcome to Nicole, our latest employee who is further involved with our sustainable software efforts.

The regular board calls resume next week, and we will meet again in September at Akademy.

Categories: FLOSS Project Planets

ImageX: Drupal 7 vs. Drupal 10: An Objective Visual Comparison of Some Popular Website Features

Planet Drupal - Wed, 2024-04-24 17:05

Authored by: Nadiia Nykolaichuk.

Drupal 7 was released 13 years ago, during the Jurassic period in the world of modern software. However, as of April 2024, there are 322,700+ websites officially listed on drupal.org as still running on Drupal 7, which sadly makes it the #1 installed Drupal major core version. Luckily, this number is steadily decreasing.

Categories: FLOSS Project Planets

Sumana Harihareswara - Cogito, Ergo Sumana: Model UX Research & Design Docs for Command-Line Open Source

Planet Python - Wed, 2024-04-24 12:42
Model UX Research & Design Docs for Command-Line Open Source
Categories: FLOSS Project Planets

FSF Blogs: What role community plays in free software and more -- Interview with David Wilson

GNU Planet! - Wed, 2024-04-24 12:35
Can't wait for the start of LibrePlanet 2024: Cultivating Community? Same here. To sweeten the wait, we have interviewed David Wilson, one of the keynote speakers and the creator of the System Crafters channel and community.
Categories: FLOSS Project Planets

What role community plays in free software and more -- Interview with David Wilson

FSF Blogs - Wed, 2024-04-24 12:35
Can't wait for the start of LibrePlanet 2024: Cultivating Community? Same here. To sweeten the wait, we have interviewed David Wilson, one of the keynote speakers and the creator of the System Crafters channel and community.
Categories: FLOSS Project Planets

Real Python: What's Lazy Evaluation in Python?

Planet Python - Wed, 2024-04-24 10:00

Being lazy is not always a bad thing. Every line of code you write has at least one expression that Python needs to evaluate. Python lazy evaluation is when Python takes the lazy option and delays working out the value returned by an expression until that value is needed.

An expression in Python is a unit of code that evaluates to a value. Examples of expressions include object names, function calls, expressions with arithmetic operators, literals that create built-in object types such as lists, and more. However, not all statements are expressions. For example, if statements and for loop statements don’t return a value.

Python needs to evaluate every expression it encounters to use its value. In this tutorial, you’ll learn about the different ways Python evaluates these expressions. You’ll understand why some expressions are evaluated immediately, while others are evaluated later in the program’s execution. So, what’s lazy evaluation in Python?

Get Your Code: Click here to download the free sample code that shows you how to use lazy evaluation in Python.

Take the Quiz: Test your knowledge with our interactive “What's Lazy Evaluation in Python?” quiz. You’ll receive a score upon completion to help you track your learning progress.

In Short: Python Lazy Evaluation Generates Objects Only When Needed

An expression evaluates to a value. However, you can separate the type of evaluation of expressions into two types:

  1. Eager evaluation
  2. Lazy evaluation

Eager evaluation refers to those cases when Python evaluates an expression as soon as it encounters it. Here are some examples of expressions that are evaluated eagerly:

Python
 1 >>> 5 + 10
 2 15
 3
 4 >>> import random
 5 >>> random.randint(1, 10)
 6 4
 7
 8 >>> [2, 4, 6, 8, 10]
 9 [2, 4, 6, 8, 10]
10 >>> numbers = [2, 4, 6, 8, 10]
11 >>> numbers
12 [2, 4, 6, 8, 10]

Interactive environments, such as the standard Python REPL used in this example, display the value of an expression when the line only contains the expression. This code section shows a few examples of statements and expressions:

  • Lines 1 and 2: The first example includes the addition operator +, which Python evaluates as soon as it encounters it. The REPL shows the value 15.
  • Lines 4 to 6: The second example includes two lines:
    • The import statement includes the keyword import followed by the name of a module. The module name random is evaluated eagerly.
    • The function call random.randint() is evaluated eagerly, and its value is returned immediately. All standard functions are evaluated eagerly. You’ll learn about generator functions later, which behave differently.
  • Lines 8 to 12: The final example has three lines of code:
    • The literal to create a list is an expression that’s evaluated eagerly. This expression contains several integer literals, which are themselves expressions evaluated immediately.
    • The assignment statement assigns the object created by the list literal to the name numbers. This statement is not an expression and doesn’t return a value. However, it includes the list literal on the right-hand side, which is an expression that’s evaluated eagerly.
    • The final line contains the name numbers, which is eagerly evaluated to return the list object.

The list you create in the final example is created in full when you define it. Python needs to allocate memory for the list and all its elements. This memory won’t be freed as long as this list exists in your program. The memory allocation in this example is small and won’t impact the program. However, larger objects require more memory, which can cause performance issues.

Lazy evaluation refers to cases when Python doesn’t work out the values of an expression immediately. Instead, the values are returned at the point when they’re required in the program. Lazy evaluation can also be referred to as call-by-need.

This delay of when the program evaluates an expression delays the use of resources to create the value, which can improve the performance of a program by spreading the time-consuming process across a longer time period. It also prevents values that will not be used in the program from being generated. This can occur when the program terminates or moves to another part of its execution before all the generated values are used.

When large datasets are created using lazily-evaluated expressions, the program doesn’t need to use memory to store the data structure’s contents. The values are only generated when they’re needed.

An example of lazy evaluation occurs within the for loop when you iterate using range():

Python
for index in range(1, 1_000_001):
    print(f"This is iteration {index}")

The built-in range() is the constructor for Python’s range object. The range object does not store all of the one million integers it represents. Instead, the for loop creates a range_iterator from the range object, which generates the next number in the sequence when it’s needed. Therefore, the program never needs to have all the values stored in memory at the same time.
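
To see this laziness in action, here’s a quick illustrative snippet comparing the memory footprint of the range object with an equivalent list, and then stepping through the iterator by hand:

import sys

numbers = range(1, 1_000_001)

# The range object stays tiny no matter how many integers it represents,
# while an equivalent list must store a reference to every element.
print(sys.getsizeof(numbers))        # a few dozen bytes
print(sys.getsizeof(list(numbers)))  # several megabytes

# A for loop does roughly this behind the scenes: it asks an iterator
# for one value at a time, so only one integer is needed at any moment.
iterator = iter(numbers)
print(next(iterator))  # 1
print(next(iterator))  # 2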

Lazy evaluation also allows you to create infinite data structures, such as a live stream of audio or video data that continuously updates with new information, since the program doesn’t need to store all the values in memory at the same time. Infinite data structures are not possible with eager evaluation since they can’t be stored in memory.
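
As a small sketch of that idea, a generator function can describe an endless sequence while the consumer decides how many values to actually produce:

from itertools import islice

def powers_of_two():
    """Yield 1, 2, 4, 8, ... forever; each value is computed on demand."""
    value = 1
    while True:
        yield value
        value *= 2

# Only the requested values are ever generated or stored.
first_ten = list(islice(powers_of_two(), 10))
print(first_ten)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]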

There are disadvantages to deferred evaluation. Any errors raised by an expression are also deferred to a later point in the program. This delay can make debugging harder.

The lazy evaluation of the integers represented by range() in a for loop is one example of lazy evaluation. You’ll learn about more examples in the following section of this tutorial.

What Are Examples of Lazy Evaluation in Python?

In the previous section, you learned about using range() in a for loop, which leads to lazy evaluation of the integers represented by the range object. There are other expressions in Python that lead to lazy evaluation. In this section, you’ll explore the main ones.

Other Built-In Data Types

Read the full article at https://realpython.com/python-lazy-evaluation/ »

Categories: FLOSS Project Planets

Tag1 Consulting: Source Site Audit - A High Level Overview

Planet Drupal - Wed, 2024-04-24 09:00

One of the most impactful things to do in preparation for any migration project is to understand the source site. The more insight you gather into how the current site was built, the better equipped you will be to perform the migration.

Categories: FLOSS Project Planets

Programiz: Getting Started with Python

Planet Python - Wed, 2024-04-24 06:44
In this tutorial, you will learn to write your first Python program.
Categories: FLOSS Project Planets

Russell Coker: Source Code With Emoji

Planet Debian - Wed, 2024-04-24 04:03

The XKCD comic Code Quality [1] inspired me to test out emoji in source. I really should have done this years ago when that XKCD was first published.

The following code compiles in gcc and runs in the way that anyone who wants to write such code would want it to run. The hover text in the XKCD comic is correct. You could have a style guide for such programming, store error messages in the doctor and nurse emoji for example.

#include <stdio.h>

int main()
{
  int 😇 = 1, 😈 = 2;
  printf("😇=%d, 😈=%d\n", 😇, 😈);
  return 0;
}

To get this to display correctly in Debian you need to install the fonts-noto-color-emoji package (used by the KDE emoji picker that runs when you press Windows-. among other things) and restart programs that use emoji. The Konsole terminal emulator will probably need its profile settings changed to work with this if you ran Konsole before installing fonts-noto-color-emoji. The Kitty terminal emulator works if you restart it after installing fonts-noto-color-emoji.

This web page gives a list of HTML codes for emoji [2]. If I start writing real code with emoji variable names then I’ll have to update my source to HTML conversion script (which handles <>" and repeated spaces) to convert emoji.
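
As a rough, hypothetical sketch (the real conversion script isn't shown here, and it may not be written in Python at all), the extra emoji-handling step could map every non-ASCII character to an HTML numeric entity:

import html

def source_to_html(text: str) -> str:
    # Escape <, >, & and quotes first, as the existing script already handles.
    escaped = html.escape(text, quote=True)
    # Replace each remaining non-ASCII character (emoji included)
    # with a numeric HTML entity.
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in escaped
    )

print(source_to_html('int 😇 = 1;'))  # int &#x1F607; = 1;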

I spent a couple of hours on this and I think it’s worth it. I have filed several Debian bug reports about improvements needed to issues related to emoji.

Categories: FLOSS Project Planets

Talking Drupal: Skills Upgrade #8

Planet Drupal - Wed, 2024-04-24 04:00

Welcome back to “Skills Upgrade” a Talking Drupal mini-series following the journey of a D7 developer learning D10. This is episode 8.

Topics

Resources

  • Chad's Drupal 10 Learning Curriculum & Journal
  • Chad's Drupal 10 Learning Notes

The Linux Foundation is offering a 30% discount on e-learning courses, certifications, and bundles with the code DRUPAL24 (all uppercase), valid until June 5th: https://training.linuxfoundation.org/certification-catalog/

Hosts

AmyJune Hineline - @volkswagenchick

Guests

Chad Hester - chadkhester.com @chadkhest
Mike Anello - DrupalEasy.com @ultimike

Categories: FLOSS Project Planets

Talk Python to Me: #458: Serverless Python in 2024

Planet Python - Wed, 2024-04-24 04:00
What is the state of serverless computing and Python in 2024? What are some of the new tools and best practices? We are lucky to have Tony Sherman, who has a lot of practical experience with serverless programming, on the show.

Episode sponsors

  • Sentry Error Monitoring, Code TALKPYTHON: https://talkpython.fm/sentry
  • Mailtrap: https://talkpython.fm/mailtrap
  • Talk Python Courses: https://talkpython.fm/training

Links from the show

  • Tony Sherman on Twitter: https://twitter.com/tsh3rman
  • Tony Sherman: https://www.linkedin.com/in/tony--sherman/
  • PyCon serverless talk: https://www.youtube.com/watch/2SZ6Wks5iK4
  • AWS re:Invent talk: https://www.youtube.com/watch?v=52W3Qyg242Y
  • Powertools for AWS Lambda: https://docs.powertools.aws.dev/lambda/python/latest/
  • Pantsbuild: The ergonomic build system: https://www.pantsbuild.org/
  • aws-lambda-power-tuning: https://github.com/alexcasalboni/aws-lambda-power-tuning
  • import-profiler: https://github.com/cournape/import-profiler
  • AWS Fargate: https://aws.amazon.com/fargate/
  • Run functions on demand. Scale automatically.: https://www.digitalocean.com/products/functions
  • Vercel: https://vercel.com/docs/functions/serverless-functions/runtimes/python
  • Deft: https://deft.com
  • 37 Signals: We stand to save $7m over five years from our cloud exit: https://world.hey.com/dhh/we-stand-to-save-7m-over-five-years-from-our-cloud-exit-53996caa
  • The Global Content Delivery Platform That Truly Hops: https://bunny.net
  • Watch this episode on YouTube: https://www.youtube.com/watch?v=A5bbUq-ZJh0

Stay in touch with us

  • Subscribe to us on YouTube: https://talkpython.fm/youtube
  • Follow Talk Python on Mastodon: https://fosstodon.org/web/@talkpython
  • Follow Michael on Mastodon: https://fosstodon.org/web/@mkennedy
Categories: FLOSS Project Planets

Russell Coker: Ubuntu 24.04 and Bubblewrap

Planet Debian - Wed, 2024-04-24 03:46

When using Bubblewrap (the bwrap command) to create a container in Ubuntu 24.04 you can expect to get one of the following error messages:

bwrap: loopback: Failed RTM_NEWADDR: Operation not permitted
bwrap: setting up uid map: Permission denied

This is due to Ubuntu developers deciding to use Apparmor to restrict the creation of user namespaces. Here is a Ubuntu blog post about it [1].

To resolve that you could upgrade to SE Linux, but the other option is to create a file named /etc/apparmor.d/bwrap with the following contents:

abi <abi/4.0>,

include <tunables/global>

profile bwrap /usr/bin/bwrap flags=(unconfined) {
  userns,

  # Site-specific additions and overrides. See local/README for details.
  include if exists <local/bwrap>
}

Then run “systemctl reload apparmor“.

Categories: FLOSS Project Planets
