Feeds

Andy Wingo: are ephemerons primitive?

GNU Planet! - Mon, 2022-11-28 16:11

Good evening :) A quick note, tonight: I've long thought that ephemerons are primitive and can't be implemented with mark functions and/or finalizers, but today I think I have a counterexample.

For context, one of the goals of the GC implementation I have been working on is to replace Guile's current use of the Boehm-Demers-Weiser (BDW) conservative collector. Of course, changing a garbage collector for a production language runtime is risky, and for Guile one of the mitigation strategies for this work is that the new collector is behind an abstract API whose implementation can be chosen at compile-time, without requiring changes to user code. That way we can first switch to BDW-implementing-the-new-GC-API, then switch the implementation behind that API to something else.

Abstracting GC is a tricky problem to get right, and I thank the MMTk project for showing that this is possible -- you have user-facing APIs that need to be implemented by concrete collectors, but also extension points so that the user can provide some compile-time configuration too, for example to provide field-tracing visitors that take into account how a user wants to lay out objects.

Anyway. As we discussed last time, ephemerons usually have explicit support from the GC, so we need an ephemeron abstraction as part of the abstract GC API. The question is, can BDW-GC provide an implementation of this API?

I think the answer is "yes, but it's very gnarly and will kill performance so bad that you won't want to do it."

the contenders

Consider that the primitives that you get with BDW-GC are custom mark functions, run on objects when they are found to be live by the mark workers; disappearing links, a kind of weak reference; and finalizers, which receive the object being finalized, can allocate, and indeed can resurrect the object.

BDW-GC's finalizers are a powerful primitive, but not one that is useful for implementing the "conjunction" aspect of ephemerons, as they cannot constrain the marker's idea of graph connectivity: a finalizer can only prolong the life of an object subgraph, not cut it short. So let's put finalizers aside.

Weak references have a tantalizingly close kind of conjunction property: if the weak reference itself is alive, and the referent is also otherwise reachable, then the weak reference can be dereferenced. However this primitive only involves the two objects E and K; there's no way to then condition traceability of a third object V to E and K.

We are left with mark functions. These are an extraordinarily powerful interface in BDW-GC, but somewhat expensive also: not inlined, and going against the grain of what BDW-GC is really about (heaps in which the majority of all references are conservative). But, OK. The way they work is, your program allocates a number of GC "kinds", and associates mark functions with those kinds. Then when you allocate objects, you use those kinds. BDW-GC will call your mark functions when tracing an object of those kinds.

Let's assume firstly that you have a kind for ephemerons; then when you go to mark an ephemeron E, you mark the value V only if the key K has been marked. Problem solved, right? Only halfway: you also have to handle the case in which E is marked first, then K. So you publish E to a global hash table, and... well. You would mark V when you mark a K for which there is a published E. But, for that you need a hook into marking V, and V can be any object...

So now we assume additionally that all objects are allocated with user-provided custom mark functions, and that all mark functions check if the marked object is a key in the published table of pending ephemerons, and if so mark the corresponding values. This is essentially what a proper ephemeron implementation would do, though there are some optimizations one can do to avoid checking the table for each object before the mark stack runs empty for the first time. In this case, yes you can do it! Additionally if you register disappearing links for the K field in each E, you can know if an ephemeron E was marked dead in a previous collection. Add a pre-mark hook (something BDW-GC provides) to clear the pending ephemeron table, and you are in business.
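To make the scheme concrete, here is a toy model of that marking loop in Python. It is only a sketch of the idea, not BDW-GC's API: the names `Obj`, `Ephemeron`, and `collect` are made up for illustration, and a real collector would do this inside its mark functions rather than in one loop.

```python
# Toy model of the marking scheme described above. Hypothetical names
# throughout; this does not use or mirror BDW-GC's actual interface.

class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)  # ordinary strong references


class Ephemeron:
    def __init__(self, key, value):
        self.key = key
        self.value = value


def collect(roots):
    """Return the set of objects found live when tracing from roots."""
    marked = set()
    pending = {}  # key -> ephemerons that were marked before their key
    stack = list(roots)

    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)

        # The check every "mark function" performs: is this object the key
        # of a published, pending ephemeron? If so, its values become live.
        for eph in pending.pop(obj, ()):
            stack.append(eph.value)

        if isinstance(obj, Ephemeron):
            if obj.key in marked:
                stack.append(obj.value)  # key already live: trace the value
            else:
                pending.setdefault(obj.key, []).append(obj)  # publish E
        else:
            stack.extend(obj.refs)

    return marked


k = Obj("K")
v = Obj("V", refs=[k])  # the value refers back to the key
e = Ephemeron(k, v)

only_e = collect([e])         # K is reachable only through V
e_then_k = collect([k, e])    # the stack pops E first, then K
k_then_e = collect([e, k])    # the stack pops K first, then E
```

With only E as a root, neither K nor V is traced, even though V refers to K; once K is reachable by some other path, V is traced regardless of whether E or K was visited first.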

yes, but no

So, it is possible to implement ephemerons with just custom mark functions. I wouldn't want to do it, though: missing the mostly-avoid-pending-ephemeron-check optimization would be devastating, and really what you want is support in the GC implementation. I think that for the BDW-GC implementation in whippet I'll just implement weak-key associations, in which the value is always marked strongly unless the key was dead on a previous collection, using disappearing links on the key field. That way a (possibly indirect) reference from a value V to a key K can indeed keep K alive, but oh well: it's a conservative approximation of what should happen, and not worse than what Guile has currently.
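As an analogy for that approximation (not Guile or whippet code), Python's `weakref.WeakKeyDictionary` is also a weak-key association whose values are held strongly, so it shows the same leak when a value refers back to its key. The helper `surviving_entries` and the `Key` and `Value` classes are names invented for this sketch.

```python
import gc
import weakref


class Key:
    """Plain object that can be weakly referenced."""


class Value:
    def __init__(self, ref=None):
        self.ref = ref  # optional strong reference, possibly back to the key


def surviving_entries(value_refers_to_key):
    """Count table entries left after the outside reference to the key is dropped."""
    table = weakref.WeakKeyDictionary()
    key = Key()
    table[key] = Value(ref=key if value_refers_to_key else None)
    del key       # drop the only outside reference to the key
    gc.collect()  # give the collector a chance to run
    return len(table)


without_backref = surviving_entries(False)
with_backref = surviving_entries(True)
```

When the value does not refer to the key, the entry disappears; when it does, the strongly held value keeps the key alive and the entry stays. A true ephemeron would drop the entry in both cases.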

Good night and happy hacking!

Categories: FLOSS Project Planets

Sam Hartman: Introducing Carthage

Planet Debian - Mon, 2022-11-28 15:59

For the past four years, I’ve been working on Carthage, a free-software Infrastructure as Code framework. We’ve finally reached a point where it makes sense to talk about Carthage and what it can do. This is the first in a series of blog posts to introduce Carthage, discuss what it can do and show how it works.

Why Another IAC Framework

It seems everywhere you look, there are products designed to support the IAC pattern. On the simple side, you could check a Containerfile into Git. Products like Terraform and Vagrant allow you to template cloud infrastructure and VMs. There are more commercial offerings than I can keep up with.

We were disappointed by what was out there when we started Carthage. Other products have improved, but for many of our applications we’re happy with what Carthage can build. The biggest challenge we ran into is that products wanted us to specify things at the wrong level. For some of our cyber training work we wanted to say things like “We want 3 blue teams, each with a couple defended networks, a red team, and some neutral infrastructure for red to exploit.” Yet the tools we were trying to use wanted to lay things out at the individual machine/container level. We found ourselves contemplating writing a program to generate input for some other IAC tool.

Things were worse for our internal testing. Sometimes we’d be shipping hardware to a customer. But sometimes we’d be virtualizing that build out in a lab. Sometimes we’d be doing a mixture. So we wanted to completely separate the descriptions of machines, networks, and software from any of the information about whether that was realized on hardware, VMs, containers, or a mixture.

Dimensional Breakdown

In discussing Carthage with Enrico Zini, he pointed me at Cognitive Dimensions of notation as a way to think about how Carthage approaches the IAC problem. I’m more interested in the idea of breaking down a design along dimensions that allow examining the design space than I am in particular adherence to Green’s original dimensions.

Low Viscosity, High Abstraction Reuse

One of the guiding principles is that we want to be able to reuse different components at different scales and in different environments. These include being able to do things like:

  • Define an operation like “Update a Debian system” and apply that in several environments including as part of building a base VM or container image, applying to an independently managed machine, or applying to a micro service container that does not run services like ssh or systemd.

  • Define a role like DNS server that can be applied to a dedicated machine only having that role, to a traditional server with multiple roles, or in a micro service environment.

  • Allow people to write groups of functionality that can be useful in descriptions of a small number of machines, but can also be reused in large environments like modeling of cyber infrastructure to defend. In the small environments, things are simplified, but in larger environments integration like directories, authentication infrastructure and the like is needed.

  • Allow grouping of functionality at multiple levels. So far I have talked about grouping of software to be installed on a single machine or container. We also want to allow groups of containers (pods or otherwise), groups of machines, groups of networks, or even enclaves (think a model of an entire company or section of a company). Each kind of grouping needs to be parametric and reusable.

Hidden Dependencies

To accomplish these abstraction goals, dependencies need to be non-local. For example, a software role might need to integrate with a directory if a directory is present in the environment. When writing the role, no one is going to know which directory to use, nor whether a directory is present. Taking that as an explicit input into the role is error-prone when the role is combined into large abstract units (bigger roles or collections of machines). Instead it is better to have a non-local dependency, and to find the directory if it is available. We accomplish this using dependency injection.
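A rough illustration of the non-local lookup idea, in Python. This is a generic dependency injection sketch and does not show Carthage's actual API; `Injector`, `provide`, `find`, and `configure_mail_role` are hypothetical names chosen for the example.

```python
# Generic dependency injection sketch. None of these names come from Carthage.

class Injector:
    """Looks up a dependency locally, then in each enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.providers = {}

    def provide(self, key, value):
        self.providers[key] = value

    def find(self, key, default=None):
        scope = self
        while scope is not None:
            if key in scope.providers:
                return scope.providers[key]
            scope = scope.parent
        return default


def configure_mail_role(injector):
    """A role that integrates with a directory only if one is available."""
    directory = injector.find("directory")
    if directory is None:
        return "mail: local accounts"
    return f"mail: accounts from {directory}"


# An enclave provides a directory; a machine inside it inherits the lookup.
enclave = Injector()
enclave.provide("directory", "ldap.example.test")
machine_in_enclave = Injector(parent=enclave)
standalone_machine = Injector()

with_directory = configure_mail_role(machine_in_enclave)
without_directory = configure_mail_role(standalone_machine)
```

The role never takes the directory as an explicit argument, so it can be combined into larger units without anyone threading that parameter through each layer.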

In addition to being non-local, dependencies are sometimes hidden. It is very easy to overwhelm our cognitive capacity with even a fairly simple IAC description. An effective notation allows us to focus on the parts that matter when working with a particular part of the description. I’ve found hiding dependencies, especially indirect dependencies, to be essential in building complex descriptions.

Obviously, tools are required for examining these dependencies as part of debugging.

First Class Modeling

Clearly one of the goals of IAC descriptions is to actually build and manage infrastructure. It turns out that there are all sorts of things you want to do with the description well before you instantiate the infrastructure. You might want to query the description to build network diagrams, understand interdependencies, or even build inventory/bill of materials. We often find ourselves building Ansible inventory, switch configurations, DNS zones, and all sorts of configuration artifacts. These artifacts may be installed into infrastructure that is instantiated by the description, but they may be consumed in other ways. Allowing the artifacts to be consumed externally means that you can avoid pre-commitment and focus on whatever part of the description you originally want to work on. You may use an existing network at first. Later the IAC description may replace that, or perhaps it never will.

As a result, Carthage separates modeling from instantiation. The model can generally be built and queried without needing to interact with clouds, VMs, or containers. We’ve actually found it useful to build Carthage layouts that cannot ever be fully instantiated, for example because they never specify details like whether a model should be instantiated on a container or VM, or what kind of technology will realize a modeled network. This allows developing roles before the machines that use them or focusing on how machines will interact and how the network will be laid out before the details of installing on specific hardware.

The modeling separation is by far the difference I value most between Carthage and other systems.

A Tool for Experts.

In Neal Stephenson’s essay “In the Beginning… Was the Command Line”, Stephenson points out that the kind of tools experts need are not the same tools that beginners need. The illustration of why a beginner might not be satisfied with a Hole Hog drill caught my attention. Carthage is a tool for experts. Despite what cloud providers will tell you, IAC is not easy. Doubly so when you start making reusable components. Trying to hide that or focus on making things easy to get started can make it harder for experts to efficiently solve the problems they are facing. When we have faced trade offs between making Carthage easy to pick up and making it powerful for expert users, we have chosen to support the experts.

That said, Carthage today is harder to pick up than it needs to be. It’s a relatively new project with few external users as of this time. Our documentation and examples need improvement, just like every project at this level of maturity. Similarly, as the set of things people try to do expands, we will doubtless run into bugs that our current test cases don’t cover. So Carthage absolutely will get easier to learn and use than it is today.

Also, we’ve already had success building beginner-focused applications on top of Carthage. For our cyber training, we built web applications on top of Carthage that made rebuilding and exploring infrastructure easy. We’ve had success using relatively well-understood tools like Ansible as integration and customization points for Carthage layouts. But in all these cases, when the core layout had significant reusable components and significant complexity in the networking, only an IAC expert was going to be able to maintain and develop that layout.

What Carthage can do.

Carthage has a number of capabilities today. One of Carthage’s strengths is its extensible design. Abstract interfaces make it easy to add new virtualization platforms, cloud services, and support for various ways of managing real hardware. This approach has been validated by incrementally adding support for virtualization architectures and cloud services. As development has progressed, adding new integrations continues to get faster because we are able to reuse existing infrastructure.

Today, Carthage can model:

  • Machines
  • Networks
  • Dynamically compose groupings of the above
  • Generate model level artifacts
    • Ansible inventory
    • Various DNS integrations
    • Various switch configurations

Carthage has excellent facilities for dealing with images on which VMs and Containers can be based, although it does have a bit of a Debian/Ubuntu bias in how it thinks about images:

  • Building base images from a tool like debootstrap
  • Customizing these images
  • Converting into VM images for kvm, VMware, and AWS
  • Building from scratch OCI images for Podman, Docker and k8s
  • Adding layers to existing OCI images

When instantiating infrastructure, Carthage can work with:

  • systemd nspawn containers
  • Podman (Docker would be easy)
  • Libvirt
  • VMware
  • With the AWS plugin, EC2 VMs and networking

We have also looked at Oracle Cloud and I believe Openstack, although that code is not merged.

Future posts will talk about core Carthage concepts and how to use Carthage to build infrastructure.



Categories: FLOSS Project Planets

Łukasz Langa: I built an AM5 PC for Python-related things

Planet Python - Mon, 2022-11-28 15:21

16 cores, 128 GB of RAM, an RTX 3090. Sounds great but… don’t be an early adopter. While I’m pretty stoked about what this new machine lets me accomplish, it was quite painful to get it running.

Categories: FLOSS Project Planets

Talking Drupal: Talking Drupal #375 - Being A Creative Director

Planet Drupal - Mon, 2022-11-28 14:00

Today we are talking about Being a Creative Director with Randy Oest.

For show notes visit: www.talkingDrupal.com/375

Topics
  • What is a Creative Director?
  • How is being a CD at a technical company different?
  • Do Drupal CDs need additional skills?
  • Sometimes things get lost in translation between design and development. How do you bridge that gap?
  • How do you mentor?
  • How do you interview for creative positions?
  • Do you hire developers too?
  • Optimal makeup for a team.
  • Guiding the Four Kitchens team
  • Inspiration
Resources

Hosts

Nic Laflin - www.nLighteneddevelopment.com @nicxvan
John Picozzi - www.epam.com @johnpicozzi
Randy Oest - randyoest.com @amazingrando

MOTW Correspondent

Martin Anderson-Clutz - @mandclu

ECA

ECA is a powerful, versatile, and user-friendly rules engine for Drupal 9+. The core module is a processor that validates and executes event-condition-action plugins. Integrated with graphical user interfaces like BPMN.iO, Camunda, ECA Core Modeller or other possible future modellers, ECA is a robust system for building conditionally triggered action sets.

Categories: FLOSS Project Planets

Steve Kemp: I put an LSP in your LISP ..

Planet Debian - Mon, 2022-11-28 11:45

I recently wrote about yet another lisp I'd been having fun with.

Over the past couple of years I've played with a few toy scripting languages, or random interpreters, and this time I figured I'd do something beyond the minimum, by implementing the Language Server Protocol.

In brief the language server protocol (LSP) is designed to abstract functionality that might be provided by an editor, or IDE, into a small "language server". If the language-server knows how to jump to definitions, provide completion, etc, etc, then the editor doesn't need to implement those things for NN different languages - it just needs to launch and communicate with something that does know how to do the job.
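For a feel of what "communicate" means here, this is a small Python sketch of the wire format: LSP messages are JSON-RPC payloads preceded by a `Content-Length` header. It is not taken from my server (which is written in Go); the helper names and the file URI are made up, and the decoder assumes a single header and one complete message.

```python
import json


def encode_message(payload):
    """Frame a JSON-RPC payload with the Content-Length header that LSP uses."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data):
    """Decode one complete message; assumes Content-Length is the only header."""
    header, _, body = data.partition(b"\r\n\r\n")
    length = int(header.split(b":", 1)[1].strip())
    return json.loads(body[:length].decode("utf-8"))


# What an editor might send when the cursor hovers over a symbol.
hover_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/hover",
    "params": {
        "textDocument": {"uri": "file:///tmp/example.lisp"},  # hypothetical file
        "position": {"line": 0, "character": 1},
    },
}

wire = encode_message(hover_request)
round_tripped = decode_message(wire)
```

The server replies with a message framed the same way, carrying the same `id` and the hover text in its result.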

Anyway LSP? LISP? Only one letter different, so that's practically enough reason to have a stab at it.

Thankfully I found a beautiful library that implements a simple framework allowing the easy implementation of a golang-based LSP-server.

Using that I quickly hacked up a server that can provide:

  • Overview of all standard-library functions, on hover.
  • Completion of all standard-library functions.

I've tested this in both GNU Emacs and Neovim, so that means I'm happy I support all editors! (More seriously if it works in two then that probably means that the LSP stuff should work elsewhere too.)

Here's what the "help on hover" looks like, within Emacs:

Vim looks similar but you have to press K to see the wee popup. Still kinda cute, and was a good experiment.

Categories: FLOSS Project Planets

John Goerzen: Flying Joy

Planet Debian - Mon, 2022-11-28 11:04

Wisdom from my 5-year-old: When flying in a small plane, it is important to give your dolls a headset and let them see out the window, too!

Moments like this make me smile at being a pilot dad.

A week ago, I also got to give 8 children and one adult their first ever ride in any kind of airplane, through EAA’s Young Eagles program. I got to hear several say, “Oh wow! It’s SO beautiful!” “Look at all the little houses!”

And my favorite: “How can I be a pilot?”

Categories: FLOSS Project Planets

Łukasz Langa: Weekly Report August 1 - 7

Planet Python - Mon, 2022-11-28 10:07

This week I spent helping 3.11.0rc1 get released. This included both reviewing PRs and helping release blockers get resolved. There were two in particular that I spent most time on so I’ll talk briefly about them now.

Categories: FLOSS Project Planets

Real Python: How to Get a List of All Files in a Directory With Python

Planet Python - Mon, 2022-11-28 09:00

Getting a list of all the files and folders in a directory is a natural first step for many file-related operations in Python. When looking into it, though, you may be surprised to find various ways to go about it.

When you’re faced with many ways of doing something, it can be a good indication that there’s no one-size-fits-all solution to your problems. Most likely, every solution will have its own advantages and trade-offs. This is the case when it comes to getting a list of the contents of a directory in Python.

In this tutorial, you’ll be focusing on the most general-purpose techniques in the pathlib module to list items in a directory, but you’ll also learn a bit about some alternative tools.

Source Code: Click here to download the free source code, directories, and bonus materials that showcase different ways to list files and folders in a directory with Python.

Before pathlib came out in Python 3.4, if you wanted to work with file paths, then you’d use the os module. While this was very efficient in terms of performance, you had to handle all the paths as strings.

Handling paths as strings may seem okay at first, but once you start bringing multiple operating systems into the mix, things get more tricky. You also end up with a bunch of code related to string manipulation, which can get very abstracted from what a file path is. Things can get cryptic pretty quickly.

Note: Check out the downloadable materials for some tests that you can run on your machine. The tests will compare the time it takes to return a list of all the items in a directory using methods from the pathlib module, the os module, and even the future Python 3.12 version of pathlib. That new version includes the well-known walk() function, which you won’t cover in this tutorial.

That’s not to say that working with paths as strings isn’t feasible—after all, developers managed fine without pathlib for many years! The pathlib module just takes care of a lot of the tricky stuff and lets you focus on the main logic of your code.

It all begins with creating a Path object, which will be different depending on your operating system (OS). On Windows, you’ll get a WindowsPath object, while Linux and macOS will return PosixPath:

    >>> import pathlib
    >>> desktop = pathlib.Path("C:/Users/RealPython/Desktop")
    >>> desktop
    WindowsPath('C:/Users/RealPython/Desktop')

    >>> import pathlib
    >>> desktop = pathlib.Path("/home/RealPython/Desktop")
    >>> desktop
    PosixPath('/home/RealPython/Desktop')
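If you want to compare both flavors on a single machine, a minimal sketch is to use pathlib's pure path classes, which handle path syntax without touching the file system. These classes aren't covered in the excerpt above, so treat this as a side note rather than part of the tutorial's main flow:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only parse and manipulate path strings, so both flavors
# can be created on any operating system.
windows_desktop = PureWindowsPath("C:/Users/RealPython/Desktop")
posix_desktop = PurePosixPath("/home/RealPython/Desktop")

print(repr(windows_desktop))
print(windows_desktop.parts)
print(posix_desktop.parts)

# The final component is the same, even though the separators differ.
same_name = windows_desktop.name == posix_desktop.name
print(same_name)
```

Because pure paths never access the disk, they can't list directory contents. For that, you still need a concrete Path object.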

With these OS-aware objects, you can take advantage of the many methods and properties available, such as ones to get a list of files and folders.

Note: If you’re interested in learning more about pathlib and its features, then check out Python 3’s pathlib Module: Taming the File System and the pathlib documentation.

Now, it’s time to dive into listing folder contents. Be aware that there are several ways to do this, and picking the right one will depend on your specific use case.

Getting a List of All Files and Folders in a Directory in Python

Before getting started on listing, you’ll want a set of files that matches what you’ll encounter in this tutorial. In the supplementary materials, you’ll find a folder called Desktop. If you plan to follow along, download this folder, navigate to its parent folder, and start your Python REPL there.

You could also use your own desktop. Just start the Python REPL in the parent directory of your desktop, and the examples should work, but you’ll have your own files in the output instead.

Note: You’ll mainly see WindowsPath objects as outputs in this tutorial. If you’re following along on Linux or macOS, then you’ll see PosixPath instead. That’ll be the only difference. The code you write is the same on all platforms.

If you only need to list the contents of a given directory, and you don’t need to get the contents of each subdirectory too, then you can use the Path object’s .iterdir() method. If your aim is to move through directories and subdirectories recursively, then you can jump ahead to the section on recursive listing.

The .iterdir() method, when called on a Path object, returns a generator that yields Path objects representing child items. If you wrap the generator in a list() constructor, then you can see your list of files and folders:

    >>> import pathlib
    >>> desktop = pathlib.Path("Desktop")

    >>> # .iterdir() produces a generator
    >>> desktop.iterdir()
    <generator object Path.iterdir at 0x000001A8A5110740>

    >>> # Which you can wrap in a list() constructor to materialize
    >>> list(desktop.iterdir())
    [WindowsPath('Desktop/Notes'), WindowsPath('Desktop/realpython'), WindowsPath('Desktop/scripts'), WindowsPath('Desktop/todo.txt')]

Passing the generator produced by .iterdir() to the list() constructor provides you with a list of Path objects representing all the items in the Desktop directory.
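If you don’t have the sample Desktop folder at hand, here’s a self-contained sketch that builds a small throwaway directory and lists it. The folder and file names are placeholders, and list_children() is a helper written for this example, not part of pathlib:

```python
import pathlib
import tempfile


def list_children(directory):
    """Return the names of the direct children of directory, sorted."""
    return sorted(item.name for item in pathlib.Path(directory).iterdir())


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "Notes").mkdir()
    (root / "Notes" / "hash-tables.md").touch()  # nested, so not listed
    (root / "todo.txt").touch()
    children = list_children(root)

print(children)
```

The nested file doesn’t show up because .iterdir() only yields the direct children of the directory. The names are sorted explicitly because .iterdir() yields items in arbitrary order.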

As with all generators, you can also use a for loop to iterate over each item that the generator yields. This gives you the chance to explore some of the properties of each object:

    >>> desktop = pathlib.Path("Desktop")
    >>> for item in desktop.iterdir():
    ...     print(f"{item} - {'dir' if item.is_dir() else 'file'}")
    ...
    Desktop\Notes - dir
    Desktop\realpython - dir
    Desktop\scripts - dir
    Desktop\todo.txt - file

Read the full article at https://realpython.com/get-all-files-in-directory-python/


Categories: FLOSS Project Planets

Python for Beginners: Sort a Pandas Series in Python

Planet Python - Mon, 2022-11-28 09:00

A pandas series is used to handle sequential data in Python. In this article, we will discuss different ways to sort a pandas series in Python.

Table of Contents
  1. Sort a series using the sort_values() method
  2. Sort a Series in Ascending Order in Python
  3. Sort a Pandas Series in Descending Order
  4. Sort a Series Having NaN Values in Python
  5. Sort a Series Inplace in Python
  6. Sort a Pandas Series Using a Key
  7. The sort_index() Method in Python
  8. Sort a Pandas Series by Index in Ascending Order
  9. Sort a Series by Index in Descending Order in Python
  10. Sort a Pandas Series by Index Having NaN Values
  11. Sort a Series by Index Inplace in Python
  12. Sort a Pandas Series by Index Using a Key in Python
  13. Conclusion
Sort a series using the sort_values() method

You can sort a pandas series using the sort_values() method. It has the following syntax.

Series.sort_values(*, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

Here, 

  • The axis parameter is used to decide if we want to sort a dataframe by a column or row. For series, the axis parameter is unused. It is defined just for the compatibility of the sort_values() method with pandas dataframes.
  • By default, the sort_values() method sorts a series in ascending order. If you want to sort a series in descending order, you can set the ascending parameter to False.
  • After execution, the sort_values() method returns the sorted series. It doesn’t modify the original series. To sort and modify the original series instead of creating a new series, you can set the inplace parameter to True.
  • The kind parameter is used to determine the sorting algorithm. By default, the “quicksort” algorithm is used. If your data has a specific pattern where another sorting algorithm can be more efficient, you can use the ‘mergesort’, ‘heapsort’, or ‘stable’ sorting algorithm instead.
  • The na_position parameter is used to determine the position of NaN values in the sorted series. By default, the NaN values are placed at the end of the sorted series. You can set the na_position parameter to “first” to place the NaN values at the top of the sorted series.
  • When we sort a series, the index of all the values is shuffled when the values are sorted. Due to this, the indices in the sorted series are in no order. If you want to reset the indices after sorting, you can set the ignore_index parameter to True.
  • The key parameter is used to perform operations on the series before sorting. It takes a vectorized function as its input argument. The function provided to the key parameter must take a pandas series as its input argument and return a pandas series. Before sorting, the function is applied to the series. The values in the output of the function are then used to sort the series. 
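As a quick illustration of two of the parameters described above, the following sketch sorts a small series that contains tied values. The series and its index labels are made up for this example.

```python
import pandas as pd

scores = pd.Series([2, 1, 2, 1], index=["a", "b", "c", "d"])

# A stable sort keeps tied values in their original relative order.
stable = scores.sort_values(kind="stable")

# ascending=False sorts from largest to smallest. The index labels
# travel with their values in both cases.
descending = scores.sort_values(ascending=False)

print(stable)
print(descending)
```

With the stable sort, the label "b" comes before "d" and "a" comes before "c", because that is the order in which the tied values appear in the original series.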
Sort a Series in Ascending Order in Python

To sort a series in ascending order, you can use the sort_values() method on the series object as shown in the following example.

    import pandas as pd

    numbers = [12, 34, 11, 25, 27, 8, 13]
    series = pd.Series(numbers)
    print("The original series is:")
    print(series)
    sorted_series = series.sort_values()
    print("The sorted series is:")
    print(sorted_series)

Output:

    The original series is:
    0    12
    1    34
    2    11
    3    25
    4    27
    5     8
    6    13
    dtype: int64
    The sorted series is:
    5     8
    2    11
    0    12
    6    13
    3    25
    4    27
    1    34
    dtype: int64

In the above example, we have first created a pandas series of 7 numbers. Then, we have sorted the series using the sort_values() method.

You can observe that the indices are also shuffled with the values in the series when it is sorted. To reset the index, you can set the ignore_index parameter to True as shown below.

    import pandas as pd

    numbers = [12, 34, 11, 25, 27, 8, 13]
    series = pd.Series(numbers)
    print("The original series is:")
    print(series)
    sorted_series = series.sort_values(ignore_index=True)
    print("The sorted series is:")
    print(sorted_series)

Output:

    The original series is:
    0    12
    1    34
    2    11
    3    25
    4    27
    5     8
    6    13
    dtype: int64
    The sorted series is:
    0     8
    1    11
    2    12
    3    13
    4    25
    5    27
    6    34
    dtype: int64

In this example, you can observe that the series returned by the sort_values() method has indices running from 0 to 6 instead of shuffled indices.

Sort a Pandas Series in Descending Order

To sort a pandas series in descending order, you can set the ascending parameter in the sort_values() method to False. After execution, the sort_values() method will return a series sorted in descending order. You can observe this in the following example.

    import pandas as pd

    numbers = [12, 34, 11, 25, 27, 8, 13]
    series = pd.Series(numbers)
    print("The original series is:")
    print(series)
    sorted_series = series.sort_values(ascending=False, ignore_index=True)
    print("The sorted series is:")
    print(sorted_series)

Output:

    The original series is:
    0    12
    1    34
    2    11
    3    25
    4    27
    5     8
    6    13
    dtype: int64
    The sorted series is:
    0    34
    1    27
    2    25
    3    13
    4    12
    5    11
    6     8
    dtype: int64

In the above example, we have set the ascending parameter in the sort_values() method to False. Hence, after execution of the sort_values() method, we get a series that is sorted in descending order.

Sort a Series Having NaN Values in Python

To sort a pandas series with NaN values, you just need to invoke the sort_values() method on the pandas series as shown in the following example.

    import pandas as pd
    import numpy as np

    numbers = [12, np.nan, 11, np.nan, 27, -8, 13]
    series = pd.Series(numbers)
    print("The original series is:")
    print(series)
    series.sort_values(inplace=True, ignore_index=True)
    print("The sorted series is:")
    print(series)

Output:

    The original series is:
    0    12.0
    1     NaN
    2    11.0
    3     NaN
    4    27.0
    5    -8.0
    6    13.0
    dtype: float64
    The sorted series is:
    0    -8.0
    1    11.0
    2    12.0
    3    13.0
    4    27.0
    5     NaN
    6     NaN
    dtype: float64

In this example, you can observe that the series contains NaN values. By default, the sort_values() method puts the NaN values at the end of the sorted series. If you want the NaN values at the start of the sorted series, you can set the na_position parameter to “first” as shown below.

    import pandas as pd
    import numpy as np

    numbers = [12, np.nan, 11, np.nan, 27, -8, 13]
    series = pd.Series(numbers)
    print("The original series is:")
    print(series)
    series.sort_values(inplace=True, ignore_index=True, na_position="first")
    print("The sorted series is:")
    print(series)

Output:

    The original series is:
    0    12.0
    1     NaN
    2    11.0
    3     NaN
    4    27.0
    5    -8.0
    6    13.0
    dtype: float64
    The sorted series is:
    0     NaN
    1     NaN
    2    -8.0
    3    11.0
    4    12.0
    5    13.0
    6    27.0
    dtype: float64

In the above two examples, you can observe that the data type of the series is float64, unlike the prior examples where the data type of the series was int64. This is because NaN is a floating-point value in Python. Hence, all the numbers are cast to the most compatible data type, which is float64.
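You can check this data type change directly with a short sketch. The two series below are made up for illustration.

```python
import numpy as np
import pandas as pd

ints = pd.Series([12, 11, 27])
with_nan = pd.Series([12, np.nan, 27])

# np.nan is an ordinary Python float, which is why it forces a float dtype.
nan_is_float = isinstance(np.nan, float)

print(ints.dtype)
print(with_nan.dtype)
print(nan_is_float)
```

Here, the first series keeps an integer data type, while the presence of a single NaN value makes the second series a float64 series.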

Sort a Series Inplace in Python

In the examples before the NaN section, the original series isn’t modified and the sort_values() method returns a new sorted series. If you want to sort the series inplace, you can set the inplace parameter to True as shown below.

    import pandas as pd

    numbers = [12, 34, 11, 25, 27, 8, 13]
    series = pd.Series(numbers)
    print("The original series is:")
    print(series)
    series.sort_values(inplace=True, ignore_index=True)
    print("The sorted series is:")
    print(series)

Output:

    The original series is:
    0    12
    1    34
    2    11
    3    25
    4    27
    5     8
    6    13
    dtype: int64
    The sorted series is:
    0     8
    1    11
    2    12
    3    13
    4    25
    5    27
    6    34
    dtype: int64

In this example, we have set the inplace parameter to True in the sort_values() method. Hence, after execution of the sort_values() method, the original series is sorted instead of creating a new pandas series. In this case, the sort_values() method returns None.
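Because sort_values() returns None when inplace is True, assigning its result back to the variable is a common mistake. Here is a small sketch of both styles; the variable names are only illustrative.

```python
import pandas as pd

# In-place sort: the series itself changes and the call returns None.
series = pd.Series([12, 34, 11])
result = series.sort_values(inplace=True)
print(result)
print(series.tolist())

# Default behaviour: the original is left untouched and a new series is returned.
original = pd.Series([12, 34, 11])
sorted_copy = original.sort_values(ignore_index=True)
print(original.tolist())
print(sorted_copy.tolist())
```

Writing series = series.sort_values(inplace=True) would therefore replace the series with None.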

Sort a Pandas Series Using a Key

By default, the values in the series are used for sorting. Now, suppose that you want to sort the series based on the magnitude of the values instead of their actual values. For this, you can use the key parameter.

We will pass the abs() function to the key parameter of the sort_values() method. After this, the values of the series will be sorted by their magnitude. You can observe this in the following example.

import pandas as pd

numbers = [12, -34, 11, -25, 27, -8, 13]
series = pd.Series(numbers)
print("The original series is:")
print(series)
series.sort_values(inplace=True, ignore_index=True, key=abs)
print("The sorted series is:")
print(series)

Output:

The original series is:
0     12
1    -34
2     11
3    -25
4     27
5     -8
6     13
dtype: int64
The sorted series is:
0     -8
1     11
2     12
3     13
4    -25
5     27
6    -34
dtype: int64

In this example, we have a series of positive and negative numbers. Now, to sort the pandas series using the absolute value of the numbers, we have used the key parameter in the sort_values() method. In the key parameter, we have passed the abs() function.

When the sort_values() method is executed, the whole series is first passed to the abs() function, which returns a series of absolute values. These values are then used to compare the elements for sorting the series. This is why we get the series in which the elements are sorted by absolute value instead of actual value.
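The key parameter accepts any vectorized function that takes a series and returns a series of the same length, not just abs(). As a sketch, the example below sorts strings without regard to case; the sample data and the lambda are assumptions made for illustration.

```python
import pandas as pd

names = pd.Series(["banana", "apple", "Cherry"])

# Default comparison: uppercase letters sort before lowercase ones.
default_sorted = names.sort_values(ignore_index=True)
print(default_sorted.tolist())

# The key receives the whole series and must return a series of the same length.
case_insensitive = names.sort_values(ignore_index=True, key=lambda s: s.str.lower())
print(case_insensitive.tolist())
```

The key only affects the comparison. The sorted series still contains the original values.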

The sort_index() Method in Python

Instead of sorting a series using the values, we can also sort it using the row indices. For this, we can use the sort_index() method. It has the following syntax.

Series.sort_index(*, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)

Here,

  • The axis parameter is unused. As in the sort_values() method, it exists only for compatibility with the DataFrame.sort_index() method.
  • The level parameter is used to sort the series by a certain index level when there are multilevel indices. To sort the series by multiple index levels in a specific order, you can pass the list of levels to the level parameter in the same order.
  • By default, the series object is sorted by index values in ascending order. If you want the indices to be in descending order in the output series, you can set the ascending parameter to False.
  • After execution, the sort_index() method returns the sorted series. To sort and modify the original series by index instead of creating a new series, you can set the inplace parameter to True.
  • The kind parameter is used to determine the sorting algorithm. By default, the “quicksort” algorithm is used. If the index values are in a specific pattern where another sorting algorithm can be efficient, you can use the ‘mergesort’, ‘heapsort’, or ‘stable’ sorting algorithm.
  • The na_position parameter is used to determine the position of NaN indices in the sorted series. By default, the NaN indices are placed at the end of the sorted series. You can set the na_position parameter to “first” to place the NaN indices at the top of the sorted series.
  • The sort_index() method sorts the indices in a specific order (ascending or descending). After sorting the indices, if you want to reset the index of the series, you can set the ignore_index parameter to True.
  • The key parameter is used to perform operations on the index of the series before sorting. It takes a vectorized function as its input argument. The function provided to the key parameter must take the index as its input argument and return an index of the same shape. Before sorting, the function is applied to the index. The values in the output of the function are then used to sort the series.
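The level parameter is not used in the examples of this article, so here is a minimal sketch of how it behaves on a series with a two-level index. The level names "letter" and "number" and the sample data are made up for illustration.

```python
import pandas as pd

# A series with a two-level index; the level names are hypothetical.
index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 2), ("b", 1), ("a", 1)],
    names=["letter", "number"],
)
series = pd.Series([10, 20, 30, 40], index=index)

# Without level, the series is sorted by "letter" first and then by "number".
by_default = series.sort_index()
print(by_default)

# With level="number", the series is sorted by "number" first. Because
# sort_remaining is True by default, ties are then ordered by "letter".
by_number = series.sort_index(level="number")
print(by_number)
```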
Sort a Pandas Series by Index in Ascending Order

To sort a pandas series by index in ascending order, you can invoke the sort_index() method on the series object as shown in the following example.

import pandas as pd
import numpy as np

letters = ["a", "b", "c", "ab", "abc", "abcd", "bc", "d"]
numbers = [3, 23, 11, 14, 16, 2, 45, 65]
series = pd.Series(letters)
series.index = numbers
print("The original series is:")
print(series)
sorted_series = series.sort_index()
print("The sorted series is:")
print(sorted_series)

Output:

The original series is:
3        a
23       b
11       c
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object
The sorted series is:
2     abcd
3        a
11       c
14      ab
16     abc
23       b
45      bc
65       d
dtype: object

In this example, we have a series of strings with numbers as the index. As we have used the sort_index() method on the pandas series to sort it, the series is sorted by index values. Hence, we get a series where the index values are sorted.

After sorting, if you want to reset the index of the output series, you can set the ignore_index parameter to True in the sort_index() method as shown below.

import pandas as pd
import numpy as np

letters = ["a", "b", "c", "ab", "abc", "abcd", "bc", "d"]
numbers = [3, 23, 11, 14, 16, 2, 45, 65]
series = pd.Series(letters)
series.index = numbers
print("The original series is:")
print(series)
sorted_series = series.sort_index(ignore_index=True)
print("The sorted series is:")
print(sorted_series)

Output:

The original series is:
3        a
23       b
11       c
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object
The sorted series is:
0    abcd
1       a
2       c
3      ab
4     abc
5       b
6      bc
7       d
dtype: object

In this example, we have set the ignore_index parameter to True in the sort_index() method. Hence, after sorting the series by original index values, the index of the series is reset.

Sort a Series by Index in Descending Order in Python

To sort a pandas series by index in descending order, you can set the ascending parameter in the sort_index() method to False as shown in the following example.

import pandas as pd
import numpy as np

letters = ["a", "b", "c", "ab", "abc", "abcd", "bc", "d"]
numbers = [3, 23, 11, 14, 16, 2, 45, 65]
series = pd.Series(letters)
series.index = numbers
print("The original series is:")
print(series)
sorted_series = series.sort_index(ascending=False)
print("The sorted series is:")
print(sorted_series)

Output:

The original series is:
3        a
23       b
11       c
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object
The sorted series is:
65       d
45      bc
23       b
16     abc
14      ab
11       c
3        a
2     abcd
dtype: object

In this example, we have set the ascending parameter in the sort_index() method to False. Hence, the series is sorted by index in descending order.

Sort a Pandas Series by Index Having NaN Values

To sort a series by index when there are NaN values in the index, you just need to invoke the sort_index() method on the pandas series as shown in the following example.

import pandas as pd
import numpy as np

letters = ["a", "b", "c", "ab", "abc", "abcd", "bc", "d"]
numbers = [3, 23, np.nan, 14, 16, np.nan, 45, 65]
series = pd.Series(letters)
series.index = numbers
print("The original series is:")
print(series)
sorted_series = series.sort_index()
print("The sorted series is:")
print(sorted_series)

Output:

The original series is:
3.0        a
23.0       b
NaN        c
14.0      ab
16.0     abc
NaN     abcd
45.0      bc
65.0       d
dtype: object
The sorted series is:
3.0        a
14.0      ab
16.0     abc
23.0       b
45.0      bc
65.0       d
NaN        c
NaN     abcd
dtype: object

In the above example, the index of the series contains NaN values. By default, the elements with NaN indices are placed at the end of the sorted series. If you want them at the start of the sorted series, you can set the na_position parameter to “first” as shown below.

import pandas as pd
import numpy as np

letters = ["a", "b", "c", "ab", "abc", "abcd", "bc", "d"]
numbers = [3, 23, np.nan, 14, 16, np.nan, 45, 65]
series = pd.Series(letters)
series.index = numbers
print("The original series is:")
print(series)
sorted_series = series.sort_index(na_position="first")
print("The sorted series is:")
print(sorted_series)

Output:

The original series is:
3.0        a
23.0       b
NaN        c
14.0      ab
16.0     abc
NaN     abcd
45.0      bc
65.0       d
dtype: object
The sorted series is:
NaN        c
NaN     abcd
3.0        a
14.0      ab
16.0     abc
23.0       b
45.0      bc
65.0       d
dtype: object

In this example, you can observe that we have set the na_position parameter to "first" in the sort_index() method. Hence, the elements having NaN values as their index are kept at the start of the sorted series returned by the sort_index() method.

Sort a Series by Index Inplace in Python

By default, the sort_index() method doesn’t sort the original series. It returns a new series sorted by index. If you want to modify the original series, you can set the inplace parameter to True in the sort_index() method as shown below.

import pandas as pd
import numpy as np

letters = ["a", "b", "c", "ab", "abc", "abcd", "bc", "d"]
numbers = [3, 23, np.nan, 14, 16, np.nan, 45, 65]
series = pd.Series(letters)
series.index = numbers
print("The original series is:")
print(series)
series.sort_index(inplace=True)
print("The sorted series is:")
print(series)

Output:

The original series is:
3.0        a
23.0       b
NaN        c
14.0      ab
16.0     abc
NaN     abcd
45.0      bc
65.0       d
dtype: object
The sorted series is:
3.0        a
14.0      ab
16.0     abc
23.0       b
45.0      bc
65.0       d
NaN        c
NaN     abcd
dtype: object

In this example, we have set the inplace parameter to True in the sort_index() method. Hence, the original series is sorted instead of creating a new series.

Sort a Pandas Series by Index Using a Key in Python

By using the key parameter, we can perform operations on the index of the series before sorting the series by index. For example, if you have negative numbers as the index in the series and you want to sort the series using the magnitude of the indices, you can pass the abs() function to the key parameter in the sort_index() method. 

import pandas as pd
import numpy as np

letters = ["a", "b", "c", "ab", "abc", "abcd", "bc", "d"]
numbers = [3, 23, -100, 14, 16, -3, 45, 65]
series = pd.Series(letters)
series.index = numbers
print("The original series is:")
print(series)
series.sort_index(inplace=True, key=abs)
print("The sorted series is:")
print(series)

Output:

The original series is:
 3         a
 23        b
-100       c
 14       ab
 16      abc
-3      abcd
 45       bc
 65        d
dtype: object
The sorted series is:
 3         a
-3      abcd
 14       ab
 16      abc
 23        b
 45       bc
 65        d
-100       c
dtype: object

In this example, we have a series having positive and negative numbers as indices. Now, to sort the pandas series using the absolute value of the indices, we have used the key parameter in the sort_index() method. In the key parameter, we have passed the abs() function.

When the sort_index() method is executed, the indices of the series are first passed to the abs() function. The values returned by the abs() function are then used to compare the indices for sorting the series. This is why we get the series in which the indices are sorted by absolute value instead of the actual value.
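With sort_index(), the key function receives the index rather than the values, so it can use Index methods. The sketch below sorts a series by string labels without regard to case; the labels and the lambda are assumptions made for illustration.

```python
import pandas as pd

series = pd.Series([1, 2, 3], index=["b", "a", "C"])

# Default comparison: the uppercase label "C" sorts before "a" and "b".
default_sorted = series.sort_index()
print(default_sorted)

# The key receives the Index and must return an Index of the same shape.
case_insensitive = series.sort_index(key=lambda idx: idx.str.lower())
print(case_insensitive)
```

The labels in the sorted series are unchanged. Only the order of comparison is affected by the key.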

Conclusion

In this article, we have discussed how to sort a pandas series in Python. For this, we have used the sort_values() and sort_index() methods. We have demonstrated different examples using different parameters of these methods.

I hope you enjoyed reading this article. To know more about the pandas module, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.

Happy Learning!

The post Sort a Pandas Series in Python appeared first on PythonForBeginners.com.

Categories: FLOSS Project Planets

Salsa Digital Drupal-Related Articles: Juliane Erben and Alex Skrypnyk at DrupalSouth 2022

Planet Drupal - Mon, 2022-11-28 07:00
Julie and Alex's DrupalSouth 2022 presentation was on day 2 of the event. They took attendees through the process of building a new, highly secure platform and website. View presentation description on DrupalSouth website
Categories: FLOSS Project Planets
