Feeds

Real Python: How to Reset a pandas DataFrame Index

Planet Python - Wed, 2024-11-06 09:00

In this tutorial, you’ll learn how to reset a pandas DataFrame index, the reasons why you might want to do this, and the problems that could occur if you don’t.

Before you start your learning journey, you should familiarize yourself with how to create a pandas DataFrame. Knowing the difference between a DataFrame and a pandas Series will also prove useful to you.

In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment you wish.

As a starting point, you’ll need some data. To begin with, you’ll use the band_members.csv file included in the downloadable materials that you can access by clicking the link below:

Get Your Code: Click here to download the free sample code you’ll use to learn how to reset a pandas DataFrame index.

The table below describes the data from band_members.csv that you’ll begin with:

Column Name     PyArrow Data Type   Description
first_name      string              First name of member
last_name       string              Last name of member
instrument      string              Main instrument played
date_of_birth   string              Member’s date of birth

As you’ll see, the data has details of the members of the rock band The Beach Boys. Each row contains information about one of its members, past or present.

Note: In case you’ve never heard of The Beach Boys, they’re an American rock band formed in the early 1960s.

Throughout this tutorial, you’ll be using the pandas library to allow you to work with DataFrames, as well as the newer PyArrow library. The PyArrow library provides pandas with its own optimized data types, which are faster and less memory-intensive than the traditional NumPy types that pandas uses by default.

If you’re working at the command line, you can install both pandas and pyarrow using the single command python -m pip install pandas pyarrow. If you’re working in a Jupyter Notebook, you should use !python -m pip install pandas pyarrow. Regardless, you should do this within a virtual environment to avoid clashes with the libraries you use in your global environment.

Once you have the libraries in place, it’s time to read your data into a DataFrame:

>>> import pandas as pd
>>> beach_boys = pd.read_csv(
...     "band_members.csv"
... ).convert_dtypes(dtype_backend="pyarrow")

First, you used import pandas as pd to make the library available within your code. To construct the DataFrame and read it into the beach_boys variable, you used pandas’ read_csv() function, passing band_members.csv as the file to read. Finally, by passing dtype_backend="pyarrow" to .convert_dtypes(), you convert all columns to pyarrow types.

If you want to verify that pyarrow data types are indeed being used, then beach_boys.dtypes will satisfy your curiosity:

>>> beach_boys.dtypes
first_name       string[pyarrow]
last_name        string[pyarrow]
instrument       string[pyarrow]
date_of_birth    string[pyarrow]
dtype: object

As you can see, each data type contains [pyarrow] in its name.

If you wanted to analyze the date information thoroughly, then you would parse the date_of_birth column to make sure dates are read as a suitable pyarrow date type. This would allow you to analyze by specific days, months or years, and so on, as commonly found in pivot tables.

The date_of_birth column is not analyzed in this tutorial, so the string data type it’s being read as will do. Later on, you’ll get the chance to hone your skills with some exercises. The solutions include the date parsing code if you want to see how it’s done.
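This is not the tutorial's solution code. It's a rough sketch that assumes every value follows the DD-Mon-YYYY pattern used in band_members.csv, such as 20-Jun-1942. Under that assumption, pandas' to_datetime() with an explicit format string can parse the column. To stay short, the sketch uses a small hand-built DataFrame and pandas' default datetime64 type rather than a pyarrow date type:

```python
import pandas as pd

# A hand-built stand-in for three rows of band_members.csv.
beach_boys = pd.DataFrame(
    {
        "first_name": ["Brian", "Mike", "Al"],
        "date_of_birth": ["20-Jun-1942", "15-Mar-1941", "03-Sep-1942"],
    }
)

# Assumption: every date uses the DD-Mon-YYYY layout, e.g. 20-Jun-1942.
beach_boys["date_of_birth"] = pd.to_datetime(
    beach_boys["date_of_birth"], format="%d-%b-%Y"
)

# A real datetime column exposes its parts through the .dt accessor.
birth_years = beach_boys["date_of_birth"].dt.year.tolist()
print(birth_years)  # [1942, 1941, 1942]
```

Once the column is a datetime type, grouping by .dt.year or .dt.month becomes possible. That is the kind of day, month, or year analysis mentioned above.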

Now that the file has been loaded into a DataFrame, you’ll probably want to take a look at it:

>>> beach_boys
  first_name last_name instrument date_of_birth
0      Brian    Wilson       Bass   20-Jun-1942
1       Mike      Love  Saxophone   15-Mar-1941
2         Al   Jardine     Guitar   03-Sep-1942
3      Bruce  Johnston       Bass   27-Jun-1942
4       Carl    Wilson     Guitar   21-Dec-1946
5     Dennis    Wilson      Drums   04-Dec-1944
6      David     Marks     Guitar   22-Aug-1948
7      Ricky    Fataar      Drums   05-Sep-1952
8    Blondie   Chaplin     Guitar   07-Jul-1951

DataFrames are two-dimensional data structures similar to spreadsheets or database tables. A pandas DataFrame can be considered a set of columns, with each column being a pandas Series. Each column also has a heading, which is the name property of the Series, and each row has a label, which is referred to as an element of its associated index object.

The DataFrame’s index is shown to the left of the DataFrame. It’s not part of the original band_members.csv source file, but is added as part of the DataFrame creation process. It’s this index object you’re learning to reset.

The index of a DataFrame is an additional column of labels that helps you identify rows. When used in combination with column headings, it allows you to access specific data within your DataFrame. The default index labels are a sequence of integers, but you can use strings to make them more meaningful. You can actually use any hashable type for your index, but integers, strings, and timestamps are the most common.
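Here is a minimal sketch of how index labels and column headings combine to locate data. It uses a few hand-typed rows rather than the CSV file, plus the standard .loc and set_index() methods:

```python
import pandas as pd

beach_boys = pd.DataFrame(
    {
        "first_name": ["Brian", "Mike", "Al"],
        "last_name": ["Wilson", "Love", "Jardine"],
        "instrument": ["Bass", "Saxophone", "Guitar"],
    }
)

# The default index is a sequence of integer labels: 0, 1, 2.
print(beach_boys.index.tolist())  # [0, 1, 2]

# A row label plus a column heading pinpoints a single value.
print(beach_boys.loc[1, "instrument"])  # Saxophone

# Promoting a string column to the index makes the labels more meaningful.
by_first_name = beach_boys.set_index("first_name")
print(by_first_name.loc["Mike", "instrument"])  # Saxophone
```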

Note: Although indexes are certainly useful in pandas, an alternative to pandas is the new high-performance Polars library, which eliminates them in favor of row numbers. This may come as a surprise, but aside from being used for selecting rows or columns, indexes aren’t often used when analyzing DataFrames. Also, row numbers always remain sequential when rows are added or removed in a Polars DataFrame. This isn’t the case with indexes in pandas.
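The gaps mentioned above are easy to reproduce. The following sketch uses a few hand-typed rows rather than the CSV file. It shows how filtering leaves non-sequential labels in pandas, and how DataFrame.reset_index() renumbers them. For the tutorial's own treatment, see the full article:

```python
import pandas as pd

beach_boys = pd.DataFrame(
    {
        "first_name": ["Brian", "Mike", "Al", "Bruce", "Carl"],
        "instrument": ["Bass", "Saxophone", "Guitar", "Bass", "Guitar"],
    }
)

# Filtering keeps each surviving row's original label, leaving gaps.
guitarists = beach_boys[beach_boys["instrument"] == "Guitar"]
print(guitarists.index.tolist())  # [2, 4]

# By default, reset_index() moves the old labels into a new "index" column.
with_old_labels = guitarists.reset_index()
print(with_old_labels.columns.tolist())  # ['index', 'first_name', 'instrument']

# drop=True discards the old labels and simply renumbers from 0.
renumbered = guitarists.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```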

Read the full article at https://realpython.com/pandas-reset-index/ »

Categories: FLOSS Project Planets

Julien Tayon: The crudest CRUD of them all : the smallest CRUD possible in 150 lines of python

Planet Python - Wed, 2024-11-06 07:04
Right now, I am on a never-ending quest that requires me to think about building a full-fledged MVC controller: an anti-Jira tracker that would favour HARD CHECKED facts over wishful thinking.

To begin with, I am not really motivated to start with a full-fledged MVC (Model View Controller) à la Django, because there is a lot of boilerplate and many steps before you get a result. But it has a lot of features I want, including authentication, authorization, and security handling.

For prototypes, we normally favour lightweight frameworks (à la Flask) and CRUD.

The CRUD approach factors the whole framework into a single dynamic form that adapts itself to the model. From the Python class declaration, it generates the HTML forms to input data, the tables, the REST endpoints, and the search, and it also generates the database model. One language to rule them all: PYTHON. With enough talent, you can even generate from Python the JavaScript that handles autocompletion on the generated view.

But before using a CRUD framework, we need a cruder one: ugly and disgusting, but useful for a human before building the REST APIs and writing the Python class, the HTML form, and the controllers.

I call this the crudest CRUD of them all.

Think hard about what you want when prototyping...

  • to write no CONTROLLERS; the Flask documentation takes a very verbose approach to exposing and writing routes, and writing controllers for inserting into and searching databases is boring;
  • to write as few HTML views as possible; one and only one would be great;
  • to avoid fiddling with the many files that reflect separation of concerns: the fewer Python files and classes you touch, the better;
  • to avoid having to write SQL or use an ORM (at least a verbose declarative one);
  • show me your code and you can mesmerize and even fool me, but show me your data structures and I'll know everything I need to know about your application: data structures should be right under your nose, in a readable fashion, in the code;
  • to have AT LEAST one endpoint for inserting and searching, so that curl can be used to begin automation and testing, preferably in a factorisable fashion;
  • only one point of failure is accepted.

Once we set these few conditions, we see that whatever we do, WE NEED a dynamic HTTP server at the core. Python being the topic here, we are gonna do it in Python.

What is the simplest dynamic web server in Python?

The reference implementation of WSGI, the crudest WSGI server of them all: wsgiref. And you don't need to download it, since it's provided in the Python stdlib.
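To show how little wsgiref needs, here is a minimal sketch of a WSGI callable that echoes the requested path. The names hello_app and serve are illustrative and not part of the post. A WSGI application is just a function that receives the request environ and a start_response callback and returns an iterable of bytes:

```python
from wsgiref.simple_server import make_server


def hello_app(environ, start_response):
    """Minimal WSGI application: echo the requested path as plain text."""
    path = environ.get("PATH_INFO", "/")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"You asked for {path}\n".encode("utf-8")]


def serve(port=5000):
    """Block forever, serving hello_app with the stdlib reference server."""
    with make_server("", port, hello_app) as httpd:
        httpd.serve_forever()
```

Call serve(), then run curl http://localhost:5000/user from another terminal. It should answer "You asked for /user".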

First things first, we are gonna add a default view so that we can serve a static HTML page with the minimal HTML we need to interact with data: sets of inputs and forms.

Here, we stop. And we see that these forms are describing the data model.

Wouldn't it be nice if we could easily parse the HTML form with a tool from the standard library, html.parser, and maybe deduce the database model? Beyond plain fields, it could even add relationships. And well, since we are dreaming: what about creating the tables on the fly from the forms if they don't exist?
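Here is a stripped-down sketch of that idea, separate from the post's own HTMLtoData class. FormSchemaParser is a hypothetical name. It uses the stdlib html.parser to turn each form action into a table name and each input into a (name, type) pair:

```python
from html.parser import HTMLParser


class FormSchemaParser(HTMLParser):
    """Collect {table: [(input_name, input_type), ...]} from every <form>."""

    def __init__(self):
        super().__init__()
        self.schema = {}
        self._table = None  # name of the <form> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # The form action names the table: "/user" -> "user".
            self._table = (attrs.get("action") or "").lstrip("/")
            self.schema[self._table] = []
        elif tag == "input" and self._table is not None and attrs.get("name"):
            self.schema[self._table].append((attrs["name"], attrs.get("type", "text")))

    def handle_endtag(self, tag):
        if tag == "form":
            self._table = None


parser = FormSchemaParser()
parser.feed(
    '<form action="/user"><input type="number" name="id">'
    '<input type="text" name="name"><input type="email" name="email"></form>'
)
print(parser.schema)
# {'user': [('id', 'number'), ('name', 'text'), ('email', 'email')]}
```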

Encoding the relationships does require hijacking a convention: when the parser comes across a form field named whatever_id, it deduces that it is a foreign key to table « whatever », column « id ».
Once this is done, we can parse the HTML, do some magic to match HTML input types to database types (an adapter), and it's almost over. We can even dream of creating the database, if it does not exist, in a one-liner for SQLite.
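The adapter and the whatever_id convention can be sketched without any ORM. This version swaps the post's SQLAlchemy for the stdlib sqlite3 module and raw DDL. The names HTML_TO_SQLITE, column_ddl and create_table_sql are hypothetical. Table and column names are pasted straight into SQL, so the sketch assumes they come from your own trusted HTML:

```python
import sqlite3

# Hypothetical adapter from HTML <input type> to SQLite column types,
# loosely following the mapping in the post's HTMLtoData class.
HTML_TO_SQLITE = {
    "text": "TEXT",
    "email": "TEXT",
    "url": "TEXT",
    "tel": "TEXT",
    "number": "INTEGER",
    "date": "DATE",
    "datetime-local": "TIMESTAMP",
    "time": "TIME",
}


def column_ddl(name, input_type, step=None):
    """Translate one form input into a SQLite column definition."""
    if name == "id":
        return "id INTEGER PRIMARY KEY"
    if len(name) > 3 and name.endswith("_id"):
        # Naming convention: "<table>_id" is a foreign key to <table>.id
        return f"{name} INTEGER REFERENCES {name[:-3]}(id)"
    if input_type == "number" and step == "any":
        return f"{name} REAL"  # step=any means fractional numbers
    return f"{name} {HTML_TO_SQLITE.get(input_type, 'TEXT')}"


def create_table_sql(table, inputs):
    """inputs: (name, type, step) tuples taken from one <form>."""
    columns = ", ".join(column_ddl(*spec) for spec in inputs)
    return f"CREATE TABLE IF NOT EXISTS {table} ({columns})"


# sqlite3.connect() creates the database file if it is missing;
# ":memory:" keeps this sketch free of side effects.
cnx = sqlite3.connect(":memory:")
cnx.execute(create_table_sql("user", [
    ("id", "number", None), ("name", "text", None), ("email", "email", None),
]))
cnx.execute(create_table_sql("event", [
    ("id", "number", None), ("date", "date", None),
    ("text", "text", None), ("user_id", "number", None),
]))
print(create_table_sql("event", [("user_id", "number", None)]))
# CREATE TABLE IF NOT EXISTS event (user_id INTEGER REFERENCES user(id))
```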

We just need to throw all our frugality about dependencies out of the window and spoil our karma of « digital sobriety » by adding the almighty SQLAlchemy. It is the crudest (but still heavy) ORM when it comes to the introspective features an ORM needs to map a database object to a Python object in a clear, consistent way. With this, just one function is needed in the controller to switch between inserting (POST method) and searching (GET).
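Here is a sketch of that single-function controller. It uses stdlib sqlite3 instead of the post's SQLAlchemy automap, and crud is a hypothetical name. Like the post's simple_app, it skips empty values and keys starting with "_". POST inserts a row and GET turns every remaining field into an equality filter:

```python
import sqlite3


def crud(cnx, method, table, fields):
    """One controller for both verbs: POST inserts a row, GET searches.

    Empty values and keys starting with "_" are skipped, as in the post.
    Table and column names are pasted into the SQL, so they must come
    from the trusted form schema; only the values are bound parameters.
    """
    data = {k: v for k, v in fields.items() if v and not k.startswith("_")}
    if method == "POST":
        if data:
            marks = ", ".join("?" * len(data))
            sql = f"INSERT INTO {table} ({', '.join(data)}) VALUES ({marks})"
        else:
            sql = f"INSERT INTO {table} DEFAULT VALUES"
        cur = cnx.execute(sql, tuple(data.values()))
        cnx.commit()
        return {"insert_result": cur.lastrowid}
    if method == "GET":
        # Every non-empty field becomes an equality filter.
        where = " AND ".join(f"{k} = ?" for k in data) or "1 = 1"
        cur = cnx.execute(f"SELECT * FROM {table} WHERE {where}", tuple(data.values()))
        names = [col[0] for col in cur.description]
        return {"search_result": [dict(zip(names, row)) for row in cur.fetchall()]}
    raise ValueError(f"unsupported method: {method}")


cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
print(crud(cnx, "POST", "user", {"id": "", "name": "jul", "email": "jul@example.org"}))
# {'insert_result': 1}
print(crud(cnx, "GET", "user", {"name": "jul", "email": ""}))
# {'search_result': [{'id': 1, 'name': 'jul', 'email': 'jul@example.org'}]}
```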

Well, provided the DOM is passed in the request. Of course, I can already hear the criticism:
  • we can't pass the DOM in the request because the HTML form ignores the DOM;
  • aren't you scared of error 414 (URI Too Long) with the GET method if you pass the DOM?
That's where we obviously need two important tools: 1) JavaScript, 2) limitations.

Since we are human, we would also like the form to be readable when served. Humans don't read the source, so they can't see the name attributes of the inputs. Improving the raw HTML a tad would be nice, and it would also give consistency and reduce the size of the form to send. Here again, JavaScript is the right answer. Fine, we serve the static page at the top of the controller, and we use jQuery to keep it terse enough. Oh, and if we have JavaScript, couldn't it clone the relevant part of the invented model tag inside every form, so that we can pass the relevant part of the DOM to the controller?

I think we have everything to write the crudest CRUD server of them all :D

Happy code reading:

import multipart
from json import dumps
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlparse
from wsgiref.simple_server import make_server

from sqlalchemy import create_engine, select, MetaData, Table, Column
from sqlalchemy import Integer, Float, Date, DateTime, Time, UnicodeText, ForeignKey
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy_utils import database_exists, create_database

engine = create_engine("postgresql://jul@192.168.1.32/pdca")
if not database_exists(engine.url):
    create_database(engine.url)

tables = dict()


class HTMLtoData(HTMLParser):
    """Turn each <form> into a table and each <input> into a column."""

    def __init__(self):
        global engine, tables
        self.cols = []
        self.table = ""
        self.tables = []
        self.engine = engine
        self.meta = MetaData()
        super().__init__()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input":
            name, kind = attrs.get("name"), attrs.get("type")
            if name == "id":
                self.cols += [Column("id", Integer, primary_key=True)]
                return
            if name and name.endswith("_id"):
                # Convention: <table>_id is a foreign key to <table>.id
                self.cols += [Column(name, Integer, ForeignKey(name[:-3] + ".id"))]
                return
            if kind in ("email", "url", "tel", "text"):
                self.cols += [Column(name, UnicodeText)]
            if kind == "number":
                # step=any allows decimals, so store a float
                if attrs.get("step") == "any":
                    self.cols += [Column(name, Float)]
                else:
                    self.cols += [Column(name, Integer)]
            if kind == "date":
                self.cols += [Column(name, Date)]
            if kind == "datetime-local":
                self.cols += [Column(name, DateTime)]
            if kind == "time":
                self.cols += [Column(name, Time)]
        if tag == "form":
            # The form action (/user, /event) names the table.
            self.table = urlparse(attrs["action"]).path[1:]

    def handle_endtag(self, tag):
        if tag == "form":
            self.tables += [Table(self.table, self.meta, *self.cols)]
            tables[self.table] = self.tables[-1]
            self.table = ""
            self.cols = []
            with engine.connect() as cnx:
                self.meta.create_all(engine)
                cnx.commit()


html = """
<!doctype html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.7.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
    $("form").each((i,el) => {
        $(el).wrap("<fieldset>"+ el.action + "</fieldset>" );
        $(el).append("<input type=submit value=insert formmethod=post ><input type=submit value=search formmethod=get />");
    });
    $("input:not([type=hidden],[type=submit])").each((i,el) => {
        $(el).before("<label>" + el.name+ "</label><br/>");
        $(el).after("<br>");
    });
});
</script>
</head>
<body>
    <form action=/user >
        <input type=number name=id />
        <input type=text name=name />
        <input type=email name=email >
    </form>
    <form action=/event >
        <input type=number name=id />
        <input type=date name=date />
        <input type=text name=text />
        <input type=number name=user_id />
    </form>
</body>
</html>
"""

router = {"": lambda fo: html}


def simple_app(environ, start_response):
    fo, fi = multipart.parse_form_data(environ)
    # Fold uploaded files into the form data as plain dicts.
    fo.update(**{
        k: dict(
            name=v.filename,
            content=v.file.read().decode("utf-8", "backslashreplace"),
            content_type=v.content_type,
        )
        for k, v in fi.items()
    })
    table = route = environ["PATH_INFO"][1:]
    fo.update(**dict(parse_qsl(environ["QUERY_STRING"])))
    start_response("200 OK", [("Content-type", "text/html; charset=utf-8")])
    try:
        HTMLtoData().feed(html)
    except KeyError:
        pass
    metadata = MetaData()
    metadata.reflect(bind=engine)
    Base = automap_base(metadata=metadata)
    Base.prepare()
    if route in tables.keys():
        with Session(engine) as session:
            Item = getattr(Base.classes, table)
            if environ.get("REQUEST_METHOD", "GET") == "POST":
                # Insert every non-empty field not starting with "_".
                new_item = Item(**{k: v for k, v in fo.items() if v and not k.startswith("_")})
                session.add(new_item)
                session.commit()
                fo["insert_result"] = new_item.id
            if environ.get("REQUEST_METHOD") == "GET":
                # Search: every non-empty field becomes an equality filter.
                result = []
                for elt in session.execute(
                    select(Item).filter_by(**{k: v for k, v in fo.items() if v and not k.startswith("_")})
                ).all():
                    result += [{k.name: getattr(elt[0], k.name) for k in tables[table].columns}]
                fo["search_result"] = result
    return [router.get(route, lambda fo: dumps(fo.dict, indent=4, default=str))(fo).encode()]


print("Crudest CRUD of them all on port 5000...")
make_server("", 5000, simple_app).serve_forever()

1xINTERNET blog: Why choosing a reliable migration partner is crucial for a successful transition from Drupal 7

Planet Drupal - Wed, 2024-11-06 07:00

The end of life of Drupal 7 is just around the corner and selecting the right migration partner is crucial for a smooth, cost-effective, and future-proof transition. Find out how 1xINTERNET and Pantheon’s unique solution can support your organisation!

Daniel Lange: Weird times ... or how the New York DEC decided the US presidential elections

Planet Debian - Wed, 2024-11-06 04:15

November 2024 will be known as the time when the killing of Peanut, a pet squirrel, by the New York State DEC swung the US presidential elections and shaped history forever.

The hundreds of millions of dollars spent on each side, the tireless campaigning by the candidates, the celebrity endorsements ... all made for an open race for months. Investments evened each other out.

But an OnlyFans producer showing people an overreaching, bureaucracy-driven State raiding his home to confiscate a pet squirrel and kill it ... swung enough voters to decide the elections.

That is what we need to understand in times of instant worldwide publication and a mostly attention-driven economy: Human fates, elections, economic cycles and wars can be decided by people killing squirrels.

RIP, Peanut.

P.S.: Trump Media & Technology Group Corp. (DJT) stock is up 30% pre-market.

Little Wayland Things

Planet KDE - Wed, 2024-11-06 04:00

While I do have a Qt git build on my machine that I use for development, I usually only test individual applications and functionality but hardly ever run my full Plasma session on it. This means that for day-to-day use I typically only get to enjoy new Qt features once they have actually been released.

Proper modal dialogs under Wayland (note the darkened editor window) thanks to XDG Dialog and the new Qt 6.8

One feature I talked about in the very last issue of “On the road to Plasma 6” is a nice API for XDG Foreign. To recap: it’s a Wayland protocol that lets an application export a window to another one so that it can attach a window to it. For example, the XDG Desktop Portal wants to attach the “Open File” dialog as if it were coming from the application that requested it.

Of course we don’t want to write low-level Wayland code and instead have an easy to use API for it. The KWindowSystem::setMainWindow function does just that: hand in a window and the token you received from the other application (created through KWaylandExtras::exportWindow) and it takes care of everything else. Presumably, you want to set the parent window before showing your dialog to make absolutely sure it’s set up properly.

However, Qt did not have an API to tell us when the underlying XDG Toplevel (think: a regular desktop-y window with a title bar and what not) had been created. We were only told when the basic wl_surface was created, which was too early, or the window was exposed/shown, at which point it was already flashing up in the user’s task bar. Hence, I added a new QWaylandWindow::surfaceRoleCreated (and corresponding surfaceRoleDestroyed) signal. Utilizing that, the aforementioned KWindowSystem API now works perfectly.

Another major addition to Qt Wayland that I have been looking forward to very much is support for the XDG Dialog protocol. While a window could have always had a parent (e.g. a popup menu or settings dialog parented to the application’s main window), there was no concept of a “modal” dialog. Therefore, we did not support the “dim parent” effect under Wayland that darkens a window to indicate it cannot be interacted with. More importantly, KWin couldn’t take it into account for its focus handling either. It happily let you focus a blocked window but the application would then just ignore your input.

There’s only one Dolphin running here!

This was most noticeable for me when Alt+Tab’ing back and forth, for example using the “Open File” dialog in one application and then trying to switch to the other to verify where the file was actually located. Instead of cycling between the file dialog and the other application, it would alternate between the file dialog and the blocked main window.

Sadly, even when I upgraded to Qt 6.8 the situation didn’t improve. I noticed that Alt+Tab actually showed the dialog twice. This looked like a bug and sure enough comparing it to the Plasma 5.27 LTS session on my other computer proved that it used to work at some point. At first I didn’t spot anything obvious until I noticed a small typo that must have slipped in during some major refactoring. Instead of not including the main window when it had a modal child, it included the modal child once again! Sure enough, adding an exclamation mark (the logical NOT operator in C++) did the trick.

If you want to support more good people such as myself, consider donating to the KDE End of Year Fundraiser!

drunomics: Low-code + Decoupled Drupal: The Power of Custom Elements 3.0

Planet Drupal - Tue, 2024-11-05 22:54
wolfgang.ziegler, Wed, 11/06/2024 - 04:54. Custom Elements UI empowers frontend developers to configure Drupal's data processing for API responses, enabling precise tailoring of output.

Matt Layman: Deploy Your Own Web App With Kamal 2

Planet Python - Tue, 2024-11-05 19:00
Kamal offers zero-downtime deploys, rolling restarts, asset bridging, remote builds, accessory service management, and everything else you need to deploy and manage your web app in production with Docker. Originally built for Rails apps, Kamal will work with any type of web app that can be containerized. We dig into Kamal, how it works, and how you could use it on your next project.

HDR and color management in KWin, part 5: HDR on SDR laptops

Planet KDE - Tue, 2024-11-05 18:00

This one required a few other features to be implemented first, so let’s jump right in.

Matching reference luminances

A big part of what a desktop compositor needs to get right with HDR content is to show SDR and HDR content properly side by side. KWin 6.0 added an SDR brightness slider for that purpose, but that’s only half the equation - what about the brightness of HDR content?

When we say “HDR”, usually that refers to a colorspace with the rec.2020 primaries and the perceptual quantizer (PQ) transfer function. A transfer function describes how to calculate a real brightness value from the “electrical” signal encoded in the content - PQ specifically has encoded values from 0 to 1 and brightness values from 0 to 10000 nits. For reference, your typical office monitor does around 300 or 400 nits at maximum brightness setting, and many newer phones can go a bit above 1000 nits.
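As an illustration, the PQ transfer function and its inverse can be written out directly from the SMPTE ST 2084 constants. This is a standalone sketch, not KWin's shader code:

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0


def pq_to_nits(signal):
    """PQ EOTF: encoded value in [0, 1] -> luminance in nits."""
    p = signal ** (1 / M2)
    return PEAK_NITS * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def nits_to_pq(nits):
    """Inverse EOTF: luminance in nits -> encoded value in [0, 1]."""
    y = (nits / PEAK_NITS) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


print(pq_to_nits(1.0))  # 10000.0
print(round(nits_to_pq(203), 2))  # 0.58
```

The second line shows how non-linear PQ is: 203 nits, about 2% of the 10000 nit range, already sits at roughly 58% of the encoded signal.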

Now if we want to show HDR content on an HDR screen, the most straight forward thing to do would be to just calculate the brightness values, write them to the screen and be done with it, right? That’s what KWin did up to Plasma 6.1, but it’s far from ideal. Even if your display can show the full range of requested brightness values, you might want to adjust the brightness to match your environment - be it brighter or darker than the room the content was optimized for - and when there’s SDR things in HDR content, like subtitles in a video, that should ideally match other SDR content on the screen as well.

Luckily, there is a preexisting relationship between HDR and SDR that we can use: The reference luminance. It defines how bright SDR white is - which is why another name for it is simply “SDR white”.

As we want to keep the brightness slider working, we won’t map SDR content to the reference luminance of any HDR transfer function though, but instead we map both SDR and HDR content to the SDR brightness setting. If we have an HDR video that uses the PQ transfer function, that reference luminance is 203 nits. If your SDR brightness setting is at 406 nits, KWin will just multiply the brightness of the HDR video with a factor of 2.
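Written out, the scaling in that example is just the ratio of the SDR brightness setting to the content's reference luminance. The symbols below are introduced here for illustration:

```latex
k = \frac{L_{\mathrm{SDR\ setting}}}{L_{\mathrm{ref}}}
  = \frac{406\ \mathrm{nits}}{203\ \mathrm{nits}} = 2,
\qquad
L_{\mathrm{out}} = k \cdot L_{\mathrm{content}}
```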

This doesn’t only mean that we can make SDR and HDR content fit together nicely on HDR screens, but it also means we now know what to do when we have HDR content on an SDR screen: We map the reference luminance from the video to SDR white on the screen. That’s of course not enough to make it look nice though…

Tone mapping

Especially with HDR presented on an SDR screen, but also on many HDR screens, it will happen that the content brightness exceeds the display capabilities. To handle this, starting with Plasma 6.2, whenever the HDR metadata of the content says it’s brighter than the display can go, KWin will apply tone mapping.

Doing this tone mapping in RGB can change the content quite badly, though. Let’s take a look using the simplest “tone mapping” function there is: clipping. It just limits the red, green and blue values separately to the brightness that the screen can show.

If we have a pixel with the value [2.0, 0.0, 2.0] and a maximum brightness of 1.0, that gets mapped to [1.0, 0.0, 1.0] - which is the same purple, just darker. But if the pixel has the values [2.0, 0.0, 1.0], then that gets mapped to [1.0, 0.0, 1.0], even though the source color was significantly more red!
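The hue shift is easy to reproduce. The sketch below compares per-channel clipping with scale_to_fit, a hypothetical helper that scales all three channels by one shared factor. That is the simplest way to keep the channel ratios. It is not what KWin does, since KWin's approach is based on ICtCp:

```python
def clip(pixel, peak=1.0):
    """Per-channel clipping: each of R, G, B is limited independently."""
    return [min(c, peak) for c in pixel]


def scale_to_fit(pixel, peak=1.0):
    """Scale all channels by one factor so the brightest one fits."""
    factor = min(1.0, peak / max(pixel)) if max(pixel) > 0 else 1.0
    return [c * factor for c in pixel]


print(clip([2.0, 0.0, 2.0]))          # [1.0, 0.0, 1.0]  same purple, darker
print(clip([2.0, 0.0, 1.0]))          # [1.0, 0.0, 1.0]  red/blue ratio lost
print(scale_to_fit([2.0, 0.0, 1.0]))  # [1.0, 0.0, 0.5]  ratio kept
```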

To fix that, KWin’s tone mapping uses ICtCp. This is a color space developed by Dolby, in which the perceived brightness (aka Intensity) is separated from the chroma components (Ct = blue-yellow, Cp = red-green), which is perfect for tone mapping. KWin’s shaders thus transform the RGB content to ICtCp, apply a brightness mapping function to only the intensity component, and then convert back to RGB.
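For reference, here is a sketch of the forward RGB to ICtCp conversion using the ITU-R BT.2100 constants, assuming linear BT.2020 RGB input in nits. It is not taken from KWin's shaders. It omits the inverse conversion, which a tone mapper would need after adjusting I. It does show the property that makes ICtCp useful here: greys carry no chroma, so brightness lives in I alone:

```python
# BT.2100 PQ constants; input is linear BT.2020 RGB in nits.
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32


def pq_encode(nits):
    """PQ inverse EOTF: nits -> non-linear signal in [0, 1]."""
    y = (max(nits, 0.0) / 10000.0) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


def rgb_to_ictcp(r, g, b):
    """Linear BT.2020 RGB (nits) -> (I, Ct, Cp) per ITU-R BT.2100."""
    # Step 1: RGB -> LMS cone-like responses.
    l = (1688 * r + 2146 * g + 262 * b) / 4096
    m = (683 * r + 2951 * g + 462 * b) / 4096
    s = (99 * r + 309 * g + 3688 * b) / 4096
    # Step 2: perceptual encoding with PQ.
    lp, mp, sp = pq_encode(l), pq_encode(m), pq_encode(s)
    # Step 3: separate intensity from the two chroma axes.
    i = 0.5 * lp + 0.5 * mp
    ct = (6610 * lp - 13613 * mp + 7003 * sp) / 4096
    cp = (17933 * lp - 17390 * mp - 543 * sp) / 4096
    return i, ct, cp


# Greys carry (almost exactly) zero chroma at any brightness; only I changes.
print(rgb_to_ictcp(100, 100, 100))
print(rgb_to_ictcp(400, 400, 400))
```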

The result of that algorithm looks like this:

Comparison images: RGB clipping; KWin 6.2’s tone mapping; MPV’s tone mapping

As you can see, there’s still some color changes going on in comparison to MPV’s algorithm; this is partially because the tone mapping curve still needs some more adjustments, and partially because we also still need to do similar mapping for colors that the screen can’t actually show. It’s already a large improvement though, and does better than the built-in tone mapping functionality in many HDR screens.

When tone mapping HDR content on SDR screens, we always end up reducing the brightness of the overall image, so that we have some brightness values to map the really bright highlights in the video to - otherwise everything just slightly over the reference luminance would look like an overexposed blob of color, as you can see in the “RGB clipping” image. There are ways around that though…

HDR on SDR laptop displays

To explain the reasoning behind this, it helps to first have a look at what even makes a display “HDR”. In many cases it’s just marketing nonsense, a label that’s put on displays to make them seem more fancy and desirable, but in others there’s an actual tangible benefit to it.

Let’s take OLED displays as an example, as it’s considered one of the display technologies where HDR really shines. When you drive an OLED at high brightness levels, it becomes quite inefficient, it draws a lot of power and generates a lot of heat. Both of these things can only be dealt with to a limited degree, so OLED displays can generally only be used with relatively low average brightness levels. They can go a lot brighter than the average in a small part of the screen though, and that’s why they benefit so much from HDR - you can show a scene that’s on average only 200 nits bright, with the sky in the image going up to 300 nits, the sun going up to 1000 nits and the ground only doing 150 nits.

Now let’s compare that to SDR laptop displays. In the case of most LCDs, you have a single backlight LED for the whole screen, and when you move the brightness slider, the power the backlight is driven at is changed. So there’s no way to make parts of the screen brighter than the rest on a hardware level… but that doesn’t mean there isn’t a way to do it in software!

When we want to show HDR content and the brightness slider is below 100%, KWin increases the backlight level to get a peak brightness that matches the relative peak brightness of that content (as far as that’s possible). At the same time it changes the colorspace description on the output to match that change: While the reference luminance stays the same, the maximum luminance of the transfer function gets increased in proportion to the increase in backlight brightness.
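Here is the arithmetic behind that, as a simplified sketch in linear light. It is not KWin's code, and the function name and sample numbers are hypothetical. The backlight is raised toward the content's relative peak, capped by the panel's maximum, and SDR white is re-encoded lower by the same factor so it looks unchanged:

```python
def hdr_backlight_boost(current_white, panel_max, content_peak, content_ref=203.0):
    """Return (backlight_nits, sdr_white_signal) for a simplified model.

    current_white: nits SDR white has at the user's slider position
    panel_max:     nits the backlight reaches at 100%
    content_peak:  peak luminance from the content's HDR metadata
    content_ref:   the content's reference luminance (203 nits for PQ)
    """
    # Peak the content wants, relative to SDR white, mapped onto the screen.
    wanted_peak = current_white * content_peak / content_ref
    # Raise the backlight toward that peak "as far as possible".
    backlight = min(max(wanted_peak, current_white), panel_max)
    # SDR white is re-encoded lower to cancel the brighter backlight.
    sdr_white_signal = current_white / backlight
    return backlight, sdr_white_signal


backlight, white_signal = hdr_backlight_boost(
    current_white=100.0, panel_max=400.0, content_peak=1000.0
)
print(backlight, white_signal)  # 400.0 0.25
print(backlight * white_signal)  # 100.0, so SDR white looks unchanged
```

Content that never exceeds the reference luminance yields no boost at all, matching the condition described below.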

The result is that SDR white gets mapped to a reduced RGB value, which is at least supposed to exactly counteract the increase of brightness that we’re applying with the backlight, while HDR content that goes beyond the reference luminance gets to use the full brightness range.

Increasing the backlight power of course doesn’t come without downsides; black levels and power usage both get increased, so this is only ever active if there’s HDR content on the screen with valid HDR metadata that signals brightness levels going beyond the reference luminance.

As always, capturing HDR content with a phone camera is quite difficult, but I think you can at least sort of see the effect:

Comparison photos: without backlight adjustment; with backlight adjustment

This feature has been merged into KWin’s git master branch and will be available on all laptop displays starting with Plasma 6.3. I really recommend trying it for yourself once it reaches your distribution!

TestDriven.io: Avoid Counting in Django Pagination

Planet Python - Tue, 2024-11-05 17:28
This article looks at how to avoid the count query in Django's paginator.

FSF Anniversary Logo Contest

FSF Blogs - Tue, 2024-11-05 16:41

FSF Blogs: Forty years of commitment to software freedom

GNU Planet! - Tue, 2024-11-05 16:40
We're planning a jam-packed anniversary year and we hope you'll join us for the festivities!

Forty years of commitment to software freedom

FSF Blogs - Tue, 2024-11-05 16:40
We're planning a jam-packed anniversary year and we hope you'll join us for the festivities!

PreviousNext: PowerBI Dashboard: Addressing content currency

Planet Drupal - Tue, 2024-11-05 16:00

Post co-authored with NSW Resources. A critical issue with the management of content currency on our Drupal website, nsw.gov.au/nswresources, required an innovative solution to provide us with an automated content audit process.

by luhur.rizal / 6 November 2024

Due to the complexity and size of our Drupal web presence, ensuring each page was up-to-date and reviewed by the appropriate business unit became increasingly challenging. We needed a tool to track how long it had been since a page was reviewed, set specific periods for future reviews and easily identify the page owners for each section of the website. Furthermore, with the required frequency of daily updates to the site, the solution had to be ‘live’ to accurately reflect these changes.

Choosing the right solution

To tackle these challenges, we collaborated with our Drupal web development partner, PreviousNext, to create a live .csv file of all relevant web pages. This file included custom metadata detailing each page’s review frequency, page owner, date of last page update, date of last content review, publishing status and the next scheduled review date. By using this .csv file as a data source, we built a user-friendly content audit report dashboard in Microsoft Power BI.

The PowerBI dashboard provides executives with a high-level overview of which sections of the website are most in need of review. A complementary dashboard for ‘content champions’ offers a more granular view of the status of each individual page, enabling targeted content management.

Power BI implementation

Implementing this solution involved several steps:

Internal stakeholder consultation

We engaged with the various business units in the NSW Resources division to identify page owners and establish appropriate review periods for each section of the website.

Metadata assignment 

We bulk-uploaded metadata to the pages, including the custom fields created for the project, such as review periods and page owners.

Data manipulation in PowerBI 

Data from the .csv file was transformed within PowerBI so that each column had the correct data type. We created a 'Review status' column derived from the next review date, giving clear visibility of which pages are due for review. We also filtered out unpublished and archived content to keep the reports focused.
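PowerBI performs these steps with its own query tools. Purely as a language-neutral illustration, the Python sketch below mimics them on a small CSV: it converts the next-review column to a real date, drops unpublished and archived rows, and derives a traffic-light review status. The column names (`publishing_status`, `next_review_date`), the page URLs, the status labels and the 30-day ‘due soon’ threshold are all hypothetical, not NSW Resources’ actual schema or rules.

```python
import csv
import io
from datetime import date

# Hypothetical threshold: pages due within this many days are flagged amber.
DUE_SOON_DAYS = 30


def review_status(next_review: date, today: date) -> str:
    """Map a page's next review date to a traffic-light status."""
    days_left = (next_review - today).days
    if days_left < 0:
        return "Red: overdue"
    if days_left <= DUE_SOON_DAYS:
        return "Amber: due soon"
    return "Green: current"


def load_pages(csv_text: str, today: date) -> list[dict]:
    """Parse the exported CSV, drop unpublished/archived rows, add a status."""
    pages = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["publishing_status"] in {"unpublished", "archived"}:
            continue  # keep only live content in the report
        # Convert the text column to a real date (the "correct format" step).
        next_review = date.fromisoformat(row["next_review_date"])
        pages.append({**row, "review_status": review_status(next_review, today)})
    return pages


# Hypothetical sample export with made-up pages and owners.
SAMPLE = """url,owner,publishing_status,next_review_date
/page-a,Team A,published,2024-10-01
/page-b,Team B,published,2024-11-20
/page-c,Team A,published,2025-06-30
/page-d,Team B,archived,2023-01-01
"""

for page in load_pages(SAMPLE, today=date(2024, 11, 6)):
    print(page["url"], "->", page["review_status"])
```

Here `/page-d` is dropped because it is archived, and the remaining three pages come out red, amber and green respectively.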

PowerBI build

Using the website's metadata and Google Analytics data, we built a comprehensive dashboard in PowerBI Desktop and then uploaded it to PowerBI Service for broader access. Live links to the web pages were integrated into the dashboards for easy navigation.

Executive overview report

We developed a high-level summary report that shows how many pages each business unit is responsible for and includes Google Analytics page views from the past 30 days.

Content audit report

This report can be filtered by review status and by sub-area within each business unit. It links directly to the listed web pages and uses traffic-light conditional formatting to indicate review status.

Overall tracker

Designed exclusively for the NSW Resources website team, this site-wide overview tracks all content, including documents, events, and articles that are not covered by the other reports.

Internal integrations 

PowerBI reports were integrated into Microsoft Teams and the SharePoint intranet, facilitating use across the business.

Internal work request form 

We updated the form to distinguish between a web update that is part of a comprehensive content audit review, which resets the review period, and a minor adjustment, which leaves the review status unchanged.
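The distinction matters because only a genuine review should move a page's review dates. A minimal sketch of that rule, assuming a hypothetical `Page` record that stores its review period in days (the post does not show the actual Drupal implementation, so this is illustration only):

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta


@dataclass(frozen=True)
class Page:
    """Hypothetical per-page review metadata."""
    url: str
    review_period_days: int
    last_reviewed: date
    next_review: date


def apply_update(page: Page, today: date, is_full_review: bool) -> Page:
    """Record a web update; only a full content review resets the clock."""
    if not is_full_review:
        # Minor adjustment (e.g. a typo fix): review status stays as it was.
        return page
    # Full review: mark reviewed today and schedule the next review.
    return replace(
        page,
        last_reviewed=today,
        next_review=today + timedelta(days=page.review_period_days),
    )


page = Page("/example-page", review_period_days=180,
            last_reviewed=date(2024, 1, 10), next_review=date(2024, 7, 8))
typo_fix = apply_update(page, today=date(2024, 11, 6), is_full_review=False)
full_review = apply_update(page, today=date(2024, 11, 6), is_full_review=True)
print(typo_fix.next_review)     # 2024-07-08 (unchanged)
print(full_review.next_review)  # 2025-05-05
```

Returning a new frozen record rather than mutating the old one keeps the "minor adjustment" path trivially safe: it simply hands back the original page.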

A content audit that works for everyone

The project required extensive consultation to define the scope and needs of each business unit. Following this, identifying the correct page owners, along with setting appropriate review periods, posed significant challenges. 

Because business units sometimes want an entire website section marked as reviewed with a new review period, the Drupal-side dashboard allowed bulk changes to both owners and review periods by uploading a revised version of the .csv file, saving substantial time.

Understanding the correct licensing requirements for PowerBI was another challenge. After consulting with our internal IT team, a group workspace was set up under an enterprise agreement, and an individual licence was obtained for the team member managing the dashboards.

Testing the PowerBI dashboard

To ensure effectiveness, the solution was initially tested in a development environment that mirrored the production site. This approach allowed us to test the limitations and user experience prior to going live. During this phase, we tested the bulk upload of the .csv file to update page metadata.

A soft launch of the content audit dashboard provided valuable insights, such as the realisation that a three-month review period was too short, given the number of pages each business unit manages. 

As a result of this testing period, we made minor adjustments, such as requiring a defined review period for each page and allowing users to opt out of having an update counted as a ‘review’ for reporting purposes. Correcting a typo, for example, doesn’t constitute a page review in the context of the content audit dashboard.

Transforming content auditing

This solution significantly enhanced reporting capabilities across NSW Resources and reduced the need for manual intervention. Page updates are now reflected in the report dashboard automatically.

The PowerBI dashboards offer real-time updates and clear visibility of page ownership and review status, making it easier for business units to manage their content.

Business units can independently track the currency of their pages without needing data from the digital team, streamlining the process and increasing efficiency.

Future plans for the dashboard

The solution will continue to evolve. We plan to use the work done on the PowerBI integration to inform future website improvements with the potential for further Google Analytics data integration.

A document audit is currently a separate and opt-in process for business units, but future plans may involve greater integration.

Conclusion

Overall, this innovative solution addresses a critical need for NSW Resources by providing a robust, automated and user-friendly content audit process that adapts to the dynamic nature of our Drupal website.

Categories: FLOSS Project Planets

PyCoder’s Weekly: Issue #654 (Nov. 5, 2024)

Planet Python - Tue, 2024-11-05 14:30

#654 – NOVEMBER 5, 2024

PySheets: Spreadsheets in the Browser Using PyScript

What goes into building a spreadsheet application in Python that runs in the browser? How do you make it launch quickly, and where do you store the cells of data? This week on the show, we speak with Chris Laffra about his project, PySheets, and his book “Communication for Engineers.”
REAL PYTHON podcast

Adding Keyboard Shortcuts to the Python REPL

Python 3.13 includes a new REPL that lets you define keyboard shortcuts. This article shows you how to create one and warns you about potential pitfalls.
TREY HUNNER

Tired of Being Paged? Worry Less With Temporal

Say goodbye to managing failures, network outages, flaky endpoints, and long-running processes. Temporal ensures your code never fails. Period. PLUS, you can get started today on Temporal Cloud with $1,000 free credits on us →
TEMPORAL TECHNOLOGIES sponsor

Running a Million Empty Tests

To better understand just where the performance cost of running tests comes from, Anders ran a million empty tests. This post talks about what he did and the final results.
ANDERS HOVMOLLER

Pillow Release 11.0.0

GITHUB.COM/PYTHON-PILLOW

PEP 750: Template Strings (Major Updates)

PSF

PEP 756: Add PyUnicode_Export() and PyUnicode_Import() C Functions (Withdrawn)

PSF

Python 3.8 Reaches End of Life

PYTHON.ORG

Quiz: Single and Double Underscores in Python Names

REAL PYTHON

Quiz: Getting Started With Async Features in Python

REAL PYTHON

Discussions

Thinking of Rewriting Our Go / Java API in Python

REDDIT

Best GUI for Local Client App?

REDDIT

Articles & Tutorials

Move to Sigstore Complicates Linux Distros

Currently, CPython signs its artifacts with both PGP and Sigstore. Removing the PGP signature has been proposed, but that has implications: Sigstore is still new enough that many Linux distributions don’t support it yet.
JOE BROCKMEIER

Python’s Magic Methods in Classes

In this video course, you’ll learn what magic methods are in Python, how they work, and how to use them in your custom classes to support powerful features in your object-oriented code.
REAL PYTHON course

[Webinar] How to Build Secure, Ethical, and Scalable AI Operations

As GenAI and LLMs rapidly evolve, the impact of data leaks and unsafe AI outputs makes it critical to secure your AI infrastructure. Learn how MLOps and ML Platform teams can use the newly launched Guardrails Pro to secure AI operations — enabling faster, safer adoption of LLMs at scale →
GUARDRAILS sponsor

Make It Ephemeral: Software Should Decay and Lose Data

In the real world, things decay over time. In the digital world, things get kept forever, and sometimes that shouldn’t be so. Designing for deletion is hard.
ARMIN RONACHER

Python 3.13, t-Strings, Dep Groups…

Bite Code! does its monthly Python news wrap-up. Check out stories on Python 3.13, proposed template strings, dependency groups in pyproject.toml, and more.
BITE CODE!

Identifying Products From Images

This project uses computer vision to automate product inventory in retail, relying on YOLOv8 and image embeddings for precise detection.
ALBERT FERRÉ • Shared by Albert Ferré

Write More Pythonic Code With Context Managers

Context managers let you write reusable “template” code for setup and cleanup, which makes the code that uses them easier to read and understand.
JUHA-MATTI SANTALA
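To make the idea concrete, here is a small example of our own, not taken from the article, built with the standard library’s `contextlib.contextmanager`: code before `yield` is the setup, and the `finally` clause is the cleanup, which runs even when the body raises. The `timed` helper and its `log` list are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, log: list[str]):
    """Record how long the with-block took, even if it raises."""
    start = time.perf_counter()  # setup: runs on entering the with-block
    try:
        yield  # the body of the with-block runs here
    finally:
        # cleanup: runs on normal exit and on exceptions alike
        elapsed = time.perf_counter() - start
        log.append(f"{label}: {elapsed:.3f}s")


log: list[str] = []
with timed("sum", log):
    total = sum(range(1_000_000))

try:
    with timed("failing step", log):
        raise ValueError("boom")
except ValueError:
    pass  # the exception still propagates, but the timing was logged

print(log)
```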

Django Girls 10th Birthday!

This post celebrating ten years of Django Girls talks about how it got started, what they’re hoping to do, and how you can get involved.
DJANGO GIRLS

pytest Selection Arguments for Failing Tests

This quick TIL post covers five useful pytest options that let you control which tests run when some of them are failing.
RODRIGO GIRÃO SERRÃO

Asyncio gather() Return Values

This post shows you how to return values from coroutines that have been concurrently executed using asyncio.gather().
JASON BROWNLEE
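The key behaviour is that `asyncio.gather()` returns a list of results in the order the awaitables were passed in, regardless of which one finishes first. A minimal example of our own (the `square_after` coroutine is hypothetical, and `asyncio.sleep` stands in for real I/O):

```python
import asyncio


async def square_after(n: int, delay: float) -> int:
    """Hypothetical coroutine: wait, then return n squared."""
    await asyncio.sleep(delay)  # stand-in for real I/O
    return n * n


async def main() -> list[int]:
    # All three run concurrently; 2 finishes first and 1 finishes last,
    # but gather() still returns results in argument order.
    return await asyncio.gather(
        square_after(1, 0.03),
        square_after(2, 0.01),
        square_after(3, 0.02),
    )


results = asyncio.run(main())
print(results)  # [1, 4, 9]
```

If you pass `return_exceptions=True` to `gather()`, exceptions raised by the coroutines are placed in the result list instead of being propagated to the caller.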

PyBay 2024

This list contains the recorded talks from the PyBay 2024 conference.
YOUTUBE video

Projects & Code

wimsey: Data Contract Library

GITHUB.COM/BENRUTTER

libcom: Image Composition Toolbox

GITHUB.COM/BCMI

simplemind: Experimental Client for AI Providers

GITHUB.COM/KENNETHREITZ

PyChrono: Multi-Physics Simulation in Python

CRISTIANOPIZZAMIGLIO.COM • Shared by Cristiano Pizzamiglio

jamesql: In-Memory NoSQL Database in Python

GITHUB.COM/CAPJAMESG

Events

Weekly Real Python Office Hours Q&A (Virtual)

November 6, 2024
REALPYTHON.COM

Canberra Python Meetup

November 7, 2024
MEETUP.COM

Sydney Python User Group (SyPy)

November 7, 2024
SYPY.ORG

DFW Pythoneers 2nd Saturday Teaching Meeting

November 9, 2024
MEETUP.COM

PiterPy Meetup

November 12, 2024
PITERPY.COM

PyCon Sweden 2024

November 14 to November 16, 2024
PYCON.SE

PyCon Hong Kong 2024

November 16 to November 17, 2024
PYCON.HK

PyCon Mini Tokai 2024

November 16 to November 17, 2024
PYCON.JP

PyCon Ireland 2024

November 16 to November 18, 2024
PYTHON.IE

Happy Pythoning!
This was PyCoder’s Weekly Issue #654.

Categories: FLOSS Project Planets

Droptica: Product Search Engine in Drupal with Apache Solr Integration - How-to Guide

Planet Drupal - Tue, 2024-11-05 12:56

Product search is a key function in e-commerce today. This article shows how to create an advanced product search engine based on Drupal and its integration with Apache Solr. By combining Drupal, the Droopler installation profile, and Solr, you can build a powerful tool that helps customers navigate and search large data sets faster. I encourage you to read the blog post or watch the video in the “Nowoczesny Drupal” series (the video is in Polish).

Categories: FLOSS Project Planets

FSF Events: Free Software Directory meeting on IRC: Friday, November 8, starting at 12:00 EST (17:00 UTC)

GNU Planet! - Tue, 2024-11-05 09:35
Join the FSF and friends on Friday, November 8 from 12:00 to 15:00 EST (17:00 to 20:00 UTC) to help improve the Free Software Directory.
Categories: FLOSS Project Planets

Real Python: Introduction to Web Scraping With Python

Planet Python - Tue, 2024-11-05 09:00

Web scraping is the process of collecting and parsing raw data from the Web, and the Python community has come up with some pretty powerful web scraping tools.

The Internet hosts perhaps the greatest source of information on the planet. Many disciplines, such as data science, business intelligence, and investigative reporting, can benefit enormously from collecting and analyzing data from websites.

In this video course, you’ll learn how to:

  • Parse website data using string methods and regular expressions
  • Parse website data using an HTML parser
  • Interact with forms and other website components
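As a rough taste of the first two techniques, here is a standard-library-only sketch run against an inline HTML string (a hypothetical page, so it works offline). It is not taken from the course, which may use third-party libraries instead, and it does not cover form interaction.

```python
import re
from html.parser import HTMLParser

# A hypothetical page, inlined so the example runs without network access.
HTML = """<html><head><title>Sample Product Page</title></head>
<body>
  <h1>Widgets</h1>
  <a href="/widgets/red">Red widget</a>
  <a href="/widgets/blue">Blue widget</a>
</body></html>"""

# 1. String methods: slice out the text between the <title> tags.
start = HTML.find("<title>") + len("<title>")
end = HTML.find("</title>")
title = HTML[start:end]

# 2. Regular expression: grab every double-quoted href value.
hrefs_regex = re.findall(r'href="([^"]*)"', HTML)


# 3. HTML parser: collect hrefs from actual <a> start tags.
class LinkCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(HTML)

print(title)            # Sample Product Page
print(hrefs_regex)      # ['/widgets/red', '/widgets/blue']
print(collector.links)  # ['/widgets/red', '/widgets/blue']
```

The string-slicing and regex approaches break easily when markup changes (extra whitespace in tags, single-quoted attributes), which is why a real HTML parser is usually the more robust choice.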

Categories: FLOSS Project Planets

Specbee: How to fix SEO rankings after your Drupal website migration

Planet Drupal - Tue, 2024-11-05 07:59
Worried about retaining your SEO ranking during your Drupal migration? Lost rankings after the migration?! Learn how to maintain and fix Drupal SEO during migration with our guide.
Categories: FLOSS Project Planets
