Feeds
Real Python: How to Reset a pandas DataFrame Index
In this tutorial, you’ll learn how to reset a pandas DataFrame index, the reasons why you might want to do this, and the problems that could occur if you don’t.
Before you start your learning journey, you should familiarize yourself with how to create a pandas DataFrame. Knowing the difference between a DataFrame and a pandas Series will also prove useful to you.
In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment you wish.
As a starting point, you’ll need some data. To begin with, you’ll use the band_members.csv file included in the downloadable materials that you can access by clicking the link below:
Get Your Code: Click here to download the free sample code you’ll use to learn how to reset a pandas DataFrame index.
The table below describes the data from band_members.csv that you’ll begin with:
Column Name    PyArrow Data Type  Description
first_name     string             First name of member
last_name      string             Last name of member
instrument     string             Main instrument played
date_of_birth  string             Member’s date of birth

As you’ll see, the data has details of the members of the rock band The Beach Boys. Each row contains information about its various members both past and present.
Note: In case you’ve never heard of The Beach Boys, they’re an American rock band formed in the early 1960s.
Throughout this tutorial, you’ll be using the pandas library to allow you to work with DataFrames, as well as the newer PyArrow library. The PyArrow library provides pandas with its own optimized data types, which are faster and less memory-intensive than the traditional NumPy types that pandas uses by default.
If you’re working at the command line, you can install both pandas and pyarrow using the single command python -m pip install pandas pyarrow. If you’re working in a Jupyter Notebook, you should use !python -m pip install pandas pyarrow. Regardless, you should do this within a virtual environment to avoid clashes with the libraries you use in your global environment.
Once you have the libraries in place, it’s time to read your data into a DataFrame:
    >>> import pandas as pd
    >>> beach_boys = pd.read_csv(
    ...     "band_members.csv"
    ... ).convert_dtypes(dtype_backend="pyarrow")

First, you used import pandas to make the library available within your code. To construct the DataFrame and read it into the beach_boys variable, you used pandas’ read_csv() function, passing band_members.csv as the file to read. Finally, by passing dtype_backend="pyarrow" to .convert_dtypes() you convert all columns to pyarrow types.
If you want to verify that pyarrow data types are indeed being used, then beach_boys.dtypes will satisfy your curiosity:
    >>> beach_boys.dtypes
    first_name       string[pyarrow]
    last_name        string[pyarrow]
    instrument       string[pyarrow]
    date_of_birth    string[pyarrow]
    dtype: object

As you can see, each data type contains [pyarrow] in its name.
If you wanted to analyze the date information thoroughly, then you would parse the date_of_birth column to make sure dates are read as a suitable pyarrow date type. This would allow you to analyze by specific days, months or years, and so on, as commonly found in pivot tables.
The date_of_birth column is not analyzed in this tutorial, so the string data type it’s being read as will do. Later on, you’ll get the chance to hone your skills with some exercises. The solutions include the date parsing code if you want to see how it’s done.
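The tutorial's own date parsing code lives in its exercise solutions and is not reproduced here. As a rough sketch of one way to do it, assuming the dates are written like 20-Jun-1942, pd.to_datetime() with an explicit format string turns the strings into timestamps. The two sample rows are typed in by hand, and the result is a NumPy-backed datetime column rather than a pyarrow date type, to keep the example small:

```python
import pandas as pd

# Two sample rows typed in by hand, so the sketch runs without band_members.csv.
members = pd.DataFrame(
    {
        "first_name": ["Brian", "Mike"],
        "date_of_birth": ["20-Jun-1942", "15-Mar-1941"],
    }
)

# %d = day of month, %b = abbreviated month name, %Y = four-digit year.
members["date_of_birth"] = pd.to_datetime(
    members["date_of_birth"], format="%d-%b-%Y"
)

# With real timestamps, the .dt accessor exposes the date components.
birth_years = members["date_of_birth"].dt.year.tolist()
print(birth_years)
```

Once the column holds timestamps, grouping by year or month becomes a matter of using the .dt accessor instead of slicing strings.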
Now that the file has been loaded into a DataFrame, you’ll probably want to take a look at it:
    >>> beach_boys
      first_name last_name instrument date_of_birth
    0      Brian    Wilson       Bass   20-Jun-1942
    1       Mike      Love  Saxophone   15-Mar-1941
    2         Al   Jardine     Guitar   03-Sep-1942
    3      Bruce  Johnston       Bass   27-Jun-1942
    4       Carl    Wilson     Guitar   21-Dec-1946
    5     Dennis    Wilson      Drums   04-Dec-1944
    6      David     Marks     Guitar   22-Aug-1948
    7      Ricky    Fataar      Drums   05-Sep-1952
    8    Blondie   Chaplin     Guitar   07-Jul-1951

DataFrames are two-dimensional data structures similar to spreadsheets or database tables. A pandas DataFrame can be considered a set of columns, with each column being a pandas Series. Each column also has a heading, which is the name property of the Series, and each row has a label, which is referred to as an element of its associated index object.
The DataFrame’s index is shown to the left of the DataFrame. It’s not part of the original band_members.csv source file, but is added as part of the DataFrame creation process. It’s this index object you’re learning to reset.
The index of a DataFrame is an additional column of labels that helps you identify rows. When used in combination with column headings, it allows you to access specific data within your DataFrame. The default index labels are a sequence of integers, but you can use strings to make them more meaningful. You can actually use any hashable type for your index, but integers, strings, and timestamps are the most common.
Note: Although indexes are certainly useful in pandas, an alternative to pandas is the new high-performance Polars library, which eliminates them in favor of row numbers. This may come as a surprise, but aside from being used for selecting rows or columns, indexes aren’t often used when analyzing DataFrames. Also, row numbers always remain sequential when rows are added or removed in a Polars DataFrame. This isn’t the case with indexes in pandas.
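To make the last point of the note concrete, here is a small sketch with a hand-made four-row DataFrame (not the tutorial's file). Dropping a row leaves a gap in the pandas index labels, and reset_index(drop=True) renumbers them:

```python
import pandas as pd

df = pd.DataFrame({"first_name": ["Brian", "Mike", "Al", "Bruce"]})

# Dropping the row labelled 1 leaves a gap in the index labels.
trimmed = df.drop(index=1)
labels_before = trimmed.index.tolist()

# reset_index(drop=True) discards the old labels and renumbers from 0.
renumbered = trimmed.reset_index(drop=True)
labels_after = renumbered.index.tolist()

print(labels_before, labels_after)
```

Without drop=True, the old labels would be kept as a new column named index instead of being discarded.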
Read the full article at https://realpython.com/pandas-reset-index/
Julien Tayon: The crudest CRUD of them all : the smallest CRUD possible in 150 lines of python
To begin with, I am not really motivated to start with a full-fledged MVC (Model View Controller) framework à la Django, because there is a lot of boilerplate and many steps to go through before you get a result. Still, it has a lot of features I want, including authentication, authorization and security handling.
For prototypes we normally favour lightweight frameworks (à la Flask), and CRUD.
The CRUD approach factors a whole framework into a single dynamic form that adapts itself to the model: from the Python class declaration, it generates the HTML forms to input data, the tabulated views, the REST endpoints and the search, as well as the database model. One language to rule them all: PYTHON. With enough talent, you can even generate from Python the JavaScript that handles autocompletion on the generated view.
But before using a CRUD framework, we need a cruder one: ugly, disgusting, but useful to a human before building the REST APIs and writing the Python classes, the HTML forms and the controllers.
I call this the crudest CRUD of them all.
Think hard about what you want when prototyping:
- to write no CONTROLLERS; the Flask documentation has a very verbose approach to exposing and writing routes, and writing controllers for inserting into and searching databases is boring;
- to write as few HTML views as possible; one and only one would be great;
- to avoid fiddling with the many files that reflect separation of concerns: the fewer Python files and classes you touch, the better;
- to avoid writing SQL or using an ORM (at least a verbose declarative one);
- show me your code and you can mesmerize and even fool me, but show me your data structure and I'll know everything I need to know about your application: the data structure should be right under your nose, in a readable fashion, in the code;
- to have AT LEAST one endpoint for inserting and searching, so that curl can be used to begin automation and testing, preferably in a factorisable fashion;
- only one point of failure is accepted.
Once we set these few conditions, we see that whatever we do, WE NEED a dynamic HTTP server at the core. Python being the topic here, we are going to do it in Python.
What is the simplest dynamic web server in Python?
The reference implementation of WSGI, which is also the crudest WSGI server of them all: wsgiref. And you don't need to download it, since it's provided in the Python stdlib.
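For readers who have never used it, here is a minimal sketch of a wsgiref application, separate from the full listing of this post. The names hello_app and call_app and the port 8000 are made up for the illustration. The helper calls the app in-process with a fake request, so the sketch can be tried without opening a socket:

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Smallest useful WSGI app: the same HTML page for every path."""
    body = b"<h1>Hello from wsgiref</h1>"
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def call_app(app, path="/"):
    """Call the app in-process with a fake GET request (no socket needed)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app(hello_app))
    # To serve for real, uncomment (port 8000 is an arbitrary choice):
    # make_server("", 8000, hello_app).serve_forever()
```

A WSGI application is just a callable taking environ and start_response, which is why a single function can later act as the whole controller.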
First things first, we are going to add a default view so that we can serve a static HTML page containing the minimal HTML we need to interact with the data: sets of inputs and forms.
Here, we stop. And we see that these forms are describing the data model.
Wouldn't it be nice if we could parse the HTML forms easily with a tool from the standard library, html.parser, and deduce the database model from them? More than fields, we could even add relationships. And well, since we are dreaming: what about creating the tables on the fly from the forms if they don't exist?
Encoding the relationships does require hijacking a naming convention: when the parser comes across a field named whatever_id in a form, it deduces that it is a foreign key to table « whatever », column « id ».
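Before reading the full listing, the naming convention can be tried in isolation. The following stdlib-only sketch is not the code of this post: the class name FormModelParser and the shape of its models dictionary are invented for the illustration. It only collects field names and deduced foreign keys, without touching a database:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class FormModelParser(HTMLParser):
    """Collect, per <form>, the input names and any deduced foreign keys."""

    def __init__(self):
        super().__init__()
        self.models = {}     # table name -> {"columns": [...], "foreign_keys": {...}}
        self.current = None  # table currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # The form's action path doubles as the table name: /user -> user
            self.current = urlparse(attrs.get("action", "")).path.lstrip("/")
            self.models[self.current] = {"columns": [], "foreign_keys": {}}
        elif tag == "input" and self.current is not None and "name" in attrs:
            name = attrs["name"]
            model = self.models[self.current]
            model["columns"].append(name)
            # Convention: whatever_id points at table "whatever", column "id"
            if name.endswith("_id"):
                model["foreign_keys"][name] = (name[: -len("_id")], "id")

    def handle_endtag(self, tag):
        if tag == "form":
            self.current = None


FORMS = """
<form action=/user >
    <input type=number name=id />
    <input type=text name=name />
</form>
<form action=/event >
    <input type=number name=id />
    <input type=number name=user_id />
</form>
"""

parser = FormModelParser()
parser.feed(FORMS)
models = parser.models
print(models["event"])
```

Here the event form ends up with one foreign key, user_id pointing at user.id, while the plain id fields are left alone because they do not end in _id.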
Once this is done, we can parse the HTML, do some magic to match HTML input types to database types (an adapter), and it's almost over. We can even dream of creating the database if it does not exist, in a one-liner for SQLite.
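As a small stdlib-only illustration of the adapter idea (the listing further down uses SQLAlchemy and PostgreSQL instead), the sketch below maps HTML input types to SQLite column types and creates the tables if they are missing. The SQL_TYPES mapping and the create_table_sql helper are assumptions made for this example:

```python
import sqlite3

# Assumed adapter: HTML input type -> SQLite column type (unknown types become TEXT).
SQL_TYPES = {"number": "INTEGER", "date": "TEXT", "text": "TEXT", "email": "TEXT"}


def create_table_sql(table, fields):
    """Build a CREATE TABLE statement from (name, html_input_type) pairs."""
    columns = []
    for name, html_type in fields:
        # Identifiers cannot be bound as parameters, so only accept plain names.
        if not (table.isidentifier() and name.isidentifier()):
            raise ValueError(f"unsafe identifier: {table}.{name}")
        if name == "id":
            columns.append('"id" INTEGER PRIMARY KEY')
        elif name.endswith("_id"):
            target = name[: -len("_id")]
            columns.append(f'"{name}" INTEGER REFERENCES "{target}"(id)')
        else:
            columns.append(f'"{name}" {SQL_TYPES.get(html_type, "TEXT")}')
    return f'CREATE TABLE IF NOT EXISTS "{table}" ({", ".join(columns)})'


# Connecting is the one-liner: SQLite creates the database on first use.
conn = sqlite3.connect(":memory:")  # a file path here would create the file
conn.execute(create_table_sql("user", [("id", "number"), ("name", "text")]))
conn.execute(
    create_table_sql("event", [("id", "number"), ("date", "date"), ("user_id", "number")])
)

created = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(created)
```

Running the CREATE TABLE IF NOT EXISTS statements on every start is harmless, which is what makes the "create on the fly" approach workable for a prototype.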
We just need to throw all our frugality with dependencies out of the window and spoil our karma of « digital sobriety » by adding the almighty SQLAlchemy, the crudest (but still heavy) ORM when it comes to the introspective features needed to map a database object to a Python object in a clear, consistent way. With this, just one function is needed in the controller to switch between inserting (POST method) and searching (GET).
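The "one function for both verbs" idea can be sketched without SQLAlchemy. The version below swaps in the stdlib sqlite3 module, so it is a simplified stand-in and not the automap-based code of the listing. The function name crud and the table layout are made up for the example:

```python
import sqlite3


def crud(conn, method, table, fields):
    """POST inserts a row, GET searches; empty form values are ignored."""
    filled = {k: v for k, v in fields.items() if v != ""}
    for identifier in (table, *filled):
        # Identifiers cannot be bound as parameters, so only accept plain names.
        if not identifier.isidentifier():
            raise ValueError(f"unsafe identifier: {identifier!r}")
    values = list(filled.values())
    if method == "POST":
        # Note: a POST with no filled field at all is not handled in this sketch.
        columns = ", ".join(f'"{c}"' for c in filled)
        marks = ", ".join("?" for _ in filled)
        cursor = conn.execute(
            f'INSERT INTO "{table}" ({columns}) VALUES ({marks})', values
        )
        conn.commit()
        return {"insert_result": cursor.lastrowid}
    # GET: every filled field becomes an equality filter.
    where = " AND ".join(f'"{c}" = ?' for c in filled) or "1 = 1"
    rows = conn.execute(f'SELECT * FROM "{table}" WHERE {where}', values).fetchall()
    return {"search_result": rows}


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT, email TEXT)')
inserted = crud(conn, "POST", "user", {"id": "", "name": "jul", "email": "j@example.com"})
found = crud(conn, "GET", "user", {"name": "jul"})
print(inserted, found)
```

The same dictionary of form fields drives both branches, which is the property that lets one HTML form carry an insert button and a search button.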
Well, that works if the DOM is passed in the request. Of course, I can see the criticisms coming:
- we can't pass the DOM in the request because the HTML form ignores the DOM;
- aren't you scared of error 414 (URI too long) in the GET method if you pass the DOM?
Since we are human, we would also like the form to be readable when served because, well, humans don't read the source and can't see the name attributes of the inputs. A bit of improvement to the raw HTML would be nice. It would also give consistency, and it would reduce the size of the form we have to send. Here, JavaScript is again the right answer. Fine, we serve the static page at the top of the controller. Let's use jQuery to keep it terse enough. Oh, and since we have JavaScript, wouldn't it be able to clone the relevant part of the invented model tag inside every form, so that we can pass the relevant part of the DOM to the controller?
I think we have everything to write the crudest CRUD server of them all :D
Happy code reading:

    import multipart
    from wsgiref.simple_server import make_server
    from json import dumps
    from sqlalchemy import create_engine, MetaData, Table, Column
    from sqlalchemy import Integer, String, Float, Date, DateTime, Time, UnicodeText, ForeignKey
    from html.parser import HTMLParser
    from sqlalchemy.ext.automap import automap_base
    from sqlalchemy.orm import Session
    from sqlalchemy import select
    from sqlalchemy_utils import database_exists, create_database
    from urllib.parse import parse_qsl, urlparse

    engine = create_engine("postgresql://jul@192.168.1.32/pdca")
    if not database_exists(engine.url):
        create_database(engine.url)

    # table name -> sqlalchemy Table, filled in while the forms are parsed
    tables = dict()


    class HTMLtoData(HTMLParser):
        """Deduce the database model from the <form> and <input> tags."""

        def __init__(self):
            global engine, tables
            self.cols = []
            self.table = ""
            self.tables = []
            self.engine = engine
            self.meta = MetaData()
            super().__init__()

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "input":
                # "id" is always the primary key
                if attrs.get("name") == "id":
                    self.cols += [Column('id', Integer, primary_key=True), ]
                    return
                # "whatever_id" is a foreign key to table "whatever", column "id"
                try:
                    if attrs.get("name").endswith("_id"):
                        table, _ = attrs.get("name").split("_")
                        self.cols += [Column(attrs["name"], Integer, ForeignKey(table + ".id"))]
                        return
                except Exception as e:
                    print(e)
                # adapter: HTML input type -> database type
                if attrs["type"] in ("email", "url", "phone", "text"):
                    self.cols += [Column(attrs["name"], UnicodeText), ]
                if attrs["type"] == "number":
                    # .get() so that a number input without a step is an Integer
                    if attrs.get("step") == "any":
                        self.cols += [Column(attrs["name"], Float), ]
                    else:
                        self.cols += [Column(attrs["name"], Integer), ]
                if attrs["type"] == "date":
                    self.cols += [Column(attrs["name"], Date)]
                if attrs["type"] == "datetime":
                    self.cols += [Column(attrs["name"], DateTime)]
                if attrs["type"] == "time":
                    self.cols += [Column(attrs["name"], Time)]
            if tag == "form":
                # the action of the form is the name of the table
                self.table = urlparse(attrs["action"]).path[1:]

        def handle_endtag(self, tag):
            if tag == "form":
                self.tables += [Table(self.table, self.meta, *self.cols), ]
                tables[self.table] = self.tables[-1]
                self.table = ""
                self.cols = []
                # create the table on the fly if it does not exist
                with engine.connect() as cnx:
                    self.meta.create_all(engine)
                    cnx.commit()


    html = """
    <!doctype html>
    <html>
    <head>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.7.1/jquery.min.js"></script>
    <script>
    $(document).ready(function() {
        $("form").each((i,el) => {
            $(el).wrap("<fieldset>"+ el.action + "</fieldset>" );
            $(el).append("<input type=submit value=insert formmethod=post ><input type=submit value=search formmethod=get />");
        });
        $("input:not([type=hidden],[type=submit])").each((i,el) => {
            $(el).before("<label>" + el.name+ "</label><br/>");
            $(el).after("<br>");
        });
    });
    </script>
    </head>
    <body>
    <form action=/user >
        <input type=number name=id />
        <input type=text name=name />
        <input type=email name=email >
    </form>
    <form action=/event >
        <input type=number name=id />
        <input type=date name=date />
        <input type=text name=text />
        <input type=number name=user_id />
    </form>
    </body>
    </html>
    """

    # the only hand-written route: the default view serving the forms
    router = dict({"": lambda fo: html, })


    def simple_app(environ, start_response):
        fo, fi = multipart.parse_form_data(environ)
        # fold uploaded files into the form data as plain dicts
        fo.update(**{
            k: dict(
                name=v.filename,
                content=v.file.read().decode('utf-8', 'backslashreplace'),
                content_type=v.content_type,
            ) for k, v in fi.items()})
        table = route = environ["PATH_INFO"][1:]
        fo.update(**dict(parse_qsl(environ["QUERY_STRING"])))
        start_response('200 OK', [('Content-type', 'text/html; charset=utf-8')])
        try:
            HTMLtoData().feed(html)
        except KeyError:
            pass
        # reflect the database so that automap can build the mapped classes
        metadata = MetaData()
        metadata.reflect(bind=engine)
        Base = automap_base(metadata=metadata)
        Base.prepare()
        if route in tables.keys():
            with Session(engine) as session:
                Item = getattr(Base.classes, table)
                if environ.get("REQUEST_METHOD", "GET") == "POST":
                    new_item = Item(**{k: v for k, v in fo.items() if v and not k.startswith("_")})
                    session.add(new_item)
                    session.commit()
                    fo["insert_result"] = new_item.id
                if environ.get("REQUEST_METHOD") == "GET":
                    result = []
                    for elt in session.execute(
                        select(Item).filter_by(**{k: v for k, v in fo.items() if v and not k.startswith("_")})
                    ).all():
                        result += [{k.name: getattr(elt[0], k.name) for k in tables[table].columns}]
                    fo["search_result"] = result
        return [router.get(route, lambda fo: dumps(fo.dict, indent=4, default=str))(fo).encode()]


    print("Crudest CRUD of them all on port 5000...")
    make_server('', 5000, simple_app).serve_forever()
1xINTERNET blog: Why choosing a reliable migration partner is crucial for a successful transition from Drupal 7
The end of life of Drupal 7 is just around the corner and selecting the right migration partner is crucial for a smooth, cost-effective, and future-proof transition. Find out how 1xINTERNET and Pantheon’s unique solution can support your organisation!
Daniel Lange: Weird times ... or how the New York DEC decided the US presidential elections
November 2024 will be known as the time when the killing of peanut, a pet squirrel, by the New York State DEC swung the US presidential elections and shaped history forever.
The hundreds of millions of dollars spent on each side, the tireless campaigning by the candidates, the celebrity endorsements ... all made for an open race for months. Investments evened each other out.
But an OnlyFans producer showing people an overreaching, bureaucracy driven State raiding his home to confiscate a pet squirrel and kill it ... swung enough voters to decide the elections.
That is what we need to understand in times of instant worldwide publication and a mostly attention driven economy: Human fates, elections, economic cycles and wars can be decided by people killing squirrels.
RIP, peanut.
P.S.: Trump Media & Technology Group Corp. (DJT) stock is up 30% pre-market.
Little Wayland Things
While I do have a Qt git build on my machine that I use for development, I usually only test individual applications and functionality but hardly ever run my full Plasma session on it. This means that for day-to-day use I typically only get to enjoy new Qt features once they have actually been released.
[Image: Proper modal dialogs under Wayland (note the darkened editor window) thanks to XDG Dialog and the new Qt 6.8]
One feature I talked about in the very last issue of “On the road to Plasma 6” is a nice API for XDG Foreign. To recap: it’s a Wayland protocol that lets an application export a window to another one so that the other application can attach a window to it. For example, the XDG Desktop Portal wants to attach the “Open File” dialog as if it were coming from the application that requested it.
Of course we don’t want to write low-level Wayland code and instead have an easy to use API for it. The KWindowSystem::setMainWindow function does just that: hand in a window and the token you received from the other application (created through KWaylandExtras::exportWindow) and it takes care of everything else. Presumably, you want to set the parent window before showing your dialog to make absolutely sure it’s set up properly.
However, Qt did not have an API to tell us when the underlying XDG Toplevel (think: a regular desktop-y window with a title bar and what not) had been created. We were only told when the basic wl_surface was created, which was too early, or the window was exposed/shown, at which point it was already flashing up in the user’s task bar. Hence, I added a new QWaylandWindow::surfaceRoleCreated (and corresponding surfaceRoleDestroyed) signal. Utilizing that, the aforementioned KWindowSystem API now works perfectly.
Another major addition to Qt Wayland that I have been looking forward to very much is support for the XDG Dialog protocol. While a window could have always had a parent (e.g. a popup menu or settings dialog parented to the application’s main window), there was no concept of a “modal” dialog. Therefore, we did not support the “dim parent” effect under Wayland that darkens a window to indicate it cannot be interacted with. More importantly, KWin couldn’t take it into account for its focus handling either. It happily let you focus a blocked window but the application would then just ignore your input.
[Image: There’s only one Dolphin running here!]
This was most noticeable for me when Alt+Tab’ing back and forth, for example using the “Open File” dialog in one application and then trying to switch to the other to verify where the file was actually located. Instead of cycling between the file dialog and the other application, it would alternate between the file dialog and the blocked main window.
Sadly, even when I upgraded to Qt 6.8 the situation didn’t improve. I noticed that Alt+Tab actually showed the dialog twice. This looked like a bug and sure enough comparing it to the Plasma 5.27 LTS session on my other computer proved that it used to work at some point. At first I didn’t spot anything obvious until I noticed a small typo that must have slipped in during some major refactoring. Instead of not including the main window when it had a modal child, it included the modal child once again! Sure enough, adding an exclamation mark (the logical NOT operator in C++) did the trick.
If you want to support more good people such as myself, consider donating to the KDE End of Year Fundraiser!
drunomics: Low-code + Decoupled Drupal: The Power of Custom Elements 3.0
Matt Layman: Deploy Your Own Web App With Kamal 2
HDR and color management in KWin, part 5: HDR on SDR laptops
This one required a few other features to be implemented first, so let’s jump right in.
Matching reference luminances
A big part of what a desktop compositor needs to get right with HDR content is to show SDR and HDR content properly side by side. KWin 6.0 added an SDR brightness slider for that purpose, but that’s only half the equation - what about the brightness of HDR content?
When we say “HDR”, usually that refers to a colorspace with the rec.2020 primaries and the perceptual quantizer (PQ) transfer function. A transfer function describes how to calculate a real brightness value from the “electrical” signal encoded in the content - PQ specifically has encoded values from 0 to 1 and brightness values from 0 to 10000 nits. For reference, your typical office monitor does around 300 or 400 nits at maximum brightness setting, and many newer phones can go a bit above 1000 nits.
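To get a feeling for how steep that curve is, here is a small sketch of the PQ transfer function and its inverse in Python, using the constants published in SMPTE ST 2084. The function names are made up for this example, and this is not KWin's shader code:

```python
# Constants from SMPTE ST 2084 (PQ), written as their exact fractions.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0


def pq_to_nits(signal):
    """PQ EOTF: encoded value in [0, 1] -> luminance in nits."""
    powered = signal ** (1 / M2)
    return PEAK_NITS * (max(powered - C1, 0.0) / (C2 - C3 * powered)) ** (1 / M1)


def nits_to_pq(nits):
    """Inverse of the above: luminance in nits -> encoded value in [0, 1]."""
    y = (nits / PEAK_NITS) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


for signal in (0.0, 0.5, 0.75, 1.0):
    print(f"PQ {signal:.2f} -> {pq_to_nits(signal):8.1f} nits")
```

The loop shows that most of the encoded range is spent on the lower brightness levels, which is the "perceptual" part of the name.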
Now if we want to show HDR content on an HDR screen, the most straight forward thing to do would be to just calculate the brightness values, write them to the screen and be done with it, right? That’s what KWin did up to Plasma 6.1, but it’s far from ideal. Even if your display can show the full range of requested brightness values, you might want to adjust the brightness to match your environment - be it brighter or darker than the room the content was optimized for - and when there’s SDR things in HDR content, like subtitles in a video, that should ideally match other SDR content on the screen as well.
Luckily, there is a preexisting relationship between HDR and SDR that we can use: The reference luminance. It defines how bright SDR white is - which is why another name for it is simply “SDR white”.
As we want to keep the brightness slider working, we won’t map SDR content to the reference luminance of any HDR transfer function though, but instead we map both SDR and HDR content to the SDR brightness setting. If we have an HDR video that uses the PQ transfer function, that reference luminance is 203 nits. If your SDR brightness setting is at 406 nits, KWin will just multiply the brightness of the HDR video with a factor of 2.
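The arithmetic of that example can be written down as a tiny sketch. The function name is hypothetical and the values are assumed to be linear luminance in nits. It is only meant to show the scaling, not how KWin implements it:

```python
PQ_REFERENCE_NITS = 203.0  # reference luminance of PQ content, as stated above


def map_to_sdr_brightness(content_nits, sdr_brightness_nits,
                          reference_nits=PQ_REFERENCE_NITS):
    """Scale content so its reference luminance lands on the SDR brightness setting."""
    factor = sdr_brightness_nits / reference_nits
    return content_nits * factor


# SDR brightness setting at 406 nits: everything in the PQ video is doubled.
reference_white = map_to_sdr_brightness(203.0, 406.0)
highlight = map_to_sdr_brightness(1000.0, 406.0)
print(reference_white, highlight)
```

With a setting below 203 nits, the same formula yields a factor smaller than one and the HDR video gets darker along with the SDR content.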
This doesn’t only mean that we can make SDR and HDR content fit together nicely on HDR screens, but it also means we now know what to do when we have HDR content on an SDR screen: We map the reference luminance from the video to SDR white on the screen. That’s of course not enough to make it look nice though…
Tone mapping
Especially with HDR presented on an SDR screen, but also on many HDR screens, it will happen that the content brightness exceeds the display capabilities. To handle this, starting with Plasma 6.2, whenever the HDR metadata of the content says it’s brighter than the display can go, KWin will apply tone mapping.
Doing this tone mapping in RGB can result in changing the content quite badly though. Let’s take a look by using the most simple “tone mapping” function there is, clipping. It just limits the red, green and blue values separately to the brightness that the screen can show.
If we have a pixel with the value [2.0, 0.0, 2.0] and a maximum brightness of 1.0, that gets mapped to [1.0, 0.0, 1.0] - which is the same purple, just darker. But if the pixel has the values [2.0, 0.0, 1.0], then that also gets mapped to [1.0, 0.0, 1.0], even though the source color was significantly more red!
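The same two pixels, run through a few lines of Python (clip_rgb is a name made up for this sketch):

```python
def clip_rgb(pixel, max_value=1.0):
    """Limit red, green and blue separately to what the screen can show."""
    return [min(channel, max_value) for channel in pixel]


purple = clip_rgb([2.0, 0.0, 2.0])
reddish = clip_rgb([2.0, 0.0, 1.0])
print(purple, reddish)
```

Both results are identical, so the difference in hue between the two source pixels is lost.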
To fix that, KWin’s tone mapping uses ICtCp. This is a color space developed by Dolby, in which the perceived brightness (aka Intensity) is separated from the chroma components (Ct = blue-yellow, Cp = red-green), which is perfect for tone mapping. KWin’s shaders thus transform the RGB content to ICtCp, apply a brightness mapping function to only the intensity component, and then convert back to RGB.
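The principle of "only touch the intensity" can be illustrated without the ICtCp matrices. The sketch below is a deliberately crude stand-in and not KWin's algorithm: it uses the largest channel as the intensity, and a plain clamp as the brightness mapping function. All names are made up for the example:

```python
def map_intensity_only(pixel, curve):
    """Apply `curve` to one intensity value, then scale all channels alike."""
    intensity = max(pixel)  # crude stand-in for the "I" of ICtCp
    if intensity == 0.0:
        return list(pixel)
    scale = curve(intensity) / intensity
    return [channel * scale for channel in pixel]


def clamp_to_display(intensity, display_max=1.0):
    """Simplest possible brightness mapping: limit the intensity."""
    return min(intensity, display_max)


mapped = map_intensity_only([2.0, 0.0, 1.0], clamp_to_display)
print(mapped)
```

Because every channel is scaled by the same factor, the two-to-one ratio between red and blue survives, which per-channel clipping does not manage.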
The result of that algorithm looks like this:
[Images: RGB clipping | KWin 6.2’s tone mapping | MPV’s tone mapping]
As you can see, there’s still some color changes going on in comparison to MPV’s algorithm; this is partially because the tone mapping curve still needs some more adjustments, and partially because we also still need to do similar mapping for colors that the screen can’t actually show. It’s already a large improvement though, and does better than the built-in tone mapping functionality in many HDR screens.
When tone mapping HDR content on SDR screens, we always end up reducing the brightness of the overall image, so that we have some brightness values to map the really bright highlights in the video to - otherwise everything just slightly over the reference luminance would look like an overexposed blob of color, as you can see in the “RGB clipping” image. There are ways around that though…
HDR on SDR laptop displays
To explain the reasoning behind this, it helps to first have a look at what even makes a display “HDR”. In many cases it’s just marketing nonsense, a label that’s put on displays to make them seem more fancy and desirable, but in others there’s an actual tangible benefit to it.
Let’s take OLED displays as an example, as it’s considered one of the display technologies where HDR really shines. When you drive an OLED at high brightness levels, it becomes quite inefficient, it draws a lot of power and generates a lot of heat. Both of these things can only be dealt with to a limited degree, so OLED displays can generally only be used with relatively low average brightness levels. They can go a lot brighter than the average in a small part of the screen though, and that’s why they benefit so much from HDR - you can show a scene that’s on average only 200 nits bright, with the sky in the image going up to 300 nits, the sun going up to 1000 nits and the ground only doing 150 nits.
Now let’s compare that to SDR laptop displays. In the case of most LCDs, you have a single backlight LED for the whole screen, and when you move the brightness slider, the power the backlight is driven at is changed. So there’s no way to make parts of the screen brighter than the rest on a hardware level… but that doesn’t mean there isn’t a way to do it in software!
When we want to show HDR content and the brightness slider is below 100%, KWin increases the backlight level to get a peak brightness that matches the relative peak brightness of that content (as far as that’s possible). At the same time it changes the colorspace description on the output to match that change: While the reference luminance stays the same, the maximum luminance of the transfer function gets increased in proportion to the increase in backlight brightness.
The result is that SDR white gets mapped to a reduced RGB value, which is at least supposed to exactly counteract the increase of brightness that we’re applying with the backlight, while HDR content that goes beyond the reference luminance gets to use the full brightness range.
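As a back-of-the-envelope model of that compensation, assume that the light output is proportional to the backlight level and that pixel values are linear. Both are simplifications, and the function below is a hypothetical sketch, not KWin code:

```python
def backlight_plan(slider, wanted_headroom):
    """
    slider: current brightness setting as a fraction of full backlight (0 < slider <= 1).
    wanted_headroom: content peak luminance divided by its reference luminance (>= 1).
    Returns (new backlight level, backlight boost, linear RGB value used for SDR white).
    """
    new_backlight = min(1.0, slider * wanted_headroom)  # as far as that's possible
    boost = new_backlight / slider
    sdr_white = 1.0 / boost  # counteracts the boost, so SDR content looks unchanged
    return new_backlight, boost, sdr_white


# Slider at 50%, content wants 4x the reference luminance: only 2x is available.
plan = backlight_plan(0.5, 4.0)
print(plan)
```

In this model SDR white ends up at 0.5 × 1.0 of the panel's output, the same as 1.0 × 0.5 before the change, while values between 0.5 and 1.0 become available for highlights.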
Increasing the backlight power of course doesn’t come without downsides; black levels and power usage both get increased, so this is only ever active if there’s HDR content on the screen with valid HDR metadata that signals brightness levels going beyond the reference luminance.
As always, capturing HDR content with a phone camera is quite difficult, but I think you can at least sort of see the effect:
[Images: without backlight adjustment | with backlight adjustment]
This feature has been merged into KWin’s git master branch and will be available on all laptop displays starting with Plasma 6.3. I really recommend trying it for yourself once it reaches your distribution!
TestDriven.io: Avoid Counting in Django Pagination
FSF Anniversary Logo Contest
FSF Blogs: Forty years of commitment to software freedom
Forty years of commitment to software freedom
PreviousNext: PowerBI Dashboard: Addressing content currency
Post co-authored with NSW Resources. A critical issue with the management of content currency on our Drupal website, nsw.gov.au/nswresources, required an innovative solution to provide us with an automated content audit process.
by luhur.rizal / 6 November 2024
Due to the complexity and size of our Drupal web presence, ensuring each page was up-to-date and reviewed by the appropriate business unit became increasingly challenging. We needed a tool to track how long it had been since a page was reviewed, set specific periods for future reviews and easily identify the page owners for each section of the website. Furthermore, with the required frequency of daily updates to the site, the solution had to be ‘live’ to accurately reflect these changes.
Choosing the right solution
To tackle these challenges, we collaborated with our Drupal web development partner, PreviousNext, to create a live .csv file of all relevant web pages. This file included custom metadata detailing each page’s review frequency, page owner, date of last page update, date of last content review, publishing status and the next scheduled review date. By using this .csv file as a data source, we built a user-friendly content audit report dashboard in Microsoft Power BI.
The PowerBI dashboard provides executives with a high-level overview of which sections of the website are most in need of review. A complementary dashboard for ‘content champions’ offers a more granular view of the status of each individual page, enabling targeted content management.
Power BI implementation
Implementing this solution involved several steps:
- Internal stakeholder consultation: We engaged with the various business units in NSW Resources division to identify page owners and establish appropriate review periods for each section of the website.
- Metadata assignment: Metadata bulk-uploaded to the pages included the custom metadata fields created for the project, such as review periods and page owners.
- Data manipulation in PowerBI: Data from the .csv file was manipulated within PowerBI to ensure that columns were in the correct format. We created a 'Review status' column based on the next date of review to provide clear visibility. We also filtered out any unpublished or archived content to make it more streamlined.
- PowerBI build: Using the dataset from the website's metadata and Google Analytics, we built a comprehensive dashboard in PowerBI Desktop and then uploaded it to PowerBI Service for broader access. Live links to the web pages were integrated into the dashboards for easy navigation.
- Executive overview report: We developed a high-level summary report that shows how many pages each business unit is responsible for and includes Google Analytics page views from the past 30 days.
- Content audit report: This report provides filtering by review status and sub-areas within each business unit, offering direct links to the listed web pages and conditional formatting utilising a traffic light system for review status indicators.
- Overall tracker: Designed to be used exclusively by the NSW Resources website team, this site-wide and document overview provides an overall tracker, including documents, events, and articles that are not part of other reports.
- Internal integrations: PowerBI reports were integrated into Microsoft Teams and the SharePoint intranet, facilitating use across the business.
- Internal work request form: The form was updated to distinguish whether a web update is part of a comprehensive content audit review, therefore requiring a review period reset, or just a minor adjustment that means the review status remains unchanged.
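The 'Review status' logic from the data manipulation step was built inside PowerBI and is not shown in the post. Purely as an illustration of the idea, here is how such a derived column could look in Python. The column names, the sample rows and the 30-day amber threshold are all assumptions made for this sketch:

```python
import csv
import io
from datetime import date, timedelta

DUE_SOON = timedelta(days=30)  # assumed threshold for the amber light


def review_status(next_review, today):
    """Traffic-light status derived from the next scheduled review date."""
    if next_review < today:
        return "Overdue"   # red
    if next_review - today <= DUE_SOON:
        return "Due soon"  # amber
    return "Current"       # green


# Hypothetical extract; the real file's column names are not given in the post.
SAMPLE = """page,owner,status,next_review
/a,Unit A,published,2024-10-01
/b,Unit B,published,2024-11-20
/c,Unit C,archived,2024-09-01
/d,Unit A,published,2025-06-01
"""

today = date(2024, 11, 6)
report = {
    row["page"]: review_status(date.fromisoformat(row["next_review"]), today)
    for row in csv.DictReader(io.StringIO(SAMPLE))
    if row["status"] == "published"  # drop unpublished or archived pages
}
print(report)
```

Keeping the status as a derived value, rather than storing it in the .csv file, means it never goes stale between exports.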
A content audit that works for everyone
The project required extensive consultation to define the scope and needs of each business unit. Following this, identifying the correct page owners, along with setting appropriate review periods, posed significant challenges.
As business units sometimes want entire website sections to be marked as reviewed with a change in the review period, the Drupal-side dashboard allowed for bulk changes to both owners and review periods by uploading a revised version of the .csv file, saving substantial time.
Understanding the correct licensing requirements for PowerBI was another challenge. After consulting with our internal IT team, a group workspace was set up under an enterprise agreement, and an individual licence was obtained for the team member managing the dashboards.
Testing the PowerBI dashboard
To ensure effectiveness, the solution was initially tested in a development environment that mirrored the production site. This approach allowed us to test the limitations and user experience prior to going live. During this phase, we tested the bulk upload of the .csv file to update page metadata.
A soft launch of the content audit dashboard provided valuable insights, such as the realisation that a three-month review period was too short, given the number of pages each business unit manages.
As a result of this testing period, we made minor adjustments, such as requiring a defined review period for each page and allowing users to opt out of having an update counted as a ‘review’ for reporting purposes. This might include, for example, correcting a typo, which doesn’t constitute a page review in the context of the content audit dashboard.
Transforming content auditing
This solution significantly enhanced reporting capabilities across NSW Resources, reducing the need for manual intervention. Page updates are now reflected in the report dashboard automatically.
The PowerBI dashboards offer real-time updates and clear visibility of page ownership and review status, making it easier for business units to manage their content.
Business units can independently track the currency of their pages without needing data from the digital team, streamlining the process and increasing efficiency.
Future plans for the dashboard
The solution will continue to evolve. We plan to use the work done on the PowerBI integration to inform future website improvements with the potential for further Google Analytics data integration.
A document audit is currently a separate and opt-in process for business units, but future plans may involve greater integration.
Conclusion
Overall, this innovative solution addresses a critical need for NSW Resources by providing a robust, automated and user-friendly content audit process that adapts to the dynamic nature of our Drupal website.
PyCoder’s Weekly: Issue #654 (Nov. 5, 2024)
What goes into building a spreadsheet application in Python that runs in the browser? How do you make it launch quickly, and where do you store the cells of data? This week on the show, we speak with Chris Laffra about his project, PySheets, and his book “Communication for Engineers.”
REAL PYTHON podcast
Python 3.13 included a new version of the REPL which has the ability to define keyboard shortcuts. This article shows you how to create one and warns you about potential hangups.
TREY HUNNER
Say goodbye to managing failures, network outages, flaky endpoints, and long-running processes. Temporal ensures your code never fails. Period. PLUS, you can get started today on Temporal Cloud with $1,000 free credits on us →
TEMPORAL TECHNOLOGIES sponsor
To better understand just where the performance cost of running tests comes from, Anders ran a million empty tests. This post talks about what he did and the final results.
ANDERS HOVMOLLER
Currently, CPython signs its artifacts with both PGP and Sigstore. Removing the PGP signature has been proposed, but that has implications: Sigstore is still new enough that many Linux distributions don’t support it yet.
JOE BROCKMEIER
In this video course, you’ll learn what magic methods are in Python, how they work, and how to use them in your custom classes to support powerful features in your object-oriented code.
REAL PYTHON course
As GenAI and LLMs rapidly evolve, the impact of data leaks and unsafe AI outputs makes it critical to secure your AI infrastructure. Learn how MLOps and ML Platform teams can use the newly launched Guardrails Pro to secure AI operations — enabling faster, safer adoption of LLMs at scale →
GUARDRAILS sponsor
In the real world, things decay over time. In the digital world things get kept forever, and sometimes that shouldn’t be so. Designing for deletion is hard.
ARMIN RONACHER
Bite code! does their monthly Python news wrap-up. Check out stories on 3.13, proposed template strings, dependency groups in pyproject.toml, and more.
BITE CODE!
This project uses a computer vision solution to automate product inventory in retail, using YOLOv8 and image embeddings for precise detection.
ALBERT FERRÉ • Shared by Albert Ferré
Context managers enable you to create “template” code with initialization and clean up to make the code that uses them easier to read and understand.
JUHA-MATTI SANTALA
This post celebrating ten years of Django Girls talks about how it got started, what they’re hoping to do, and how you can get involved.
DJANGO GIRLS
This quick TIL post talks about five useful pytest options that let you control what tests to run with respect to failing tests.
RODRIGO GIRÃO SERRÃO
This post shows you how to return values from coroutines that have been concurrently executed using asyncio.gather().
JASON BROWNLEE
This list contains the recorded talks from the PyBay 2024 conference.
YOUTUBE video
CRISTIANOPIZZAMIGLIO.COM • Shared by Cristiano Pizzamiglio
jamesql: In-Memory NoSQL Database in Python
Events
Weekly Real Python Office Hours Q&A (Virtual)
November 6, 2024
REALPYTHON.COM
November 7, 2024
MEETUP.COM
November 7, 2024
SYPY.ORG
November 9, 2024
MEETUP.COM
November 12, 2024
PITERPY.COM
November 14 to November 16, 2024
PYCON.SE
November 16 to November 17, 2024
PYCON.HK
November 16 to November 17, 2024
PYCON.JP
November 16 to November 18, 2024
PYTHON.IE
Happy Pythoning!
This was PyCoder’s Weekly Issue #654.
Droptica: Product Search Engine in Drupal with Apache Solr Integration - How-to Guide
Product search is a key function in e-commerce today. This article will show how to create an advanced product search engine based on Drupal and its integration with Apache Solr. By combining Drupal, Droopler installation profile, and Solr, a powerful tool can be created to make it easier for customers to navigate and search large data sets faster. I encourage you to read the blog post or watch the video in the “Nowoczesny Drupal” series (the video is in Polish).
FSF Events: Free Software Directory meeting on IRC: Friday, November 8, starting at 12:00 EST (17:00 UTC)
Real Python: Introduction to Web Scraping With Python
Web scraping is the process of collecting and parsing raw data from the Web, and the Python community has come up with some pretty powerful web scraping tools.
The Internet hosts perhaps the greatest source of information on the planet. Many disciplines, such as data science, business intelligence, and investigative reporting, can benefit enormously from collecting and analyzing data from websites.
In this video course, you’ll learn how to:
- Parse website data using string methods and regular expressions
- Parse website data using an HTML parser
- Interact with forms and other website components
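As a small taste of the first two approaches in the list above, the sketch below pulls a page title out with a regular expression and collects link targets with the standard library's `html.parser`. The HTML string is made up for this example and no network request is made; the course itself may use different tools and pages.

```python
import re
from html.parser import HTMLParser

# Made-up page used only for this sketch; nothing is downloaded.
HTML = """<html><head><title>Profile: Aphrodite</title></head>
<body><h2>Name: Aphrodite</h2>
<a href="/profiles/poseidon">Poseidon</a>
<a href="/profiles/dionysus">Dionysus</a></body></html>"""

# Approach 1: a regular expression for one simple, predictable pattern.
match = re.search(r"<title.*?>(.*?)</title.*?>", HTML, re.IGNORECASE | re.DOTALL)
title = match.group(1).strip() if match else None


# Approach 2: an HTML parser that collects the href of every <a> tag.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


collector = LinkCollector()
collector.feed(HTML)

print(title)
print(collector.links)
```

Regular expressions are fine for a single known pattern like the title, but they become brittle on nested or irregular markup, which is where a real parser earns its keep.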