Planet Python

Planet Python - http://planetpython.org/

TechBeamers Python: Pandas GroupBy() and Count() Explained With Examples

Thu, 2024-01-25 10:04

Pandas groupby() and count() work in combination and are valuable in various data analysis scenarios. The groupby() function groups a DataFrame by one or more columns, and count() counts the occurrences in each group. Combined, they provide a convenient way to perform group-wise counting […]
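A minimal sketch of that combination (the column names and values here are invented for illustration):

import pandas as pd

df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen"],
    "product": ["tea", "coffee", "tea"],
})

# count() reports the number of non-null values per remaining column in each group
print(df.groupby("city").count())

# size() is a common companion that reports the number of rows per group
print(df.groupby("city").size())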

The post Pandas GroupBy() and Count() Explained With Examples appeared first on TechBeamers.

Categories: FLOSS Project Planets

TechBeamers Python: Top Important Terms in Python Programming With Examples

Thu, 2024-01-25 03:40

In this tutorial, we have captured the important terms used in Python programming. If you are learning Python, it is good to be aware of the different programming concepts and jargon related to Python. Please note that these terms form the foundation of Python programming, and a solid understanding of them is essential for effective development […]

The post Top Important Terms in Python Programming With Examples appeared first on TechBeamers.

Categories: FLOSS Project Planets

Glyph Lefkowitz: The Macintosh

Thu, 2024-01-25 01:31

Today is the 40th anniversary of the announcement of the Macintosh. Others have articulated compelling emotional narratives that easily eclipse my own similar childhood memories of the Macintosh family of computers. So instead, I will ask a question:

What is the Macintosh?

As this is the anniversary of the beginning, that is where I will begin. The original Macintosh, the classic MacOS, the original “System Software” are a shining example of “fake it till you make it”. The original Mac operating system was fake.

Don’t get me wrong, it was an impressive technical achievement to fake something like this, but what Steve Jobs did was to see a demo of a Smalltalk-76 system, an object-oriented programming environment with 1-to-1 correspondences between graphical objects on screen and runtime-introspectable data structures, a self-hosting high level programming language, memory safety, message passing, garbage collection, and many other advanced facilities that would not be popularized for decades, and make a fake version of it which ran on hardware that consumers could actually afford, by throwing out most of what made the programming environment interesting and replacing it with a much more memory-efficient illusion implemented in 68000 assembler and Pascal.

The machine’s RAM didn’t have room for a kernel. Whatever application was running was in control of the whole system. No protected memory, no preemptive multitasking. It was a house of cards that was destined to collapse. And collapse it did, both in the short term and the long. In the short term, the system was buggy and unstable, and application crashes resulted in system halts and reboots.

In the longer term, the company based on the Macintosh effectively went out of business and was reverse-acquired by NeXT, but they kept the better-known branding of the older company. The old operating system was gradually disposed of, quickly replaced at its core with a significantly more mature generation of operating system technology based on BSD UNIX and Mach. With the removal of Carbon compatibility 4 years ago, the last vestigial traces of it were removed. But even as early as 2004 the Mac was no longer really the Macintosh.

What NeXT had built was much closer to the Smalltalk system that Jobs was originally attempting to emulate. Its programming language, “Objective C”, explicitly called back to Smalltalk’s message-passing, right down to the syntax. Objects on the screen now did correspond to “objects” you could send messages to. The development environment understood this too; that was a major selling point.

The NeXTSTEP operating system and Objective C runtime did not have garbage collection, but they offered a similar developer experience by building reference counting into the object model throughout. The original vision was finally achieved, for real, and that’s what we have on our desks and in our backpacks today (and in our pockets, in the form of the iPhone, which is in some sense a tiny next-generation NeXT computer itself).

The one detail I will relate from my own childhood is this: my first computer was not a Mac. My first computer, as a child, was an Amiga. When I was 5, I had a computer with 4096 colors, real multitasking, 3D graphics, and a paint program that could draw hard real-time animations with palette tricks. Then the writing was on the wall for Commodore and I got a computer which had 256 colors, a bunch of old software that was still black and white, an operating system that would freeze if you held down the mouse button on the menu bar and couldn’t even play animations smoothly. Many will relay their first encounter with the Mac as a kind of magic, but mine was a feeling of loss and disappointment. Unlike almost everyone at the time, I knew what a computer really could be, and despite many pleasant and formative experiences with the Macintosh in the meanwhile, it would be a decade before I saw a real one again.

But this is not to deride the faking. The faking was necessary. Xerox was not going to put an Alto running Smalltalk on anyone’s desk. People have always grumbled that Apple products are expensive, but in 2024 dollars, one of these Xerox computers cost roughly $55,000.

The Amiga was, in its own way, a similar sort of fake. It managed its own miracles by putting performance-critical functions into dedicated hardware, which quickly became obsolete as software technology evolved at a much faster pace.

Jobs is celebrated as a genius of product design, and he certainly wasn’t bad at it, but I had the rare privilege of seeing the homework he was cribbing from in that subject, and in my estimation he was a B student at best. Where he got an A was bringing a vision to life by creating an organization, both inside and outside of his companies.

If you want a culture-defining technological artifact, everybody in the culture has to be able to get their hands on one. This doesn’t just mean that the builder has to be able to build it. The buyer also has to be able to afford it, obviously. Developers have to be able to develop for it. The buyer has to actually want it; the much-derided “marketing” is a necessary part of the process of making a product what it is. Everyone needs to be able to move together in the direction of the same technological future.

This is why it was so fitting that Tim Cook was made Jobs's successor. The supply chain was the hard part.

The crowning, final achievement of Jobs’s career was the fact that not only did he fake it — the fakes were flying fast and thick at that time in history, even if they mostly weren’t as good — it was that he faked it and then he built the real version and then he bridged the transitions to get to the real thing.

I began here by saying that the Mac isn’t really the Mac, and speaking in terms of a point-in-time analysis that is true. Its technology today has practically nothing in common with its technology in 1984. This is not merely an artifact of the length of time involved: the technology at the core of various UNIXes in 1984 bears a strong resemblance to UNIX-like operating systems today [1]. But looking across its whole history from 1984 to 2024, there is undeniably a continuity to the conceptual “Macintosh”.

Not just as a user, but as a developer moving through time rather than looking at just a few points: the “Macintosh”, such as it is, has transitioned from the Motorola 68000 to the PowerPC to Intel 32-bit to Intel 64-bit to ARM. From obscurely proprietary to enthusiastically embracing open source and then, sadly, much of the way back again. It moved from black and white to color, from desktop to laptop, from Carbon to Cocoa, from Display PostScript to Display PDF, all the while preserving instantly recognizable iconic features like the apple menu and the cursor pointer, while providing developers documentation and SDKs and training sessions that helped them transition their apps through multiple near-complete rewrites as a result of all of these changes.

To paraphrase Abigail Thorne’s first video about Identity, identity is what survives. The Macintosh is an interesting case study in the survival of the idea of a platform, as distinct from the platform itself. It is the Computer of Theseus, a thought experiment successfully brought to life and sustained over time.

If there is a personal lesson to be learned here, I’d say it’s that one’s own efforts need not be perfect. In fact, a significantly flawed vision that you can achieve right now is often much, much better than a perfect version that might take just a little bit longer, if you don’t have the resources to actually sustain going that much longer [2]. You have to be bad at things before you can be good at them. Real artists, as Jobs famously put it, ship.

So my contribution to the 40th anniversary reflections is to say: the Macintosh is dead. Long live the Mac.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support me on Patreon as well!

  1. including, ironically, the modern macOS. 

  2. And that is why I am posting this right now, rather than proofreading it further. 

Categories: FLOSS Project Planets

Glyph Lefkowitz: Unsigned Commits

Wed, 2024-01-24 19:29

I am going to tell you why I don’t think you should sign your Git commits, even though doing so with SSH keys is now easier than ever. But first, to contextualize my objection, I have a brief hypothetical for you, and then a bit of history from the evolution of security on the web.

It seems like these days, everybody’s signing all different kinds of papers.

Bank forms, permission slips, power of attorney; it seems like if you want to securely validate a document, you’ve gotta sign it.

So I have invented a machine that automatically signs every document on your desk, just in case it needs your signature. Signing is good for security, so you should probably get one, and turn it on, just in case something needs your signature on it.

We also want to make sure that verifying your signature is easy, so we will have them all notarized and duplicates stored permanently and publicly for future reference.

No? Not interested?

Hopefully, that sounded like a silly idea to you.

Most adults in modern civilization have learned that signing your name to a document has an effect. It is not merely decorative; the words in the document being signed have some specific meaning and can be enforced against you.

In some ways the metaphor of “signing” in cryptography is bad. One does not “sign” things with “keys” in real life. But here, it is spot on: a cryptographic signature can have an effect.

It should be an input to some software, one that is acted upon. Software does a thing differently depending on the presence or absence of a signature. If it doesn’t, the signature probably shouldn’t be there.

Consider the most venerable example of encryption and signing that we all deal with every day: HTTPS. Many years ago, browsers would happily display unencrypted web pages. The browser would also encrypt the connection, if the server operator had paid for an expensive certificate and correctly configured their server. If that operator messed up the encryption, it would pop up a helpful dialog box that would tell the user “This website did something wrong that you cannot possibly understand. Would you like to ignore this and keep working?” with buttons that said “Yes” and “No”.

Of course, these are not the precise words that were written. The words, as written, said things about “information you exchange” and “security certificate” and “certifying authorities” but “Yes” and “No” were the words that most users read. Predictably, most users just clicked “Yes”.

In the usual case, where users ignored these warnings, it meant that no user ever got meaningful security from HTTPS. It was a component of the web stack that did nothing but funnel money into the pockets of certificate authorities and occasionally present annoying interruptions to users.

In the case where the user carefully read and honored these warnings in the spirit they were intended, adding any sort of transport security to your website was a potential liability. If you got everything perfectly correct, nothing happened except the browser would display a picture of a small green purse. If you made any small mistake, it would scare users off and thereby directly harm your business. You would only want to do it if you were doing something that put a big enough target on your site that you became unusually interesting to attackers, or were required to do so by some contractual obligation like credit card companies.

Keep in mind that the second case here is the best case.

In 2016, the browser makers noticed this problem and started taking some pretty aggressive steps towards actually enforcing the security that HTTPS was supposed to provide, by fixing the user interface to do the right thing. If your site didn’t have security, it would be shown as “Not Secure”, a subtle warning that would gradually escalate in intensity as time went on, correctly incentivizing site operators to adopt transport security certificates. On the user interface side, certificate errors would be significantly harder to disregard, making it so that users who didn’t understand what they were seeing would actually be stopped from doing the dangerous thing.

Nothing fundamental [1] changed about the technical aspects of the cryptographic primitives or constructions being used by HTTPS in this time period, but socially, the meaning of an HTTP server signing and encrypting its requests changed a lot.

Now, let’s consider signing Git commits.

You may have heard that in some abstract sense you “should” be signing your commits. GitHub puts a little green “verified” badge next to commits that are signed, which is neat, I guess. They provide “security”. 1Password provides a nice UI for setting it up. If you’re not a 1Password user, GitHub itself recommends you put in just a few lines of configuration to do it with either a GPG, SSH, or even an S/MIME key.

But while GitHub’s documentation quite lucidly tells you how to sign your commits, its explanation of why is somewhat less clear. Their purse is the word “Verified”; it’s still green. If you enable “vigilant mode”, you can make the blank “no verification status” option say “Unverified”, but not much else changes.

This is like the old-style HTTPS verification “Yes”/“No” dialog, except that there is not even an interruption to your workflow. They might put the “Unverified” status on there, but they’ve gone ahead and clicked “Yes” for you.

It is tempting to think that the “HTTPS” metaphor will map neatly onto Git commit signatures. It was bad when the web wasn’t using HTTPS, and the next step in that process was for Let’s Encrypt to come along and for the browsers to fix their implementations. Getting your certificates properly set up in the meanwhile and becoming familiar with the tools for properly doing HTTPS was unambiguously a good thing for an engineer to do. I did, and I’m quite glad I did so!

However, there is a significant difference: signing and encrypting an HTTPS request is ephemeral; signing a Git commit is functionally permanent.

This ephemeral nature meant that errors in the early HTTPS landscape were easily fixable. Earlier I mentioned that there was a time where you might not want to set up HTTPS on your production web servers, because any small screw-up would break your site and thereby your business. But if you were really skilled and you could see the future coming, you could set up monitoring, avoid these mistakes, and rapidly recover. These mistakes didn’t need to badly break your site.

We can extend the analogy to HTTPS, but we have to take a detour into one of the more unpleasant mistakes in HTTPS’s history: HTTP Public Key Pinning, or “HPKP”. The idea with HPKP was that you could publish a record in an HTTP header where your site commits [2] to using certain certificate authorities for a period of time, where that period of time could be “forever”. Attackers gonna attack, and attack they did. Even without getting attacked, a site could easily commit “HPKP Suicide” where they would pin the wrong certificate authority with a long timeline, and their site was effectively gone for every browser that had ever seen those pins. As a result, after a few years, HPKP was completely removed from all browsers.

Git commit signing is even worse. With HPKP, you could easily make terrible mistakes with permanent consequences even though you knew the exact meaning of the data you were putting into the system at the time you were doing it. With signed commits, you are saying something permanently, but you don’t really know what it is that you’re saying.

Today, what is the benefit of signing a Git commit? GitHub might present it as “Verified”. It’s worth noting that only GitHub will do this, since they are the root of trust for this signing scheme. So, by signing commits and registering your keys with GitHub, you are, at best, helping to lock in GitHub as a permanent piece of infrastructure that is even harder to dislodge because they are not only where your code is stored, but also the arbiters of whether or not it is trustworthy.

In the future, what is the possible security benefit? If we all collectively decide we want Git to be more secure, then we will need to meaningfully treat signed commits differently from unsigned ones.

There’s a long tail of unsigned commits several billion entries long. And those are in the permanent record as much as the signed ones are, so future tooling will have to be able to deal with them. If, as stewards of Git, we wish to move towards a more secure Git, as the stewards of the web moved towards a more secure web, we do not have the option that the web did. In the browser, the meaning of a plain-text HTTP or incorrectly-signed HTTPS site changed, in order to encourage the site’s operator to change the site to be HTTPS.

In contrast, the meaning of an unsigned commit cannot change, because there are zillions of unsigned commits lying around in critical infrastructure and we need them to remain there. Commits cannot meaningfully be changed to become signed retroactively. Unlike an online website, they are part of a historical record, not an operating program. So we cannot establish the difference in treatment by changing how unsigned commits are treated.

That means that tooling maintainers will need to provide some difference in behavior that provides some incentive. With HTTPS, the binary choice was clear: don’t present sites with incorrect, potentially compromised configurations to users. The question was just how to achieve that. With Git commits, the difference in treatment of a “trusted” commit is far less clear.

If you will forgive me a slight straw-man here, one possible naive interpretation of a “trusted” signed commit is that it’s OK to run in CI. Conveniently, it’s not simply “trusted” in a general sense: if you signed it, it’s trusted to be from you, specifically. Surely it’s fine if we bill the CI costs for validating the PR that includes that signed commit to your GitHub account?

Now, someone can piggy-back off a 1-line typo fix that you made on top of an unsigned commit to some large repo, making you implicitly responsible for transitively signing all unsigned parent commits, even though you haven’t looked at any of the code.

Remember, also, that the only central authority that is practically trustable at this point is your GitHub account. That means that if you are using a third-party CI system, even if you’re using a third-party Git host, you can only run “trusted” code if GitHub is online and responding to requests for its “get me the trusted signing keys for this user” API. This also adds a lot of value to a GitHub credential breach, strongly motivating attackers to sneakily attach their own keys to your account so that their commits in unrelated repos can be “Verified” by you.

Let’s review the pros and cons of turning on commit signing now, before you know what it is going to be used for:

Pro:

  • Green “Verified” badge

Con:

  • Unknown, possibly unlimited future liability for the consequences of running code in a commit you signed
  • Further implicitly cementing GitHub as a centralized trust authority in the open source world
  • Introducing unknown reliability problems into infrastructure that relies on commit signatures
  • Temporary breach of your GitHub credentials now leads to potentially permanent consequences if someone can smuggle a new trusted key in there
  • New kinds of ongoing process overhead as commit-signing keys become new permanent load-bearing infrastructure, like “what do I do with expired keys”, “how often should I rotate these”, and so on

I feel like the “Con” column is coming out ahead.

That probably seemed like increasingly unhinged hyperbole, and it was.

In reality, the consequences are unlikely to be nearly so dramatic. The status quo has a very high amount of inertia, and probably the “Verified” badge will remain the only visible difference, except for a few repo-specific esoteric workflows, like pushing trust verification into offline or sandboxed build systems. I do still think that there is some potential for nefariousness around the “unknown and unlimited” dimension of any future plans that might rely on verifying signed commits, but any flaws are likely to be subtle attack chains and not anything flashy and obvious.

But I think that one of the biggest problems in information security is a lack of threat modeling. We encrypt things, we sign things, we institute rotation policies and elaborate useless rules for passwords, because we are looking for a “best practice” that is going to save us from having to think about what our actual security problems are.

I think the actual harm of signing git commits is to perpetuate an engineering culture of unquestioningly cargo-culting sophisticated and complex tools like cryptographic signatures into new contexts where they have no use.

Just from a baseline utilitarian philosophical perspective, for a given action A, all else being equal, it’s always better not to do A, because taking an action always has some non-zero opportunity cost even if it is just the time taken to do it. Epsilon cost and zero benefit is still a net harm. This is even more true in the context of a complex system. Any action taken in response to a rule in a system is going to interact with all the other rules in that system. You have to pay complexity-rent on every new rule. So an apparently-useless embellishment like signing commits can have potentially far-reaching consequences in the future.

Git commit signing itself is not particularly consequential. I have probably spent more time writing this blog post than the sum total of all the time wasted by all programmers configuring their git clients to add useless signatures; even the relatively modest readership of this blog will likely transfer more data reading this post than all those signatures will take to transmit to the various git clients that will read them. If I just convince you not to sign your commits, I don’t think I’m coming out ahead in the felicific calculus here.

What I am actually trying to point out here is that it is useful to carefully consider how to avoid adding junk complexity to your systems. One area where junk tends to leak in to designs and to cultures particularly easily is in intimidating subjects like trust and safety, where it is easy to get anxious and convince ourselves that piling on more stuff is safer than leaving things simple.

If I can help you avoid adding even a little bit of unnecessary complexity, I think it will have been well worth the cost of the writing, and the reading.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support me on Patreon as well! I am also available for consulting work if you think your organization could benefit from expertise on topics such as “What else should I not apply a cryptographic signature to?”.

  1. Yes yes I know about heartbleed and Bleichenbacher attacks and adoption of forward-secret ciphers and CRIME and BREACH and none of that is relevant here, okay? Jeez. 

  2. Do you see what I did there. 

Categories: FLOSS Project Planets

Matt Layman: Payments Gateway - Building SaaS with Python and Django#181

Wed, 2024-01-24 19:00
In this episode, we continued with the Stripe integration. I worked on a new payments gateway interface to access the Stripe APIs needed for creating a checkout session. We hit some bumps along the way because of djstripe’s new preference for putting the Stripe keys into the database exclusively.
Categories: FLOSS Project Planets

Bruno Ponne / Coding The Past: Explore art with SQL and pd.read_sql_query

Wed, 2024-01-24 19:00


Greetings, humanists, social and data scientists!


Have you ever tried to load a large file in Python or R? When file sizes are on the order of gigabytes, you may run into performance problems, with your program taking an unusually long time to load the data. SQL, or Structured Query Language, is used to deal with larger data files stored in relational databases and is widely used in industry and even in research. Apart from being more efficient for preparing data, you might also encounter data sources whose main form of access is through SQL.


In this lesson you will learn how to use SQL in Python to retrieve data from a relational database of the National Gallery of Art (US). You will also learn how to use a relational database management system (RDBMS) and pd.read_sql_query to extract data from it in Python.



1. Data source

The database used in this lesson is made available by the National Gallery of Art (US) under a Creative Commons Zero license. The dataset contains data about more than 130,000 artworks and their artists, from the Middle Ages to the present day.


It is a wonderful resource to study history and art. Variables available include the title of the artwork, dimensions, author, description, location, country where it was produced, the year the artist started the work and the year he or she finished it. These variables are only some examples, but there is much more to explore.



2. Download and install PostgreSQL and pgAdmin

PostgreSQL is a free and very popular relational database management system. It stores and manages the tables contained in a database. Please consult this guide to install it on your computer.


After you install PostgreSQL, you will need to connect to the PostgreSQL database server. In this tutorial, we will be using the pgAdmin application to establish this connection. It is a visual and intuitive interface that makes many operations easier to execute. The guide above also walks you through the process of connecting to your local database. In the next steps, once connected to your local database server, we will learn how to create a database to store the National Gallery dataset.


3. Creating the database and its tables

After you are connected to the server, right-click “Databases” and choose “Create” and “Database…” as shown in the image below.



Next, give a title to your database as shown in the figure below. In our case, it will be called “art_db”. Click “Save” and it is all set!



With the database ‘art_db’ selected, click the ‘Query Tool’ as shown below.


This will open a field where you can type SQL code. Our objective is to create the first table of our database, which will contain the content of ‘objects.csv’ available in the GitHub account of the National Gallery of Art, provided in the Data section above.


To create a table, we must specify the name and the type of each variable in the table. The SQL command to create a table is quite intuitive: CREATE TABLE name_of_your_table. Copy the code below and paste it into the window opened by the ‘Query Tool’. The code specifies each variable of the objects table, which contains information on each artwork available in the collection.



CREATE TABLE objects (
    objectID integer NOT NULL,
    accessioned CHARACTER VARYING(32),
    accessionnum CHARACTER VARYING(32),
    locationid CHARACTER VARYING(32),
    title CHARACTER VARYING(2048),
    displaydate CHARACTER VARYING(256),
    beginyear integer,
    endyear integer,
    visualbrowsertimespan CHARACTER VARYING(32),
    medium CHARACTER VARYING(2048),
    dimensions CHARACTER VARYING(2048),
    inscription CHARACTER VARYING,
    markings CHARACTER VARYING,
    attributioninverted CHARACTER VARYING(1024),
    attribution CHARACTER VARYING(1024),
    provenancetext CHARACTER VARYING,
    creditline CHARACTER VARYING(2048),
    classification CHARACTER VARYING(64),
    subclassification CHARACTER VARYING(64),
    visualbrowserclassification CHARACTER VARYING(32),
    parentid CHARACTER VARYING(32),
    isvirtual CHARACTER VARYING(32),
    departmentabbr CHARACTER VARYING(32),
    portfolio CHARACTER VARYING(2048),
    series CHARACTER VARYING(850),
    volume CHARACTER VARYING(850),
    watermarks CHARACTER VARYING(512),
    lastdetectedmodification CHARACTER VARYING(64),
    wikidataid CHARACTER VARYING(64),
    customprinturl CHARACTER VARYING(512)
);


The last step is to load the data from the csv file into this table. This can be done through the ‘COPY’ command as shown below.



COPY objects (objectid, accessioned, accessionnum, locationid, title, displaydate,
              beginyear, endyear, visualbrowsertimespan, medium, dimensions,
              inscription, markings, attributioninverted, attribution,
              provenancetext, creditline, classification, subclassification,
              visualbrowserclassification, parentid, isvirtual, departmentabbr,
              portfolio, series, volume, watermarks, lastdetectedmodification,
              wikidataid, customprinturl)
FROM 'C:/temp/objects.csv'
DELIMITER ','
CSV HEADER;


Tip: Download the "objects.csv" file and save it in a folder of your choice. Note, however, that sometimes your system might block pgAdmin's access to this file, which is why I saved it in the "temp" folder. In any case, change the path in the code above to match where you saved "objects.csv".


Great! Now you should have your first table loaded to your database. The complete database includes more than 15 tables. However, we will only use two of them for this example, as shown in the scheme below. Note that the two tables relate to each other through the key variable objectid.



To load the “objects_terms” table, please repeat the same procedure with the code below.



CREATE TABLE objects_terms (
    termid INTEGER,
    objectid INTEGER,
    termtype VARCHAR(64),
    term VARCHAR(256),
    visualbrowsertheme VARCHAR(32),
    visualbrowserstyle VARCHAR(64)
);

COPY objects_terms (termid, objectid, termtype, term, visualbrowsertheme, visualbrowserstyle)
FROM 'C:/temp/objects_terms.csv'
DELIMITER ','
CSV HEADER;



4. Exploring the data with SQL commands

Click the ‘Query Tool’ to start exploring the data. First, select which variables you would like to include in your analysis. Second, tell SQL which table these variables are in. The code below selects the variables title and attribution from the objects table. It also limits the result to 5 observations.



SELECT title, attribution FROM objects LIMIT 5


Now, we would like to know what the different kinds of classification in this dataset are. To achieve that, we select the classification variable, keeping only distinct values.



SELECT DISTINCT(classification) FROM objects


The result tells us that there are 11 classifications: “Decorative Art”, “Drawing”, “Index of American Design”, “Painting”, “Photograph”, “Portfolio”, “Print”, “Sculpture”, “Technical Material”, “Time-Based Media Art” and “Volume”.


Finally, let us group the artworks by classification and count the number of objects in each category. COUNT(*) counts the total number of items in each group defined by GROUP BY. When you select a variable, you can give it a new name with AS. Finally, ORDER BY sorts the classifications by number of items in descending order (DESC).



SELECT classification, COUNT(*) AS n_items
FROM objects
GROUP BY classification
ORDER BY n_items DESC


Note that prints form the largest classification, followed by photographs.



5. Using pd.read_sql_query to access data

Now that you have your SQL database working, it is time to access it with Python. Before using Pandas, we have to connect Python to our SQL database. We will do that with psycopg2, a very popular PostgreSQL adapter for Python. Please, install it with pip install psycopg2.


We use the connect method of psycopg2 to establish the connection. It takes 4 main arguments:

  • host: in our case, the database is hosted locally, so we will pass localhost to this parameter. Note, however, that we could specify an IP if the server was external;
  • database: the name given to your SQL database, art_db;
  • user: user name required to authenticate;
  • password: your database password.



import psycopg2
import pandas as pd

conn = psycopg2.connect(
    host="localhost",
    database="art_db",
    user="postgres",
    password="*******"
)


The next step is to store our SQL query in a Python string variable. The query below performs a LEFT JOIN on the two tables in our database, using the variable objectid to join them. In practice, we select the titles, authors (attribution), classification (keeping only “Painting” with a WHERE clause), and term (filtering only terms that specify the “Style” of the painting).



command = '''
SELECT o.title, o.attribution, o.classification, ot.term
FROM objects AS o
LEFT JOIN objects_terms AS ot
    ON o.objectid = ot.objectid
WHERE classification = 'Painting'
    AND termtype = 'Style'
'''


Finally, we can extract the data. Open a cursor with the cursor() method of conn, then pass the command variable and the connection object to pd.read_sql_query, which returns a Pandas dataframe with the data we selected. Afterwards, commit and close the cursor and the connection.



# open cursor to insert our query
cur = conn.cursor()

# use pd.read_sql_query to query our database and get the result in a pandas dataframe
paintings = pd.read_sql_query(command, conn)

# save any changes to the database
conn.commit()

# close cursor and connection
cur.close()
conn.close()


6. Visualizing the most popular styles

From the data we gathered from our database, we would like to check which are the 10 most popular art styles in our data, by number of paintings. We can use the value_counts() method of the column term to count how many paintings are classified in each style.


The result is a Pandas Series where the index contains the styles and the values contain the number of paintings in the respective style. The remaining code produces a horizontal bar plot showing the top 10 styles by number of paintings. If you would like to learn more about data visualization with matplotlib, please consult the lesson Storytelling with Matplotlib - Visualizing historical data.



import matplotlib.pyplot as plt

top_10_styles = paintings['term'].value_counts().head(10)

fig, ax = plt.subplots()
ax.barh(top_10_styles.index, top_10_styles.values, color="#f0027f", edgecolor="#f0027f")
ax.set_title("The Most Popular Styles")

# inverts y axis
ax.invert_yaxis()

# eliminates grids
ax.grid(False)

# set ticks' colors to white
ax.tick_params(axis='x', colors='white')
ax.tick_params(axis='y', colors='white')

# set font colors
ax.set_facecolor('#2E3031')
ax.title.set_color('white')

# eliminates top, left and right borders and sets the bottom border color to white
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
ax.spines["bottom"].set_color("white")

# fig background color:
fig.patch.set_facecolor('#2E3031')


Note that Realist, Baroque and Renaissance are the most popular art styles in our dataset.



Please feel free to share your thoughts and questions below!



7. Conclusions


  • It is possible to create a SQL database from csv files and access it with Python;
  • psycopg2 enables connection between Python and your SQL database;
  • pd.read_sql_query can be used to extract data into a Pandas dataframe.


Categories: FLOSS Project Planets

TechBeamers Python: How Do I Install Pip in Python?

Wed, 2024-01-24 12:40

In this tutorial, we’ll provide all the necessary steps to install pip in Python on both Windows and Linux platforms. If you’re using a recent version of Python (Python 3.4 and above), pip is likely already installed. To check whether pip is installed, open a command prompt or terminal and run a version check. If it’s […]

The post How Do I Install Pip in Python? appeared first on TechBeamers.

Categories: FLOSS Project Planets

TechBeamers Python: How Do You Filter a List in Python?

Wed, 2024-01-24 09:33

In this tutorial, we’ll explain different methods to filter a list in Python with the help of multiple examples. You’ll learn to use the Python filter() function, list comprehension, and also use Python for loop to select elements from the list. Filter a List in Python With the Help of Examples As we know there […]
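As a quick sketch of those three approaches (the sample data here is invented; the post’s own examples may differ):

numbers = [1, 2, 3, 4, 5, 6]

# filter() keeps the elements for which the function returns True
evens = list(filter(lambda n: n % 2 == 0, numbers))

# a list comprehension expresses the same selection inline
evens_lc = [n for n in numbers if n % 2 == 0]

# a plain for loop builds the filtered list step by step
evens_loop = []
for n in numbers:
    if n % 2 == 0:
        evens_loop.append(n)

print(evens, evens_lc, evens_loop)  # [2, 4, 6] three times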

The post How Do You Filter a List in Python? appeared first on TechBeamers.

Categories: FLOSS Project Planets

Real Python: What Are Python Raw Strings?

Wed, 2024-01-24 09:00

If you’ve ever come across a standard string literal prefixed with either the lowercase letter r or the uppercase letter R, then you’ve encountered a Python raw string:

Python >>> r"This is a raw string" 'This is a raw string' Copied!

Although a raw string looks and behaves mostly the same as a normal string literal, there’s an important difference in how Python interprets some of its characters, which you’ll explore in this tutorial.

Notice that there’s nothing special about the resulting string object. Whether you declare your literal value using a prefix or not, you’ll always end up with a regular Python str object.

Other prefixes available at your fingertips, which you can use and sometimes even mix together in your Python string literals, include:

  • b: Bytes literal
  • f: Formatted string literal
  • u: Legacy Unicode string literal (PEP 414)

Out of those, you might be most familiar with f-strings, which let you evaluate expressions inside string literals. Raw strings aren’t as popular as f-strings, but they do have their own uses that can improve your code’s readability.
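For instance, an f-string evaluates the expression inside the braces (the variable here is just an illustration):

>>> name = "Ada"
>>> f"Hello, {name}!"
'Hello, Ada!'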

Creating a string of characters is often one of the first skills that you learn when studying a new programming language. The Python Basics book and learning path cover this topic right at the beginning. With Python, you can define string literals in your source code by delimiting the text with either single quotes (') or double quotes ("):

>>> david = 'She said "I love you" to me.'
>>> alice = "Oh, that's wonderful to hear!"

Having such a choice can help you avoid a syntax error when your text includes one of those delimiting characters (' or "). For example, if you need to represent an apostrophe in a string, then you can enclose your text in double quotes. Alternatively, you can use multiline strings to mix both types of delimiters in the text.

You may use triple quotes (''' or """) to declare a multiline string literal that can accommodate a longer piece of text, such as an excerpt from the Zen of Python:

>>> poem = """
... Beautiful is better than ugly.
... Explicit is better than implicit.
... Simple is better than complex.
... Complex is better than complicated.
... """

Multiline string literals can optionally act as docstrings, a useful form of code documentation in Python. Docstrings can include bare-bones test cases known as doctests, as well.

Regardless of the delimiter type of your choice, you can always prepend a prefix to your string literal. Just make sure there’s no space between the prefix letters and the opening quote.

When you use the letter r as the prefix, you’ll turn the corresponding string literal into a raw string counterpart. So, what are Python raw strings exactly?



In Short: Python Raw Strings Ignore Escape Character Sequences

In some cases, defining a string through the raw string literal will produce precisely the same result as using the standard string literal in Python:

Python >>> r"I love you" == "I love you" True Copied!

Here, both literals represent string objects that share a common value: the text I love you. Even though the first literal comes with a prefix, it has no effect on the outcome, so both strings compare as equal.

To observe the real difference between raw and standard string literals in Python, consider a different example depicting a date formatted as a string:

Python >>> r"10\25\1991" == "10\25\1991" False Copied!

This time, the comparison turns out to be false even though the two string literals look visually similar. Unlike before, the resulting string objects no longer contain the same sequence of characters. The raw string’s prefix (r) changes the meaning of special character sequences that begin with a backslash (\) inside the literal.
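One quick way to see the difference yourself, assuming a standard CPython session, is to compare the string lengths. In the standard literal, \25 and \1 are read as octal escape sequences, so each collapses into a single character:

>>> len(r"10\25\1991")
10
>>> len("10\25\1991")
7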

Note: To understand how Python interprets the above string, head over to the final section of this tutorial, where you’ll cover the most common types of escape sequences in Python.

Read the full article at https://realpython.com/python-raw-strings/ »


Categories: FLOSS Project Planets

Ned Batchelder: You (probably) don’t need to learn C

Wed, 2024-01-24 06:38

On Mastodon I wrote that I was tired of people saying, “you should learn C so you can understand how a computer really works.” I got a lot of replies which did not change my mind, but helped me understand more how abstractions are inescapable in computers.

People made a number of claims. C was important because syscalls are defined in terms of C semantics (they are not). They said it was good for exploring limited-resource computers like Arduinos, but most people don’t program for those. They said it was important because C is more performant, but Python programs often offload the compute-intensive work to libraries other people have written, and these days that work is often on a GPU. Someone said you need it to debug with strace, then someone said they use strace all the time and don’t know C. Someone even said C was good because it explains why NUL isn’t allowed in filenames, but who tries to do that, and why learn a language just for that trivia?

I’m all for learning C if it will be useful for the job at hand, but you can write lots of great software without knowing C.

A few people repeated the idea that C teaches you how code “really” executes. But C is an abstract model of a computer, and modern CPUs do all kinds of things that C doesn’t show you or explain. Pipelining, cache misses, branch prediction, speculative execution, multiple cores, even virtual memory are all completely invisible to C programs.

C is an abstraction of how a computer works, and chip makers work hard to implement that abstraction, but they do it on top of much more complicated machinery.

C is far removed from modern computer architectures: there have been 50 years of innovation since it was created in the 1970s. The gap between C’s model and modern hardware is the root cause of famous vulnerabilities like Meltdown and Spectre, as explained in C is Not a Low-level Language.

C can teach you useful things, like how memory is a huge array of bytes, but you can also learn that without writing C programs. People say, C teaches you about memory allocation. Yes it does, but you can learn what that means as a concept without learning a programming language. And besides, what will Python or Ruby developers do with that knowledge other than appreciate that their languages do that work for them and they no longer have to think about it?

Pointers came up a lot in the Mastodon replies. Pointers underpin concepts in higher-level languages, but you can explain those concepts as references instead, and skip pointer arithmetic, aliasing, and null pointers completely.

A question I asked a number of people: what mistakes are JavaScript/Ruby/Python developers making if they don’t know these things (C, syscalls, pointers)? I didn’t get strong answers.

We work in an enormous tower of abstractions. I write programs in Python, which provides me abstractions that C (its underlying implementation language) does not. C provides an abstract model of memory and CPU execution which the computer implements on top of other mechanisms (microcode and virtual memory). When I made a wire-wrapped computer, I could pretend the signal travelled through wires instantaneously. For other hardware designers, that abstraction breaks down and they need to consider the speed electricity travels. Sometimes you need to go one level deeper in the abstraction stack to understand what’s going on. Everyone has to find the right layer to work at.

Andy Gocke said it well:

When you no longer have problems at that layer, that’s when you can stop caring about that layer. I don’t think there’s a universal level of knowledge that people need or is sufficient.

“like jam or bootlaces” made another excellent point:

There’s a big difference between “everyone should know this” and “someone should know this” that seems to get glossed over in these kinds of discussions.

C can teach you many useful and interesting things. It will make you a better programmer, just as learning any new-to-you language will because it broadens your perspective. Some kinds of programming need C, though other languages like Rust are ably filling that role now too. C doesn’t teach you how a computer really works. It teaches you a common abstraction of how computers work.

Find a level of abstraction that works for what you need to do. When you have trouble there, look beneath that abstraction. You won’t be seeing how things really work, you’ll be seeing a lower-level abstraction that could be helpful. Sometimes what you need will be an abstraction one level up. Is your Python loop too slow? Perhaps you need a C loop. Or perhaps you need numpy array operations.
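To make that concrete, here is a small sketch of the same computation at two levels of abstraction (assuming numpy is installed):

import numpy as np

values = list(range(1_000_000))

# a plain Python loop, adding one element at a time
total = 0
for v in values:
    total += v

# the same reduction expressed as a vectorized numpy operation
total_np = int(np.array(values).sum())

assert total == total_np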

You (probably) don’t need to learn C.

Categories: FLOSS Project Planets

IslandT: How to search multiple lines with Python?

Wed, 2024-01-24 04:34

Often you will want to search for words or phrases across an entire paragraph. Here is the Python regular expression code that will do that.

pattern = re.compile(r'^\w+ (\w+) (\w+)', re.M)

We use the re.M (multiline) flag, which makes the ^ anchor match at the start of every line rather than only at the start of the whole string.

Now let us try out the program above…

gad = pattern.findall("hello mr Islandt\nhello mr gadgets")
print(gad)

…which will then display the following outcome

[('mr', 'Islandt'), ('mr', 'gadgets')]

Explanation :

The pattern above matches the first word of a line and captures the next two words in a tuple. When the program reaches the newline character, it continues the search on the second line and returns another tuple; both tuples are collected in a list. With the re.M flag, the search continues across multiple lines for as long as there are matches.
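To see exactly what re.M changes, here is a small comparison using the same pattern (a sketch; only the flag differs between the two calls):

import re

text = "hello mr Islandt\nhello mr gadgets"

# without re.M, ^ anchors only at the very start of the string
print(re.findall(r'^\w+ (\w+) (\w+)', text))        # [('mr', 'Islandt')]

# with re.M, ^ anchors at the start of every line
print(re.findall(r'^\w+ (\w+) (\w+)', text, re.M))  # [('mr', 'Islandt'), ('mr', 'gadgets')]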

Categories: FLOSS Project Planets

PyBites: Exploring the Role of Static Methods in Python: A Functional Perspective

Wed, 2024-01-24 04:21
Introduction

Python’s versatility in supporting different programming paradigms, including procedural, object-oriented, and functional programming, opens up a rich landscape for software design and development.

Among these paradigms, the use of static methods in Python, particularly in an object-oriented context, has been a topic of debate.

This article delves into the role and implications of static methods in Python, weighing them against a more functional approach that leverages modules and functional programming principles.

The Nature of Static Methods in Python

Definition and Usage:

Static methods in Python are defined within a class using the @staticmethod decorator.

Unlike regular methods, they do not require an instance (self) or class (cls) reference.

They are typically used for utility functions that logically belong to a class but are independent of class instances.

Example in Practice:

Consider this code example from Django:

# django/db/backends/oracle/operations.py

class DatabaseOperations(BaseDatabaseOperations):

    # ... other methods and attributes ...

    @staticmethod
    def convert_empty_string(value, expression, connection):
        return "" if value is None else value

    @staticmethod
    def convert_empty_bytes(value, expression, connection):
        return b"" if value is None else value

Here, convert_empty_string and convert_empty_bytes are static due to their utility nature and specific association with the DatabaseOperations class.

The Case for Modules and Functional Programming

Embracing Python’s Module System:

Python’s module system allows for effective namespace management and code organization.

Namespaces are one honking great idea — let’s do more of those!

The Zen of Python, by Tim Peters

Functions, including those that could be static methods, can be organized in modules, making them reusable and easily accessible.
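For instance, a utility like Django’s converters above could live in a plain module instead. A minimal sketch (the module name and the simplified signature are invented for illustration):

# conversions.py -- a hypothetical utility module
def convert_empty_string(value):
    """Return "" when value is None, otherwise return value unchanged."""
    return "" if value is None else value

# callers import the function directly, no class needed:
# from conversions import convert_empty_string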

Functional Programming Advantages:
  1. Quick Development: Functional programming emphasizes simplicity and stateless operations, leading to concise and readable code.
  2. Code Resilience: Pure functions (functions that do not alter external state) enhance predictability and testability. Related: 10 Tips to Write Better Functions in Python
  3. Separation of Concerns: Using functions and modules promotes a clean separation of data representation (classes) and behavior (functions).
Combining Object-Oriented and Functional Approaches

Hybrid Strategy:
  1. Abstraction with Classes: Use classes for data representation, encapsulating state and behavior that are closely related. See also our When to Use Classes article.
  2. Functional Constructs: Utilize functional concepts like higher-order functions, immutability, and pure functions for business logic and data manipulation (see the sketch after this list).
  3. Factories and Observers: Implement design patterns like factory and observer for creating objects and managing state changes, respectively (shout-out to Brandon Rhodes’ awesome design patterns guide!)
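A small sketch of the functional constructs from point 2 (the function and values are invented for illustration):

from functools import reduce

# a pure function: the result depends only on its inputs, no state is touched
def add_tax(price, rate=0.2):
    return round(price * (1 + rate), 2)

prices = (10.00, 25.50, 7.99)  # an immutable tuple of inputs

# map() is a higher-order function: it takes add_tax itself as an argument
gross = list(map(add_tax, prices))

total = reduce(lambda a, b: a + b, gross)
print(gross, round(total, 2))  # [12.0, 30.6, 9.59] 52.19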
Conclusion: Striking the Right Balance

The decision to use static methods, standalone functions, or a functional programming approach in Python depends on several factors:

  • Relevance: Is the function logically part of a class’s responsibilities?
  • Reusability: Would the function be more versatile as a standalone module function?
  • Simplicity: Can the use of regular functions simplify the class structure and align with the Single Responsibility Principle? Related article: Tips for clean code in Python.

Ultimately, the choice lies in finding the right balance that aligns with the application’s architecture, maintainability, and the development team’s expertise.

Python, with its multi-paradigm capabilities, offers the flexibility to adopt a style that best suits the project’s needs.

Fun Fact: Static Methods Were an Accident

Guido added static methods by accident! He originally meant to add class methods instead.

I think the reason is that a module at best acts as a class where every method is a *static* method, but implicitly so. And we all know how limited static methods are. (They’re basically an accident — back in the Python 2.2 days when I was inventing new-style classes and descriptors, I meant to implement class methods but at first I didn’t understand them and accidentally implemented static methods first. Then it was too late to remove them and only provide class methods.)

Guido van Rossum, see the discussion thread here, and thanks Will for pointing me to this.

Call to Action

What’s your approach to using static methods in Python?

Do you favor a more functional style, or do you find static methods indispensable in certain scenarios?

Share your thoughts and experiences in our community

Categories: FLOSS Project Planets

eGenix.com: eGenix Antispam Bot for Telegram 0.6.0 GA

Wed, 2024-01-24 03:00
Introduction

eGenix has long been running a local user group meeting in Düsseldorf called Python Meeting Düsseldorf and we are using a Telegram group for most of our communication.

In the early days, the group worked well and we only had few spammers joining it, which we could well handle manually.

More recently, this has changed dramatically. We are seeing between 2 and 5 spam signups per day, often at night. Furthermore, the signup accounts are not always easy to spot as spammers, since they often come with profile images, descriptions, etc.

With the bot, we now have a more flexible way of dealing with the problem.

Please see our project page for details and download links.

Features
  • Low impact mode of operation: the bot tries to keep noise in the group to a minimum
  • Several challenge mechanisms to choose from, more can be added as needed
  • Flexible and easy to use configuration
  • Only needs a few MB of RAM, so can easily be put into a container or run on a Raspberry Pi
  • Can handle quite a bit of load due to the async implementation
  • Works with Python 3.9+
  • MIT open source licensed
News

The 0.6.0 release fixes a few bugs and adds more features:

  • Upgraded to pyrogram 2.0.106, which fixes a weird error we have been getting recently with the old version 1.4.16 (see pyrogram/pyrogram#1347)
  • Catch weird error from Telegram when deleting conversations; this seems to sometimes fail, probably due to a glitch on their side
  • Made the math and char entry challenges a little harder
  • Added new DictItemChallenge

It has been battle-tested in production for several years already and is proving to be a really useful tool to help with Telegram group administration.

More Information

For more information on the eGenix.com Python products, licensing and download instructions, please write to sales@egenix.com.

Enjoy !

Marc-Andre Lemburg, eGenix.com

Categories: FLOSS Project Planets

    Wing Tips: AI Assisted Development in Wing Pro

    Tue, 2024-01-23 20:00

    This Wing Tip introduces Wing Pro's AI assisted software development capabilities. Starting with Wing Pro version 10, you can use generative AI to write new code at the current editor insertion point, or you can use the AI tool to refactor, redesign, or extend existing code.

    Generative AI is astonishingly capable as a programmer's assistant. As long as you provide it with sufficient context and clear instructions, it can cleanly and correctly execute a wide variety of programming tasks.

    AI Code Suggestion

    Here is an example where Wing Pro's AI code suggestion capability is used to write a missing method for an existing class. The AI knows what to add because it can see what precedes and follows the insertion point in the editor. It infers from that context what code you would like it to produce:

    Shown above: Typing 'def get_full_name' followed by Ctrl-? to initiate AI suggestion mode. The suggested code is accepted by pressing Enter.

    AI Refactoring

    AI refactoring is even more powerful. You can request changes to existing code according to written instructions. For example, you might ask it to "convert this threaded implementation to run asynchronously instead":

    Shown above: Running the highlighted request in the AI tool to convert multithreaded code to run asynchronously instead.

    Description-Driven Development

    Wing Pro's AI refactoring tool can also be used to write new code at the current insertion point, according to written instructions. For example, you might ask it to "add client and server classes that expose all the public methods of FileManager to a client process using sockets and JSON":

    Shown above: Using the AI tool to request implementation of client/server classes for remote access to an existing class.
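
The pattern being requested boils down to something like this bare-bones sketch (FileManager, the host, and the port are hypothetical, and a real implementation would need error handling and message framing):

import json
import socket

class FileManagerClient:
    """Forward method calls to a remote FileManager as JSON over a socket."""

    def __init__(self, host="127.0.0.1", port=9000):
        self.address = (host, port)

    def _call(self, method, *args):
        # Send one {"method": ..., "args": [...]} request per connection
        # and read back a single JSON line with the result.
        with socket.create_connection(self.address) as sock:
            request = json.dumps({"method": method, "args": args}) + "\n"
            sock.sendall(request.encode("utf-8"))
            response = sock.makefile("r", encoding="utf-8").readline()
        return json.loads(response)["result"]

    def list_files(self, path):
        return self._call("list_files", path)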

    Simpler and perhaps more common requests like "write documentation strings for these methods" and "create unit tests for class Person" of course also work. In general, Wing Pro's AI assistant can do any reasonably sized chunk of work for which you can clearly state instructions.

    Used correctly, this capability will have a significant impact on your productivity as a programmer. Instead of typing out code manually, your role changes to one of directing an intelligent assistant capable of completing a wide range of programming tasks very quickly. You will still need to review and accept or reject the AI's work. Generative AI can't replace you, but it allows you to concentrate much more on higher-level design and much less on implementation details.

    Getting Started

Wing Pro uses OpenAI as its AI provider, and you will need to create and pay for your own OpenAI account before you can use this feature. You may need to pay up to US$50 up front to be given computational rate limits that are high enough to use AI for your software development. However, individual requests often cost less than US$0.01, and more complex tasks may cost up to US$0.30 if you provide a lot of context with them. This is still far less than the paid programmer time the AI is replacing.

To use AI assisted development effectively, you will need to learn how to create well-designed requests that provide the AI with both the necessary relevant context and clear, specific instructions. Please read all of the AI Assisted Development documentation for details on setup, framing requests, and monitoring costs. It takes a bit of time to get started, but it is well worth the effort to incorporate generative AI into your tool chain.



    That's it for now! We'll be back soon with more Wing Tips for Wing Python IDE.

    As always, please don't hesitate to email support@wingware.com if you run into problems or have any questions.

    Categories: FLOSS Project Planets

    Seth Michael Larson: Releases on the Python Package Index are never “done”

    Tue, 2024-01-23 19:00

    Published 2024-01-24 by Seth Larson

This critical role would not be possible without funding from the OpenSSF Alpha-Omega project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

PEP 740 and open-ended PyPI releases

    PEP 740 is a proposal to add support for digital attestations to PyPI artifacts, for example publish provenance attestations, which can be verified and used by tooling.

William Woodruff has been working on PEP 740, which is in draft on GitHub, and he addressed my feedback this week. During this work, the open-endedness of PyPI releases came up in our discussion, specifically how it is a common gotcha for folks designing tools and policy across multiple software ecosystems.

What does it mean for PyPI releases to be open-ended? It means that you can always upload new files to an existing release on PyPI, even if the release was created years ago. This is because a PyPI “release” is only a thin layer aggregating a bunch of files on PyPI that happen to share the same version.
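
You can observe this property yourself with nothing but the standard library: PyPI's JSON API reports an upload time for every file in a release, and on long-lived projects those times can be far apart (the project and version below are arbitrary examples):

import json
import urllib.request

def file_upload_times(project, version):
    """Return {filename: upload_time} for every file in one PyPI release."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return {f["filename"]: f["upload_time"] for f in data["urls"]}

for name, when in sorted(file_upload_times("numpy", "1.26.3").items()):
    print(when, name)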

This was opened up as a wider discussion on discuss.python.org about this property. Summarizing that discussion:

    • New Python releases mean new wheels need to be built for non-ABI3 compatible projects. IMO this is the most compelling reason to keep this property.
    • Draft releases seem semi-related, being able to put artifacts into a "queue" before making them public.
• Ordering of which wheel gets evaluated as an installation candidate isn't well defined. It is up to installers, and tends to go from more specific to less specific.
• PyPI doesn't allow single files to be yanked, even though PEP 592 allows for yanking at the file level instead of only the release level.
• The "attack" vector is fairly small; this property would mostly provide additional secrecy for attackers, by letting them blend into existing releases.

    CPython Software Bill-of-Materials update

CPython 3.13.0a3 was released; this is the very first CPython release that contains any SBOM metadata at all, and thus the first for which we can create an initial draft SBOM document.
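
Inspecting such a draft document only takes a few lines, assuming the SPDX JSON serialization used by CPython's SBOM work (the local filename here is hypothetical):

import json

# Load a draft SBOM document downloaded to the current directory.
with open("cpython-3.13.0a3.spdx.json") as f:
    sbom = json.load(f)

# SPDX documents list their components under "packages".
for package in sbom.get("packages", []):
    print(package["name"], package.get("versionInfo", "<unknown>"))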

Much of the work on CPython's SBOMs was done to fix issues related to pip's vendored dependencies and issues found by downstream distributors of CPython builds like Red Hat.

All of these issues are closely related and touch the same place in the codebase, so fixing them together resulted in a medium-sized pull request.

On the release side, I've addressed feedback from the first round of reviews for generating SBOMs for source code artifacts and uploading them during the release process. Once those SBOMs start being generated, they'll automatically be added to python.org/downloads.

    Other items

    That's all for this week! 👋 If you're interested in more you can read last week's report.

    Thanks for reading! ♡ Did you find this article helpful and want more content like it? Get notified of new posts by subscribing to the RSS feed or the email newsletter.

    This work is licensed under CC BY-SA 4.0

    Categories: FLOSS Project Planets

    Kay Hayen: Nuitka Package Configuration Part 3

    Tue, 2024-01-23 18:00

This is the third part of a post series under the tag package_config that explains the Nuitka package configuration in more detail. To recap, Nuitka package configuration is the way Nuitka learns about hidden dependencies, needed DLLs, data files, and just generally avoids bloat in the compilation. The details are on a dedicated page on the web site, Nuitka Package Configuration, but reading on will be just fine.

    Problem Package

    Each post will feature one package that caused a particular problem. In this case, we are talking about the package toga.

Problems like the one with this package are typically encountered in standalone mode only, but they also affect accelerated mode, since Nuitka doesn't compile all the things desired in that case. Some packages, and toga is one of them, look at what OS they are running on, environment variables, etc., and then, in a relatively static fashion, but one that Nuitka cannot see through, load what they call a “backend” module.

We are going to look at that in some detail, and will see a workaround applied with the anti-bloat engine, doing code modification on the fly that makes the choice at compile time, and visible to Nuitka in this way.

    Initial Symptom

The initial symptom reported was that toga suffered from broken version lookups and therefore did not work. We encountered two separate things that prevented it; the first was about the version number. The code was trying to apply int() after resolving the version of toga by itself to None.

Traceback (most recent call last):
  File "C:\py\dist\toga1.py", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "C:\py\dist\toga\__init__.py", line 1, in <module toga>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "C:\py\dist\toga\app.py", line 20, in <module toga.app>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "C:\py\dist\toga\widgets\base.py", line 7, in <module toga.widgets.base>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "C:\py\dist\travertino\__init__.py", line 4, in <module travertino>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "C:\py\dist\setuptools_scm\__init__.py", line 7, in <module setuptools_scm>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "C:\py\dist\setuptools_scm\_config.py", line 15, in <module setuptools_scm._config>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "C:\py\dist\setuptools_scm\_integration\pyproject_reading.py", line 8, in <module setuptools_scm._integration.pyproject_reading>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "C:\py\dist\setuptools_scm\_integration\setuptools.py", line 62, in <module setuptools_scm._integration.setuptools>
  File "C:\py\dist\setuptools_scm\_integration\setuptools.py", line 29, in _warn_on_old_setuptools
ValueError: invalid literal for int() with base 10: 'unknown'

So, this is clearly something that we consider bloat in the first place: looking up your own version number at run time. The use of setuptools_scm implies the use of setuptools, whose version cannot be determined, and that's what crashes.

    Step 1 - Analysis of initial crashing

So the first thing we did was to repair setuptools so that it knows its own version. It does this a bit differently, because it cannot use itself. Our compile-time optimization failed there, but it would also have been overkill. We had never come across this before, since we normally avoid setuptools very hard, but it's not good to be incompatible.

- module-name: 'setuptools.version'
  anti-bloat:
    - description: 'workaround for metadata version of setuptools'
      replacements:
        "pkg_resources.get_distribution('setuptools').version": "repr(__import__('setuptools.version').version.__version__)"

We do not have to include all the metadata of setuptools here just to get that one item, so we chose to make a simple string replacement that looks the value up at compile time and puts it into the source code automatically. That removes the pkg_resources.get_distribution() call entirely.
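
The effect on the compiled source is roughly the following (a sketch; the concrete version number is made up):

# Before the replacement, as executed by setuptools at run time:
__version__ = pkg_resources.get_distribution('setuptools').version

# After the replacement, the expression has been evaluated at compile time
# and its repr() inserted directly into the source:
__version__ = '69.0.3'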

With that, setuptools_scm was not crashing anymore. That's good. But we don't really want it to be included: it is good for dynamically detecting the version from git and whatnot, but including a framework for building C extensions is not a good idea in the general case. Nuitka therefore said this:

Nuitka-Plugins:WARNING: anti-bloat: Undesirable import of 'setuptools_scm' (intending to
Nuitka-Plugins:WARNING: avoid 'setuptools') in 'toga' (at
Nuitka-Plugins:WARNING: 'c:\3\Lib\site-packages\toga\__init__.py:99') encountered. It may
Nuitka-Plugins:WARNING: slow down compilation.
Nuitka-Plugins:WARNING: Complex topic! More information can be found at
Nuitka-Plugins:WARNING: https://nuitka.net/info/unwanted-module.html

So that's informing the user to take action. And in the case of optional imports, i.e. ones where the using code will handle the ImportError just fine and work without it, we can do this.

- module-name: 'toga'
  anti-bloat:
    - description: 'remove setuptools usage'
      no-auto-follow:
        'setuptools_scm': ''
      when: 'not use_setuptools'

Here we say: do not automatically follow setuptools_scm imports, unless there is other code that still does it. In that way, the import still happens if some other part of the code imports the module, but only then. We no longer enforce the non-usage of a module here; we just make that decision based on other uses being present.

With this, both the bloat warning and the inclusion of setuptools_scm in the compilation are gone. You always want to make the compilation as small as possible and remove those packages that do not contribute anything but overhead, aka bloat.

The next thing discovered was that toga needs the toga-core distribution metadata for its version check. For that, we use the common solution and declare that we want to include that distribution's metadata whenever toga is part of a compilation.

- module-name: 'toga'
  data-files:
    include-metadata:
      - 'toga-core'

So that moved the entire issue of version lookups to resolved.

    Step 2 - Dynamic Backend dependency

Now on to the backend issue. What remained was the need to include the platform-specific backend, one that can even be overridden by an environment variable. For full compatibility, we invented something new. Typically, what we would have done is to create a toga plugin; instead, the following snippet does the job.

- module-name: 'toga.platform'
  variables:
    setup_code: 'import toga.platform'
    declarations:
      'toga_backend_module_name': 'toga.platform.get_platform_factory().__name__'
  anti-bloat:
    - change_function:
        'get_platform_factory': "'importlib.import_module(%r)' % get_variable('toga_backend_module_name')"

    There is a whole new thing here, a new feature that was added specifically for this to be easy to do. And with the backend selection being complex and partially dynamic code, we didn’t want to hard code that. So we added support for variables and their use in Nuitka Package Configuration.

The first block, variables, defines under declarations a mapping of expressions that will be evaluated at compile time, given the setup code under setup_code.

This then allows us to have a variable with the name of the backend that toga decides to use. We then change the very complex function get_platform_factory, for the compilation, into a replacement that Nuitka is able to statically optimize, so that it sees the backend as a dependency and uses it directly at run time, which is what we want.
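
After the change, the compiled module effectively contains a trivial function like this (the backend module name is whatever the compile-time variable resolved to on the build machine; 'toga_winforms.factory' is only an example):

import importlib

def get_platform_factory():
    # The dynamic platform detection has been replaced by a fixed import
    # that Nuitka can follow statically.
    return importlib.import_module('toga_winforms.factory')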

    Final remarks

I am hoping you will find this information very helpful and will join the effort to make packaging for Python work out of the box. Adding support for toga was a bit more complex, but with the new feature, once a problem is identified as this kind of backend issue, solving it should have become a lot easier.

Lessons learned: we should cover packages that we routinely remove from compilation, like setuptools, but e.g. also IPython. This will have to be added, so that things like setuptools_scm cannot cloud the view of actual issues.

    Categories: FLOSS Project Planets

    Quansight Labs Blog: Captioning: A Newcomer’s Guide

    Tue, 2024-01-23 16:41
    What are those words on the bottom of your video screen and where do they come from? Captioning’s normalization in the past several decades may seem like it would render those questions moot, but understanding more about captions means making more informed decisions about when, how, and why we make sure information is accessible.
    Categories: FLOSS Project Planets

    PyCoder’s Weekly: Issue #613 (Jan. 23, 2024)

    Tue, 2024-01-23 14:30

    #613 – JANUARY 23, 2024
    View in Browser »

    Python Packaging, One Year Later: A Look Back at 2023

    This is a follow-on post to Chris’s article from last year called Fourteen tools at least twelve too many. “Are there still fourteen tools, or are there even more? Has Python packaging improved in a year?”
    CHRIS WARRICK

    Running Python on Air-Gapped Systems

    This post describes running Python code on a “soft” air-gapped system, one without direct internet access. Installing packages in a clean environment and moving them to the air-gapped machine has challenges. Read Ibrahim’s take on how he solved the problem.
    IBRAHIM AHMED

    Elevate Your Web Development with MongoDB’s Full Stack FastAPI App Generator

    Get ready to elevate your web development process with the newly released Full Stack FastAPI App Generator by MongoDB, offering a simplified setup process for building modern full-stack web applications with FastAPI and MongoDB →
    MONGODB sponsor

    Add Logging and Notification Messages to Flask Web Projects

    After you implement the main functionality of a web project, it’s good to understand how your users interact with your app and where they may run into errors. In this tutorial, you’ll enhance your Flask project by creating error pages and logging messages.
    REAL PYTHON

    Python 3.13.0 Alpha 3 Is Now Available

    CPYTHON DEV BLOG

    PSF Announces More Developer in Residence Roles

    PYTHON SOFTWARE FOUNDATION

    PSF Announces Foundation Fellow Members for Q3 2023

    PYTHON SOFTWARE FOUNDATION

Discussions

PEP 736: Shorthand Syntax for Keyword Arguments

    PYTHON.ORG

Python Jobs

Python Tutorial Editor (Anywhere)

    Real Python

    More Python Jobs >>>

Articles & Tutorials

Bias, Toxicity, and Truthfulness in LLMs With Python

    How can you measure the quality of a large language model? What tools can measure bias, toxicity, and truthfulness levels in a model using Python? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, returns to discuss techniques and tools for evaluating LLMs With Python.
    REAL PYTHON podcast

    Postgres vs. DynamoDB: Which Database to Choose

    This article presents various aspects you need to consider when choosing a database for your project - querying, performance, ORMs, migrations, etc. It shows how things are approached differently for Postgres vs. DynamoDB and includes examples in Python.
    JAN GIACOMELLI • Shared by Jan Giacomelli

    Building with Temporal Cloud Webinar Series

    Hear from our technical team on how we’ve built Temporal Cloud to deliver world-class latency, performance, and availability for the smallest and largest workloads. Whether you’re using Temporal Cloud or self-host, this series will be full of insights into how to optimize your Temporal Service →
    TEMPORAL sponsor

    Python App Development: In-Depth Guide for Product Owners

    “As with every technology stack, Python has its advantages and limitations. The key to success is to use Python at the right time and in the right place.” This guide talks about what a product owner needs to know to take on a Python project.
    PAVLO PYLYPENKO • Shared by Alina

    HTTP Requests With Python’s urllib.request

    In this video course, you’ll explore how to make HTTP requests using Python’s handy built-in module, urllib.request. You’ll try out examples and go over common errors, all while learning more about HTTP requests and Python in general.
    REAL PYTHON course

    Beware of Misleading GPU vs CPU Benchmarks

    Nvidia has created GPU-based replacements for NumPy and other tools and promises significant speed-ups, but the comparison may not be accurate. Read on to learn if GPU replacements for CPU-based libraries are really that much faster.
    ITAMAR TURNER-TRAURING

    Django Migration Files: Automatic Clean-Up

    Your Django migrations are piling up in your repo? You want to clean them up without a hassle? Check out this new package django-migration-zero that helps make migration management a piece of cake!
    RONNY VEDRILLA • Shared by Sarah Boyce

    Understanding NumPy’s ndarray

    To understand NumPy, you need to understand the ndarray type. This article starts with Python’s native lists and shows you when you need to move to NumPy’s ndarray data type.
    STEPHEN GRUPPETTA • Shared by Stephen Gruppetta

    Type Information for Faster Python C Extensions

    PyPy is an alternative implementation of Python, and its C API compatibility layer has some performance issues. This article describes on-going work to improve its performance.
    MAX BERNSTEIN

    Fastest Way to Read Excel in Python

    It’s not uncommon to find yourself reading Excel in Python. This article compares several ways to read Excel from Python and how they perform.
    HAKI BENITA

    How Are Requests Processed in Flask?

    This article provides an in-depth walkthrough of how requests are processed in a Flask application.
    TESTDRIVEN.IO • Shared by Michael Herman

Projects & Code

harlequin: The SQL IDE for Your Terminal

    GITHUB.COM/TCONBEER

    AnyText: Multilingual Visual Text Generation and Editing

    GITHUB.COM/TYXSSPA

    Websocket CLI Testing Interface

    GITHUB.COM/LEWOUDAR • Shared by Kevin Tewouda

    Autometrics-py: Metrics to Debug in Production

    GITHUB.COM/AUTOMETRICS-DEV • Shared by Adelaide Telezhnikova

    django-cte: Common Table Expressions (CTE) for Django

    GITHUB.COM/DIMAGI

Events

Weekly Real Python Office Hours Q&A (Virtual)

    January 24, 2024
    REALPYTHON.COM

    SPb Python Drinkup

    January 25, 2024
    MEETUP.COM

    PyLadies Amsterdam: An Introduction to Conformal Prediction

    January 25, 2024
    MEETUP.COM

    PyDelhi User Group Meetup

    January 27, 2024
    MEETUP.COM

    PythOnRio Meetup

    January 27, 2024
    PYTHON.ORG.BR

    Happy Pythoning!
    This was PyCoder’s Weekly Issue #613.
    View in Browser »

    [ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

    Categories: FLOSS Project Planets

    TechBeamers Python: Python Map vs List Comprehension – The Difference Between the Two

    Tue, 2024-01-23 13:04

In this tutorial, we'll explain the difference between Python's map and list comprehensions. Both map and list comprehensions are powerful tools in Python for applying functions to each element of a sequence. However, they have different strengths and weaknesses, making them suitable for different situations. Here's a breakdown: What is the Difference Between the Python […]
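
As a quick taste of the comparison (a minimal sketch, not code from the post itself):

numbers = [1, 2, 3, 4]

# map() returns a lazy iterator; wrap it in list() to materialize the values.
squares_map = list(map(lambda n: n * n, numbers))

# A list comprehension builds the list directly and often reads better.
squares_comp = [n * n for n in numbers]

assert squares_map == squares_comp == [1, 4, 9, 16]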

    The post Python Map vs List Comprehension – The Difference Between the Two appeared first on TechBeamers.

    Categories: FLOSS Project Planets
