Planet Python
PyCharm: Data Exploration With pandas
Maybe you’ve heard complicated-sounding phrases such as ‘“Students t-test”, “regression models”, “support vector machines”, and so on. You might think there’s so much you need to learn before you can explore and understand your data, but I am going to show you two tools to help you go faster. These are summary statistics and graphs.
Summary statistics and graphs/plots are used by new and experienced data scientists alike, making them the perfect building blocks for exploring data.
We will be working with this dataset available from Kaggle if you’d like to follow along. I chose this dataset because it has several interesting properties, such as multiple continuous and categorical variables, missing data, and a variety of distributions and skews. I’ll explain each variable I work with and why I chose each one to show you the tools you can apply to your chosen data set.
In our previous blog posts, we looked at where to get data from and bring that data into PyCharm. You can look at steps 1 and 2 from our blog post entitled 7 ways to use Jupyter notebooks in PyCharm to create a new Jupyter notebook and import your data as a CSV file if you need a reminder. You can use the dataset I linked above or pick your own for this walkthrough.
We’re going to be using the pandas library in this blog post, so to ensure we’re all on the same page, your code should look something like the following block in a Jupyter notebook – you’ll need to change the spreadsheet name and location to yours, though. Make sure you’ve imported matplotlib, too, as we will be using that library to explore our data.
import pandas as pd import matplotlib as plt df = pd.read_csv('../data/AmesHousing.csv') dfWhen you run that cell, PyCharm will show you your DataFrame, and we can get started.
Try PyCharm Professional for free
Summary statisticsWhen we looked at where to get data from, we discussed continuous and categorical variables. We can use Jupyter notebooks inside PyCharm to generate different summary statistics for these, and, as you might have already guessed, the summary statistics differ depending on whether the variables are continuous or categorical.
Continuous variables summary statisticsFirst, let’s see how we can view our summary statistics. Click on the small bar graph icon on the right-hand side of your DataFrame and select Compact:
Let me give you a little tip here if you’re unsure which variables are continuous and which are categorical, PyCharm shows different summary statistics for each one. The ones with the mini graphs (blue in this screenshot) are continuous, and those without are categorical.
This data set has several continuous variables, such as Order, PID, MS SubClass, and more, but we will focus on Lot Frontage first. That is the amount of space at the front of the property.
The summary statistics already give us some clues:
There’s a lot of data here, so let’s break it down and explore it to understand it better. Immediately, we can see that we have missing data for this variable; that’s something we want to note, as it might mean we have some issues with the dataset, although we won’t go into that in this blog post!
First, you can see the little histogram in blue in my screenshot, which tells us that we have a positive skew in our data because the data tails off to the right. We can further confirm this with the data because the mean is slightly larger than the median. That’s not entirely surprising, given we’d expect the majority of lot frontages to be of a similar size, but perhaps there are a small number of luxury properties with much bigger lot frontages that are skewing our data. Given this skew, we would be well advised not to use the standard deviation as a measure of dispersion because that is calculated by using all data points, so it’s affected by outliers, which we know we have on one side of our distribution.
Next, we can calculate our interquartile range as the difference between our 25th percentile of 58.0 and our 75th percentile of 80.0, giving us an interquartile range of 22.0. Alongside the interquartile range, it’s helpful to consider the median, the middle value in our data, and unlike the mean, it is not based on every data point. The median is more appropriate for Lot Frontage than the mean because it’s not affected by the outliers we know we have.
Since we’re talking about the median and interquartile range, it is worth saying that box plots are a great way to represent these values visually. We can ask JetBrains AI Assistant to create one for us with a prompt such as this:
Create code using matplotlib for a box plot for ‘Lot Frontage’. Assume we have all necessary imports and the data exists.
Here’s the code that was generated:
plt.figure(figsize=(10, 6)) plt.boxplot(df['Lot Frontage'].dropna(), vert=False) plt.title('Box Plot of Lot Frontage') plt.xlabel('Lot Frontage') plt.show()When I click Accept and run, we get our box plot:
The median is the line inside the box, which, as you can see, is slightly to the left, confirming the presence of the positive or right-hand skew. The box plot also makes it very easy to see a noticeable number of outliers to the right of the box, known as “the tail”. That’s the small number of likely luxury properties that we suspect we have.
It’s important to note that coupling the mean and standard deviation or the median and IQR gives you two pieces of information for that data: a central tendency and the variance. For determining the central tendency, the mean is more prone to being affected by outliers, so it is best when there is no skew in your data, whereas the median is more robust in that regard. Likewise, for the variation, the standard deviation can be affected by outliers in your data. In contrast, the interquartile range will always tell you the distribution of the middle 50% of your data. Your goals determine which measurements you want to use.
Categorical variables summary statisticsWhen it comes to categorical variables in your data, you can use the summary statistics in PyCharm to find patterns. At this point, we need to be clear that we’re talking about descriptive rather than inferential statistics. That means we can see patterns, but we don’t know if they are significant.
Some examples of categorical data in this data set include MS Zoning, Lot Shape, and House Style. You can gain lots of insights just by looking through your data set. For example, looking at the categorical variable Neighborhood, the majority are stated as Other in the summary statistics with 75.8%. This tells you that there might well be a lot of categories in Neighborhood, which is something to bear in mind when we move on to graphs.
As another example, the categorical variable House Style states that about 50% of the houses are one-story, while 30% are two-story, leaving 20% that fall into some other category that you might want to explore in more detail. You can ask JetBrains AI for help here with a prompt like:
Write pandas code that tells me all the categories for ‘House Style’ in my DataFrame ‘df’, which already exists. Assume we have all the necessary imports and that the data exists.
Here’s the resulting code:
unique_house_styles = df['House Style'].unique() print("Unique categories for 'House Style':") print(unique_house_styles)When we run that we can see that the remaining 20% is split between various codes that we might want to research more to understand what they mean:
Unique categories for ‘House Style’:
['1Story' '2Story' '1.5Fin' 'SFoyer' 'SLvl' '2.5Unf' '1.5Unf' '2.5Fin']
Have a look through the data set at your categorical variables and see what insights you can gain!
Before we move on to graphs, I want to touch on one more piece of functionality inside PyCharm that you can use to access your summary statistics called Explain DataFrame. You can access it by clicking on the purple AI icon on the top-right of the DataFrame and then choosing AI Actions | Explain DataFrame.
JetBrains AI lists out your summary statistics but may also add some code snippets that are helpful for you to get your data journey started, such as how to drop missing values, filter rows based on a condition, select specific columns, as well as group and aggregate data.
GraphsGraphs or plots are a way of quickly getting patterns to pop out at you that might not be obvious when you’re looking at the numbers in the summary statistics. We’re going to look at some of the plots you can get PyCharm to generate to help you explore your data.
First, let’s revisit our continuous variable, Lot Frontage. We already learned that we have a positive or right-hand skew from the mini histogram in the summary statistics, but we want to know more!
In your DataFrame in PyCharm, click the Chart View icon on the left-hand side:
Now click the cog on the right-hand side of the chart that says Show series settings and select the Histogram plot icon on the far right-hand side. Click x to clear the values in the X axis and Y axis and then select Lot Frontage with group and sort for the X axis and Lot Frontage with count for the Y axis:
PyCharm generates the same histogram as you see in the summary settings, but we didn’t have to write a single line of code. We can also explore the histogram and mouse over data points to learn more.
Let’s take it to the next level while we’re here. Perhaps we want to see if the condition of the property, as captured by the Overall Cond variable, predicts the sale price.
Change your X axis SalePrice group and sort and your Y axis to SalePrice count and then add the group Overall Cond:
Looking at this chart, we can hypothesize that the overall condition of the property is indeed a predictor of the sale price, as the distribution and skew are remarkably similar. One small note is that grouping histograms like this works best when you have a smaller number of categories. If you change Groups to Neighborhood, which we know has many more categories, it becomes much harder to view!
Moving on, let’s stick with PyCharm’s plotting capabilities and explore bar graphs. These are a companion to frequency charts such as histograms, but can also be used for categorical data. Perhaps you are interested in Neighbourhood (a categorical variable) in relation to SalesPrice.
Click the Bar [chart] icon on the left-hand side of your series setting, then select Neighbourhood as Categories and SalesPrice with the median as the Values:
This helps us understand the neighborhoods with the most expensive and cheapest housing. I chose the median for the SalesPrice as it’s less susceptible to outliers in the data. For example, I can see that housing in Mitchel is likely to be substantially cheaper than in NoRidge.
Line plots are another useful plot for your toolkit. You can use these to demonstrate trends between continuous variables over a period of time. For example, select the Line [graph] icon and then choose Year Built as the X axis and SalePrice with the mean as the Y axis:
This suggests a small positive correlation between the year the house was built and the price of the house, especially after 1950. If you’re feeling adventurous, remove the mean from SalePrice and see how your graph changes when it has to plot every single price!
The last plot I’d like to draw your attention to is scatter plots. These are a great way to see a relationship between two continuous variables and any correlation between them. A correlation shows the strength of a relationship between two variables. To dig deeper, check out this beginner-friendly overview from Real Python.
For example, if we set our X axis to SalePrice and our Y axis to Gr LivArea, we can see that there is a positive correlation between the two variables, and we can also easily spot some outliers in our data, including a couple of houses with a lower sale price but a huge living area!
SummaryHere’s a reminder of what we’ve covered today. You can access your summary statistics in PyCharm either through Explain DataFrame with JetBrains AI or by clicking on the small graph icon on the right-hand side of a DataFrame called Column statistics and then selecting Compact. You can also use Detailed to get even more information than we’ve covered in this blog post.
You can get PyCharm to create graphs to explore your data and create hypotheses for further investigation. Some more commonly used ones are histograms, bar charts, line graphs, and scatter plots.
Finally, you can use JetBrains AI Assistant to generate code with natural language prompts in the AI tool window. This is a quick way to learn more about your data and start thinking about the insights on offer.
Download PyCharm Professional to try it out for yourself! Get an extended trial today and experience the difference PyCharm Professional can make in your data science endeavors. Use the promotion code “PyCharmNotebooks” at checkout to activate your free 60-day subscription to PyCharm Professional. The free subscription is available for individual users only.
Try PyCharm Professional for free
Using both summary statistics and graphs in PyCharm, we can learn a lot about our data, giving us a solid foundation for our next step – cleaning our data, which we will talk about in the next blog post in this series.
Real Python: Python's Magic Methods in Classes
As a Python developer who wants to harness the power of object-oriented programming, you’ll love to learn how to customize your classes using special methods, also known as magic methods or dunder methods. A special method is a method whose name starts and ends with a double underscore. These methods have special meanings in Python.
Python automatically calls magic methods as a response to certain operations, such as instantiation, sequence indexing, attribute managing, and much more. Magic methods support core object-oriented features in Python, so learning about them is fundamental for you as a Python programmer.
In this video course, you’ll:
- Learn what Python’s special or magic methods are
- Understand the magic behind magic methods in Python
- Customize different behaviors of your custom classes with special methods
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Real Python: Quiz: Using .__repr__() vs .__str__() in Python
In this quiz, you’ll test your understanding of Python’s .__repr__() and .__str__() special methods. These methods allow you to control how a program displays an object, making your classes more readable and easier to debug and maintain.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Julien Tayon: Is chatgpt good at generating code for tuning a guitar ?
Science he was a patented CS engineer he wanted to prove me that my new guitar tuner was useless since AI could come with a better less convoluted exact example in less lines of code than mine (mine is adapted from a blog on audio processing and fast fourier transform because it was commented and was refreshing me the basics of signal processing).
And I asked him, have you ever heard of the Nyquist frequency ? or the tradeoff an oscilloscope have to do between time locality and accuracy ?
Of course he didn't. And was proud that a wannabee coder would be proven useless thanks to the dumbest AI.
So I actually made this guitar tuner because this time I wanted to have an exact figure around the Hertz.
The problem stated by FFT/Nyquist formula is that if I want an exact number around 1Hz (1 period per second) I should sample at least half a period (hence .5 second), and should not expect a good resolution.
The quoted chatgpt code takes 1024 samples out of 44100/sec, giving it a nice reactivity of 1/44th of a second with an accuracy of 44100/1024/2 => 21Hz.
I know chatgpt does not tune a guitar, but shouldn't the chatgpt user bragging about the superiority of pro as him remember that when we tune we may tune not only at A = 440Hz but maybe A = 432Hz or other ?
A note is defined as a racine 12th of 2 compared to an arbitrary reference (remember an octave is doubling => 12 half tones = 2) ; what makes a temperate scale is not the reference but the relationship between notes and this enable bands of instrument to tune themselves according to the most unreliable but also nice instrument which is the human voice.
Giving the user 3 decimal after the comma is called being precise : it makes you look like a genius in the eye of the crowd. Giving the user 0 decimal but accurate frequency is what makes you look dumb in the eyes of the computer science engineer, but it way more useful in real life.
Here I took the liberty with pysine to generate A on 6 octaves (ref A=440) and use the recommanded chatgpt buffer size acknowledged by a pro CS engineer for tuning your guitar and my default choices. for i in 55.0 110.0 220.0 440.0 880.0 1760.0 ; do python -m pysine $i 3; done Here is the result with a chunk size of 1024 : And here is the result with a chunk size corresponding to half a second of sampling :
I may not be a computer engineer, I am dumb, but checking with easy to use tools that your final result is in sync with your goal is for me more important than diplomas and professional formations.
The code is yet another basic animation in matplotlib with a nice arrow pointing the frequency best fitting item. It is not the best algorithm, but it does the job.
Showing the harmonics as well as the tonal has another benefit it answers the questions : why would I tune my string on the note of the upper string ?
Like tuning E on A ?
Well because -at least for my half broken guitar- it ensures to me that I will tune on the tonal note.
Here is a sample of tuning E on the empty string : And this is how tuning the E string on the A note looks like : And don't pay attention to the 56Hz residual noise triggered by my fans/appliance turning and making a constant noise :D Here is the code import pyaudio import matplotlib.pyplot as plt import matplotlib.animation as animation import numpy as np import time from sys import argv A = 440.0 try: A=float(argv[1]) except IndexError: pass form_1 = pyaudio.paInt16 # 16-bit resolution chans = 1 # 1 channel samp_rate = 44100 # 44.1kHz sampling rate chunk = 44100//2# .5 seconds of sampling for 1Hz accuracy audio = pyaudio.PyAudio() # create pyaudio instantiation # create pyaudio stream stream = audio.open( format = form_1,rate = samp_rate,channels = chans, input = True , frames_per_buffer=chunk ) fig = plt.figure(figsize=(13,8)) ax = fig.add_subplot(111) plt.grid(True) def compute_freq(ref, half_tones): return [ 1.0*ref*(2**((half_tones+12*i )/12)) for i in range(-4,4) ] print(compute_freq(A,0)) note_2_freq = dict( E = compute_freq(A,-5), A = compute_freq(A, 0), D = compute_freq(A, 5), G = compute_freq(A,-2), B = compute_freq(A, 2), ) resolution = samp_rate/(2*chunk) def closest_to(freq): res = dict() for note, freqs in note_2_freq.items(): res[note]=max(freqs) for f in freqs: res[note]= min(res[note], abs(freq -f)) note,diff_freq = sorted(res.items(), key = lambda item : item[1])[0] for f in note_2_freq[note]: if abs(freq-f) == diff_freq: return "%s %s %2.1f %d" % ( note, abs(freq - f ) < resolution and "=" or ( freq > f and "+" or "-"), abs(freq-f), freq ) def init_func(): plt.rcParams['font.size']=18 plt.xlabel('Frequency [Hz]') plt.ylabel('Amplitude [Arbitry Unit]') plt.grid(True) ax.set_xscale('log') ax.set_yscale('log') ax.set_xticks( note_2_freq["E"] + note_2_freq["A"]+ note_2_freq["D"]+ note_2_freq["G"]+ note_2_freq["B"] , labels = ( [ "E" ] * len(note_2_freq["E"]) + [ "A" ] * len(note_2_freq["A"]) + [ "D" ] * len(note_2_freq["D"]) + [ "G" ] * len(note_2_freq["G"]) + [ "B" ] * len(note_2_freq["B"]) ) ) ax.set_xlim(40, 4000) return ax def data_gen(): stream.start_stream() data = np.frombuffer(stream.read(chunk),dtype=np.int16) stream.stop_stream() yield data i=0 def animate(data): global i i+=1 ax.cla() init_func() # compute FFT parameters f_vec = samp_rate*np.arange(chunk/2)/chunk # frequency vector based on window # size and sample rate mic_low_freq = 50 # low frequency response of the mic low_freq_loc = np.argmin(np.abs(f_vec-mic_low_freq)) fft_data = (np.abs(np.fft.fft(data))[0:int(np.floor(chunk/2))])/chunk fft_data[1:] = 2*fft_data[1:] plt.plot(f_vec,fft_data) max_loc = np.argmax(fft_data[low_freq_loc:])+low_freq_loc # max frequency resolution plt.annotate(r'$\Delta f_{max}$: %2.1f Hz, A = %2.1f Hz' % ( resolution, A), xy=(0.7,0.92), xycoords='figure fraction' ) ax.set_ylim([0,2*np.max(fft_data)]) # annotate peak frequency annot = ax.annotate( 'Freq: %s'%(closest_to(f_vec[max_loc])), xy=(f_vec[max_loc], fft_data[max_loc]),\ xycoords='data', xytext=(0,30), textcoords='offset points', arrowprops=dict(arrowstyle="->"), ha='center',va='bottom') #fig.savefig('full_figure-%04d.png' % i) return ax, ani = animation.FuncAnimation( fig, animate, data_gen, init_func, interval=.15, cache_frame_data=False, repeat=True, blit=False ) plt.show()
Talk Python to Me: #483: Reflex Framework: Frontend, Backend, Pure Python
Django Weblog: 2025 DSF Board Candidates
Thank you to the 21 individuals who have chosen to stand for election. This page contains their candidate statements submitted as part of the 2025 DSF Board Nominations.
Our deepest gratitude goes to our departing board members, Çağıl Uluşahin Sonmez, Chaim Kirby, Katie McLaughlin; for your contributions and commitment to the Django community ❤️
Those eligible to vote in this election will receive information on how to vote shortly. Please check for an email with the subject line “2025 DSF Board Voting”. Voting will be open until 23:59 on November 15, 2024 Anywhere on Earth.
Any questions? Reach out via email to foundation@djangoproject.com.
All candidate statements ¶To make it simpler to review all statements, here they are as a list of links. Voters: please take a moment to read all statements before voting! It will take some effort to rank all candidates on the ballot. We believe in you.
- Abigail Gbadago — Accra, Ghana
- Alex Gómez — Barcelona, Spain
- Amir Tarighat — New York
- Ariane Djeupang Jocelyne — Yaounde, Cameroon
- Bhuvnesh Sharma — India
- Chris Achinga — Mombasa, kenya
- Cory Zue — Cape Town, South Africa
- David Vaz — Porto, Portugal
- Gabriel Arias Romero — Mexico
- Jeff Triplett — Lawrence, KS USA
- Julius Nana Acheampong Boakye — Accra Ghana
- Keanya Phelps — Chicago IL US
- Kevin Renskers — The Netherlands
- Kátia Yoshime Nakamura — Berlin, Germany
- Lilian — United States
- Marcelo Elizeche Landó — Paraguay
- Paolo Melchiorre — Italy
- Patryk Bratkowski — Patryk Bratkowski
- Priya Pahwa — India, Asia
- Tom Carrick — Amsterdam, Netherlands
- Vitaly Semotiuck — Rzeszow, Poland
Hi,
I am Abigail(Afi), a DSF member who has contributed to the Django Ecosystem for about four years. I have held the following positions in the community:
- Leadership council member for Black Python Devs (current)
- Open Source Program Manager for Black Python Devs - I am managing 39 of our community members make their first steps in open source (current)
- Programs Team member for DjangoCon US 2024
- Contributed in organizing Django Girls Zanzibar (2023) ahead of the first DjangoCon Africa, co-organiser of Django Girls in Kwahu-Ghana (2019), and coach at Django Girls Ho-Ghana; 2018, 2024 and Zanzibar (2023)
- DjangoCon US Speaker 2023, you can watch my talk here: Strategies for Handling Conflicts and Rollbacks with Django
I have extensive experiences with the community, which have contributed to my growth, and I believe serving on the board is a good way to give back. As such, I am positive that I would bring a refreshing perspective to the board and be a good match for community integration with Django.
As a board member, I plan to increase interactions between the DSF and its user base by providing an official mailing list highlighting non-technical and technical updates that will keep Django users up-to-date with current developments and build a relationship with our user base. Through this, I aim to gather djangonauts from everywhere to support creating the next leaders of the Django community.
In addition, I would like to use my experience in fostering Strategic Partnerships and Fundraising in the nonprofit space to help the DSF Fundraising WG find more sponsors for the DSF. While working with a community, I fostered vital partnerships with about 10 organisations, which contributed to reaching our Fundraising and Partnerships goal despite most organisations slashing nonprofit donations.
As such, I believe those skills, coupled with my community experience, will contribute to the growth of the Django Community, especially when we attract sponsors and increase their efforts and visibility on our social media.
Alex Gómez Barcelona, Spain ¶View personal statementI began developing with Django at version 1.11 and have been an avid user since. I am a member of Djangonaut Space and was previously a Djangonaut in the program. I’m also an active member of Python Spain and Python Barcelona and have coached at multiple DjangoGirls workshops.
I believe the next few years will be crucial for Django's future. It’s important for us to remain relevant and ensure that Django continues to be a choice for new projects, not just for maintaining existing ones.
The DSF needs an executive director, we’ve reached the limit of what a volunteer board can do or be asked to do. This is my first and main priority for 2025 and I believe without such a change we will struggle to meaningfully advance.
An obstacle to enacting an executive director is the need to expand the foundation's funding and pool of sponsors, and I propose that one of our most effective ways to achieve this is by expanding our communications. Too little of the Django user base is reached by the DSF and other non-official Django communications, leaving a wide userbase who may be very willing to support the project but do not know they can.
In support of these goals, I will also make the website a priority. We’re years into attempting to revamp it, the last successful attempt being a decade ago. The website working group is not yet finalized, an executive director will help us push this forward.
The DSF needs fresh perspectives, and with your support I believe I will bring positive changes to the Django community.
Amir Tarighat New York ¶View personal statementHi DSF board members! My name is Amir Tarighat and I’m a software engineer and long time user of Django. I think since version ~1.8. I live in NYC.
I’m 3x VC backed founder and an active investor, currently I am the CEO of Agency which is a Y Combinator backed company.
I’m an expert in cybersecurity and compliance, and have served on several boards including one non-profit and an elected neighborhood council position.
I would love to serve the Django community and help grow its use by helping with fundraising, community events and sponsorships, and with anything security or compliance related. I’d also love to help with anything startup related.
Ariane Djeupang Jocelyne Yaounde, Cameroon ¶View personal statementI am Ariane Djeupang, a junior project manager, Community builder and freelance Machine Learning Engineer from Cameroon.
As a young Black African woman in STEM from the francophone region of Africa and an active DSF member, I’ve dedicated my career to fostering inclusivity and representation in the tech community and I am confident that I bring a unique perspective to the table. My extensive experience organizing major events like:
- DjangoCon US 2024,
- DjangoCon Africa 2023, and
- PyCon Africa 2020 (as a volunteer) | 2024 (as an IOC member ) has equipped me with the skills and insights needed to drive inclusivity and community engagement.
My journey has been fueled by a passion for diversity and representation. I have seen firsthand the incredible impact that inclusive environments can have on underrepresented communities, especially in Africa, and I am dedicated to amplifying these voices within the Django ecosystem. As a mentor in the both the Python and the Django Community, as well as a mentor and community manager at BEL'S AI Initiative in Cameroon, I have empowered many young technologists, fostering a supportive and inclusive community.
I aim to bridge the gap between the DSF Board and our vibrant African community, ensuring that our unique perspectives and needs are heard and addressed. I am committed to being the voice of Africa within the board and representing the board within my community. By voting for me, you are supporting a vision of inclusivity, innovation, and growth for the Django community.
To achieve this, I plan to:
- Launch official DSF multilingual mentoring programs, targeted at underrepresented groups from Africa, with plans to expand globally.
- Introduce the Django Diversity Incubator, offering resources, workshops, scholarships, and global hackathons to underrepresented groups around the world.
- Create a Django Open Source Fellows interns role, to welcome new people into code and non-code contributions.
Thank you for your consideration.
Bhuvnesh Sharma India ¶View personal statementHi everyone! I'm excited to throw my hat in the ring for the DSF Board of Directors.
To me, there appears to be a critical component that could benefit from increased attention: social media and marketing. And I believe It's time we start giving Django the social media attention it deserves.
Let's be real: If we master this social media game, Django's reach will explode, and the entire ecosystem will thrive.
The more we boost Django’s presence online, the more up-and-coming developers will flock to it. And with that surge in usage comes the rise of Django-focused communities—stronger, more engaged, and constantly growing.
Now, here’s where it gets exciting: more visibility leads to a snowball effect.
- Visibility drives growth: More eyes on Django → more users → more contributors
- Quality fuels adoption: More contributors → better Django → increased commercial usage
- Success attracts support: Increased usage → more sponsors → resources for further expansion
Then guess what? We loop back to the start: Django gets bigger, stronger, and better.
Here are few-of-many pointers that I am aiming to start with during my tenure as a board member:
- Boost Django's presence in Asia through targeted outreach and events.
- Launch Django Ambassadors program to recognize influential community members.
- Facilitate non-coding contributions to Django (design, content, event organizing).
- Create a volunteer layer between the DSF and interested individuals who are eager to contribute to specific working groups (WGs).
- Produce engaging social media content similar to Feature Fridays.
I am highly motivated to lead Django’s social media and marketing as a Board member. I have more high-level plans and ideas in mind, and I’m focused on finding the right time and people for their execution. Additionally, I would represent the Asia region and bring valuable diversity in the DSF board. You can read more about my plans in the blog here: Making Django Unstoppable: My Plan to Boost Visibility and Drive Growth
Now talking about myself, I am a django core contributor and have been involved with DSF for around past 1.5 years as a DSF member. I also did Google Summer of Code with Django in 2023 and mentored in Google Summer of Code 2024 with Django. Apart from code contributions I have contributed to Django in various others ways:
- I am Co-Chair at the social media WG at DSF. (all the Feature Fridays posts are created by me :) )
- I was a navigator at Djangonaut Space’s first session.
- I recently started a community called Django India with an aim to popularize Django in India.
Excited for what lies ahead!
Chris Achinga Mombasa, kenya ¶View personal statementMy journey as a software developer has been profoundly shaped by the power of community. From the outset, participating in developer meetups and events, particularly DjangoCon Africa, has not only strengthened my technical skills but also reshaped my understanding of growth—both personal and professional.
Driven by a desire to make a meaningful difference, I am pursuing a position on the Django Software Foundation Board. I bring a commitment to promoting diversity, inclusivity, and accessibility within the Django ecosystem. As a vocal advocate for African and minority communities, I believe my presence on the Board would add a valuable perspective to the DSF’s mission, ensuring that emerging developers from underrepresented backgrounds find opportunities, resources, and community support in Django.
My experience with the Swahilipot Hub Foundation, a Kenyan NGO supporting youth along the coast, has equipped me with essential skills in community engagement and in applying technology for social good. Through this role, I have developed Django-based solutions that empower community self-management—an experience that has reinforced my belief in Django’s potential to uplift communities around the globe. On the DSF Board, I aim to serve not only as a representative for these communities but also as a mentor and technical guide.
Cory Zue Cape Town, South Africa ¶View personal statementI’m running for the board because I love Django, I’ve built my career on it, I want to see it succeed for another 20 years, and I think I can help.
My background is as a Django user and educator. I’ve built several successful products on Django, spoken at multiple DjangoCons and PyCons and have published many popular articles and videos about using Django. I currently run a Django boilerplate product that helps people build apps and start businesses on top of Django. I’m also a member of the DSF and the social media working group.
My platform is relatively simple. I don’t want Django to get left behind. I’ve seen old frameworks like Rails and Laravel continually reinvent themselves, bringing new cohorts of web developers into the fold, while Django has largely stayed the same.
Part of the issue is Django’s reluctance to adopt modern technologies— with better front end being at the top of my list. But I don’t have unrealistic aspirations of adding HTMX, Tailwind, or React to Django, so much as starting the conversation about how the Django ecosystem can have a better story for people who want to use those things.
The other part—and the part I hope to help with more—is cultural. Specifically, getting Django to do a better job at selling itself. This means working harder to pitch and position Django as a great, modern framework for building apps. As well as creating more opportunities and incentives for funding Django.
If elected, I’ll try to be a voice on the board that pushes Django forwards, while understanding that I will often get pushed back. Let’s keep Django great for another 20 years!
David Vaz Porto, Portugal ¶View personal statementI am a software developer with over 20 years of experience and have been passionate about Django since 2007, starting with version 0.96. Over the years, I have not only built my career around Django and Python, but I have also actively contributed to expanding the Django community. My journey has led me to found a consultancy firm focused on these technologies, and I’ve dedicated my efforts to bringing new developers into the community and fostering its growth.
In 2019, during DjangoCon Europe in Copenhagen, I strongly desired to take my community involvement to the next level. I proposed to organize DjangoCon Europe 2020 in Portugal. Though the pandemic reshaped those plans, I co-organized the first virtual-only DjangoCon Europe in 2020, another virtual edition in 2021, and the first hybrid event in 2022. Our 2022 edition set a new record, with over 500 in-person attendees and 200+ online participants. The experience has been gratifying, and I continue to be actively involved in the community by co-organizing DjangoCon Europe 2024 in Vigo, Spain, and preparing for DjangoCon Europe 2025 in Dublin, Ireland.
In addition to my work with Django, I am deeply committed to the growth of the Python community in Portugal. In 2022, I co-founded PyCon Portugal, intending to host the conference in a different city each year to maximize its reach and impact. The first edition in Porto succeeded, followed by Coimbra in 2023, which attracted participants from over 25 countries. By the time of this election, PyCon Portugal 2024 in Braga will have concluded, furthering our goal of uniting and strengthening the Portuguese Python community.
I am enthusiastic, committed, and pragmatic. In every initiative I’ve taken, I strive to make a positive and meaningful impact, influencing and empowering those around me. My experience organizing large-scale events, building communities, and fostering collaboration can be valuable to the Django Software Foundation.
I look forward to contributing my skills and dedication to help guide the DSF’s efforts in the years ahead.
Gabriel Arias Romero Mexico ¶View personal statementsolo soy un fan y me encanta el framework
Jeff Triplett Lawrence, KS USA ¶View personal statementI'm running for the Django Software Foundation board of directors to help serve the community and reshape the board and foundation.
The key to making the DSF more sustainable is the stability that hiring an Executive Director brings. From day-to-day communications to supporting the Django Fellows to improving our ability to fundraise, everything revolves around having someone whose job is to run and support the foundation. I believe an ED will help Django get a seat to more conversations involving open source and web standards that we get passed over today.
I bring over two decades of non-profit experience, including co-founding DEFNA (the other Django non-profit) and serving on the Python Software Foundation, including leadership roles (Treasurer and Vice Chair). I have also helped organize DjangoCon US for over a decade, and we have seen many community members and leaders grow through that community-building experience. I'm an advisor for Black Python Devs and have been a mentor through the Djangonaut Space project.
I want to revise and revisit our sponsorship plans and fundraising goals. They have not changed much over the years despite companies' needs changing significantly. We did this with the PSF, and it increased the number of developers in resident roles (the PSF's version of Fellows) we could fund. It's time for the DSF to revise our plans.
I want to revise our approach to DjangoCons and other "why aren't they called DjangoCon" community events. Why aren't more of these promoted or listed through the Django website?
I firmly believe in the Campsite Rule: "Always leave the campground cleaner than you found it." I feel good about the mark I have left on the Django and Python communities over this past decade, and I am happy to serve the Django community in a more significant role if given the opportunity.
Julius Nana Acheampong Boakye Accra Ghana ¶View personal statementI'm excited to nominate myself for the Django Software Foundation's Board of Directors. With 4 years of experience in the tech industry, I've seen the impact Django can have on a project's success. I've contributed to the community through speaking at conferences, organizers global DjangoCon conference , teaching Django on campuses and am committed to using my skills to help the board make informed decisions.
My goals are to increase diversity and inclusion within the community and improve the overall health and stability of the Django project. If elected, I promise to be an active and engaged member, always putting the needs of the community first.
Thank you for considering my nomination. I'm excited to serve the Django community and contribute to its continued success.
Keanya Phelps Chicago IL US ¶View personal statementI am excited to submit my candidacy for the Django Software Foundation (DSF) board. Having transitioned into software development after a career change, I feel like I bring a unique perspective to the challenges and opportunities within the Django ecosystem. I am deeply passionate about diversity, inclusion, and mentorship,
My journey into tech by way of Django, has been shaped by collaboration, continuous learning, and the support of mentors, which is why I am eager to give back to the Django community. I am particularly enthusiastic about contributing to initiatives that promote diverse voices and create inclusive environments where everyone feels empowered too contribute and to leave things better than how they found them.
In addition to my commitment to diversity, I am driven by a love of running projects, research, and collaboration.
As a member of the DSF board, I would bring fresh ideas, a collaborative spirit, and a dedication to making Django an even more inclusive, forward-thinking community.
Kevin Renskers The Netherlands ¶View personal statementI’ve been using Django since 2009, and apart from blogging about Django for 15 years, I’ve always been mostly on the sidelines. It’s about time to get more involved with the community, share my experience and expertise, offer my time. I’m mainly interested in the enforcement of the Django trademark and code of conduct, ensuring a healthy community.
Kátia Yoshime Nakamura Berlin, Germany ¶View personal statementI am a Software Engineer with over 10 years of experience, working with Django both personally and professionally since 2015. My journey with Django started in 2015 when I attended my first Django Girls event in Brazil. Since then, I’ve built my career around Django, contributing to the community while actively attending, participating in and helping organize Python and Django conferences/events.
In 2018 and 2019, I helped organize PyCon Balkan in Belgrade (Serbia). Since 2016, I've coached and organized Django Girls workshops around the world, including in Rio de Janeiro (Brazil), Budapest (Hungary), Brno (Czechia), Belgrade (Serbia), Porto (Portugal), and Vigo (Spain).
Over the past few years, I've been deeply involved in DjangoCon events, particularly in Europe, where I’ve volunteered and organized Django Girls workshops.
Since 2020, I’ve had the privilege of serving as a board member of the Django Software Foundation (DSF). The pandemic brought us significant challenges, but we've built a resilient team, eager to push Django forward with fresh perspectives and new solutions. I’ve also been involved in the early efforts to shape a long-term plan for future conferences across Europe, focusing on engaging more organizers and selecting host teams earlier - up to two years in advance - for better flexibility and planning. However, there's still more we aim to achieve.
I’d love to keep supporting our Django community as a board member, promoting more diversity and inclusiveness while encouraging collaboration and exciting initiatives.
Lilian United States ¶View personal statementI’m Lilian 👋, a DSF Member, Django ORM contributor, and Djangonaut Space Coordinator.
Lots of talent is locked up in the industry simply due to gatekeeping. Let’s improve processes and tap into this pool of talent, so we can move Django forward in the right direction.
The DSF should do more to facilitate the connection between newcomers and maintainers. Let’s create a space where we provide the support system they need to collaborate productively, for technical teams and working groups alike.
We also need to facilitate bolder decision making. For the framework: sponsored features and fundamentals like async support, JIT, type annotations. For the Foundation: more transparency, an Executive Director, a newsletter.
How can we achieve this?- Coordination with the Steering Council for tech decisions, via a Board Liaison role.
- Gather feedback from program organizers to determine gaps that need support.
- Facilitate collaboration among newcomers and maintainers.
- Better marketing: such as promoting community initiatives.
- Documented playbooks! To scale the Working Groups concept.
Frustrated by the status quo in the industry, and yet inspired by changes happening to Django, I’m motivated to help more people get involved with Django as code contributors and leaders.
Marcelo Elizeche Landó Paraguay ¶View personal statementWhy I’m RunningBefore assisting to DjangoCon US, I saw Django as just part of the larger Python community. But seeing how this community goes above and beyond to support both longtime members and newcomers changed that for me. When others suggested I run for the board, it felt like a way I could give back and share what makes Django special on a global scale.
A Bit About MeI co-founded and organized the Python Paraguay community, starting with our first PyDay in 2015, which was a huge success and sparked a lasting momentum. Since then, I’ve organized meetups, events, workshops, and grown our community to almost two thousand members—the most active tech group in Paraguay! I also used Django for projects that make a difference: AyudaPy.org, a mutual aid platform during COVID-19 (which I presented at DjangoCon US 2022), and Lista Hũ, a tool to protect against scammers, both of which highlight Django’s potential for social good.
Ideas for Django- Learning Curve: Improving the Django tutorial and expanding learning resources can make Django more accessible and less intimidating for newcomers. Creating more comprehensive, step-by-step guides will empower new developers and ease their journey into Django.
- Supporting Global Accessibility: Expanding Django’s reach by focusing on language accessibility and gathering regional feedback is key. Adding questions to the Django Developers Survey on preferred languages and translation quality could help the community prioritize localization efforts, ensuring developers worldwide feel supported in their native languages.
I believe this community is on the right path, and it would be an honor to contribute as a board member
Paolo Melchiorre Italy ¶View personal statementThe Django community is the best one I could be a part of, and since I started using Django, I have seen wonderful initiatives born and thrive within it (e.g., Django Girls+, Djangonaut Space, Django Fellow). We should bring this momentum to other areas as well: fundraising, the website, development sprints, content translations, self-promotion (e.g., release videos), multimedia content (e.g., videos, books, podcasts, photos, …), feedback from Django users, Django's environmental impact.
I think that the Django Software Foundation has the potential to facilitate and promote these initiatives. It also has the authority to relate to other Open Source communities, to seek synergies, and with big corporations, to grow from an economic point of view, being able to pay more people (e.g., Django Fellows, Directors, UI/UX experts, …)
I believe I can give a boost to these initiatives, with my experience in the Django community, and with an original point of view in the Board, as a member of the Italian Python community, and founder of a local community.
Patryk Bratkowski Patryk Bratkowski ¶View personal statementHello, Djangonauts!
If you are one of the regulars on the official Django Discord server, my passion and dedication to both the Django community and framework should be no secret. As a helper, I have helped countless other developers use Django successfully. As a moderator, I do my best to ensure that we have a community that we can all be proud to consider our own, regardless of our background. An environment that is inclusive, diverse, and welcoming. To me, it feels like home, and I hope you all feel the same way.
For those I haven't yet had the pleasure to meet on Discord or elsewhere, I hope we do soon.
About me:
- I have been building on the web since the Geocities days, and have over 17 years of professional experience, meaning I know how to get things done.
- I have experience building and managing communities, including forums and subreddits, meaning I can readily help with the technical and human aspects.
- I am proactive, and lucky enough to have a lot of flexibility in how I spend my time, meaning I can help turn decisions into action.
- I am open-minded, and eager to learn, meaning I am looking forward to working for the community, with the board, rather than wanting to impose my own ideas.
- I am a polyglot speaking more than four languages fluently, meaning I feel connections to others, regardless of geographical borders.
If elected, my goals will be:
- Collaborating with the other board members. Django's popularity and stability is a testament to the fantastic work current and past board members and developers have done, and while I may have my own ideas, I would first want to know more about any backlog, plans, or other issues that need to be resolved rather than bring about drastic changes.
- Efficiently implementing board decisions. While plans may sometimes forcibly change, they at least need someone to take charge of them. I am happy to lend my technical expertise when required, and deal with other roadblocks.
- Community representation. As a fairly visible member of the Django community, I am looking forward to ensuring the community feels represented and heard, and seeing what more we can do to help the community grow.
- Increase representation of non-English speakers. While English is the de facto business language, there are other large markets that would benefit from better support.
As Django nears twenty years of existence, becoming a board member certainly gives us some big shoes to fill, but between my passion, this amazing community's support, and the time I can dedicate to the position, I am confident that I can help the community continue to thrive, make a tangible difference, and better serve the community we all know and love.
Best regards, and best of luck to all the other applicants,
Pat
Priya Pahwa India, Asia ¶View personal statementBalancing code, community, and collaboration, I am actively holding the following position of responsibilities:
- Co-Chair of the Fundraising Working Group at the Django Software Foundation
- Session Organizer of Djangonaut Space
- Software Development Engineer (Django backend and Infra) at a wealth tech startup.
- 2023 SWE intern (Django techstack) under the GitHub Octernships program.
Having had the experience of building inclusive student tech communities and organizing numerous meetups and global hackathons as a GitHub Campus Expert, I can bring fresh perspectives to the DSF Board and bridge the currently huge gap between the student community and the potential Django leadership positions. As a DSF Board of Directors, I will push for initiatives to:
- Build a Django Evangelist Program or a Django Developer Advocacy Working Group
- Introduce a dedicated Django track at student hackathons to increase the framework’s visibility amongst budding developers.
- Establish a robust DEIB (Diversity, Equity, Inclusion, and Belonging) framework in both theory and practice for DSF
- Include subtle subconscious yet impactful details, such as designing the assets of custom merchandise—like stickers—that represent diverse races and backgrounds to ensure everyone feels valued.
- Continue driving fundraising efforts to engage potential corporate sponsors with a structured funding roadmap and prospectus that aligns with our community needs.
- Develop a one-stop-solution DSF community handbook - an easily accessible guide for newcomers
I’m dedicated to bringing the voice of the Asian Indian community to the DSF Board. The lack of DjangoCons and a strong local Django network in this region limits talented individuals from essential growth opportunities. I aim to foster a sense of belonging at the table, expand rewards in exchange for volunteering, and ensure the Django community thrives everywhere, especially in underserved areas with psychological safety and welcoming ways for one and all.
Tom Carrick Amsterdam, Netherlands ¶View personal statementHello! For those that don't know me, I've also been actively contributing features for most of the last decade. I help run the Discord, the accessibility team, and I'm on the fundraising working group. If that sounds like a lot of time commitment already, you're right. If you vote for me I might have to become dormant in some other roles.
But I don't really want to talk about my perspective as a contributor, I want to talk about my experience as a user. I've been using Django since around 2008. We have great batteries for 2008. For 2024? I am not so sure. I feel like we are missing things like:
- Built in 2FA with WebAuthn / passkeys.
- Better serialization to make APIs without needing a second framework.
- A better frontend story, whether that's tutorials on integrating frameworks or how to use simpler solutions like HTMX, template components (or all of the above).
- A more modern, accessible admin interface with better UX.
- Simpler project setup for small projects, including deployment and static files (integrating white noise?).
- (type hints maybe?)
- I could increase the size of this Wishlist by several factors and still not be done.
The reason I believe we're missing these things is simple (and possibly wrong). Django is getting bigger, more mature, and prioritises stability. These are all great things, but they do slow down development when almost all new features are contributed by people volunteering their time.
To fix this, Django needs money, which is why I joined the fundraising group, and then there is the question of spending that money. And for the me the priorities are clear:
- Spend money to make more money.
- Hire more fellows and widen their remit to contributing new features.
And that's my "manifesto", if you like.
Vitaly Semotiuck Rzeszow, Poland ¶View personal statementhttps://www.linkedin.com/in/vitaly-sem/
Your move nowThat’s it, you’ve read it all 🌈! Be sure to vote if you’re eligible, by using the link shared over email. To support the future of Django, donate to the Django Software Foundation on our website or via GitHub Sponsors.
Paolo Melchiorre: 2025 Django Software Foundation board nomination
My self-nomination statement for the 2025 Django Software Foundation (DSF) board of directors elections
Trey Hunner: Adding keyboard shortcuts to the Python REPL
I talked about the new Python 3.13 REPL a few months ago and after 3.13 was released. I think it’s awesome.
I’d like to share a secret feature within the Python 3.13 REPL which I’ve been finding useful recently: adding custom keyboard shortcuts.
This feature involves a PYTHONSTARTUP file, use of an unsupported Python module, and dynamically evaluating code.
In short, we may be getting ourselves into trouble. But the result is very neat!
Thanks to Łukasz Llanga for inspiring this post via his excellent EuroPython keynote talk.
The goal: keyboard shortcuts in the REPLFirst, I’d like to explain the end result.
Let’s say I’m in the Python REPL on my machine and I’ve typed numbers =:
1 >>> numbers =I can now hit Ctrl-N to enter a list of numbers I often use while teaching (Lucas numbers):
1 numbers = [2, 1, 3, 4, 7, 11, 18, 29]That saved me some typing!
Getting a prototype workingFirst, let’s try out an example command.
Copy-paste this into your Python 3.13 REPL:
1 2 3 4 5 6 7 8 9 10 11 from _pyrepl.simple_interact import _get_reader from _pyrepl.commands import Command class Lucas(Command): def do(self): self.reader.insert("[2, 1, 3, 4, 7, 11, 18, 29]") reader = _get_reader() reader.commands["lucas"] = Lucas reader.bind(r"\C-n", "lucas")Now hit Ctrl-N.
If all worked as planned, you should see that list of numbers entered into the REPL.
Cool! Now let’s generalize this trick and make Python run our code whenever it starts.
But first… a disclaimer.
Here be dragons 🐉Notice that _ prefix in the _pyrepl module that we’re importing from? That means this module is officially unsupported.
The _pyrepl module is an implementation detail and its implementation may change at any time in future Python versions.
In other words: _pyrepl is designed to be used by Python’s standard library modules and not anyone else. That means that we should assume this code will break in a future Python version.
Will that stop us from playing with this module for the fun of it?
It won’t.
Creating a PYTHONSTARTUP fileSo we’ve made one custom key combination for ourselves. How can we setup this command automatically whenever the Python REPL starts?
We need a PYTHONSTARTUP file.
When Python launches, if it sees a PYTHONSTARTUP environment variable it will treat that environment variable as a Python file to run on startup.
I’ve made a /home/trey/.python_startup.py file and I’ve set this environment variable in my shell’s configuration file (~/.zshrc):
1 export PYTHONSTARTUP=$HOME/.python_startup.pyTo start, we could put our single custom command in this file:
1 2 3 4 5 6 7 8 9 10 11 12 13 try: from _pyrepl.simple_interact import _get_reader from _pyrepl.commands import Command except ImportError: pass # Not in the new pyrepl OR _pyrepl implementation changed else: class Lucas(Command): def do(self): self.reader.insert("[2, 1, 3, 4, 7, 11, 18, 29]") reader = _get_reader() reader.commands["lucas"] = Lucas reader.bind(r"\C-n", "lucas")Note that I’ve stuck our code in a try-except block. Our code only runs if those _pyrepl imports succeed.
Note that this might still raise an exception when Python starts if the reader object’s command attribute or bind method change in a way that breaks our code.
Personally, I’d like to see those breaking changes occur print out a traceback the next time I upgrade Python. So I’m going to leave those last few lines without their own catch-all exception handler.
Generalizing the codeHere’s a PYTHONSTARTUP file with a more generalized solution:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 try: from _pyrepl.simple_interact import _get_reader from _pyrepl.commands import Command except ImportError: pass else: # Hack the new Python 3.13 REPL! cmds = { r"\C-n": "[2, 1, 3, 4, 7, 11, 18, 29]", r"\C-f": '["apples", "oranges", "bananas", "strawberries", "pears"]', } from textwrap import dedent reader = _get_reader() for n, (key, text) in enumerate(cmds.items(), start=1): name = f"CustomCommand{n}" exec(dedent(f""" class _cmds: class {name}(Command): def do(self): self.reader.insert({text!r}) reader.commands[{name!r}] = {name} reader.bind({key!r}, {name!r}) """)) # Clean up all the new variables del _get_reader, Command, dedent, reader, cmds, text, key, name, _cmds, nThis version uses a dictionary to map keyboard shortcuts to the text they should insert.
Note that we’re repeatedly building up a string of Command subclasses for each shortcut, using exec to execute the code for that custom Command subclass, and then binding the keyboard shortcut to that new command class.
At the end we then delete all the variables we’ve made so our REPL will start the clean global environment we normally expect it to have:
1 2 3 4 Python 3.13.0 (main, Oct 8 2024, 10:37:56) [GCC 11.4.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> dir() ['__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']Is this messy?
Yes.
Is that a needless use of a dictionary that could have been a list of 2-item tuples instead?
Yes.
Does this work?
Yes.
Doing more interesting and risky stuffNote that there are many keyboard shortcuts that may cause weird behaviors if you bind them.
For example, if you bind Ctrl-i, your binding may trigger every time you try to indent. And if you try to bind Ctrl-m, your binding may be ignored because this is equivalent to hitting the Enter key.
So be sure to test your REPL carefully after each new binding you try to invent.
If you want to do something more interesting, you could poke around in the _pyrepl package to see what existing code you can use/abuse.
For example, here’s a very hacky way of making a binding to Ctrl-x followed by Ctrl-r to make this import subprocess, type in a subprocess.run line, and move your cursor between the empty string within the run call:
1 2 3 4 5 6 7 8 9 10 11 12 class _cmds: class Run(Command): def do(self): from _pyrepl.commands import backward_kill_word, left backward_kill_word(self.reader, self.event_name, self.event).do() self.reader.insert("import subprocess\n") code = 'subprocess.run("", shell=True)' self.reader.insert(code) for _ in range(len(code) - code.index('""') - 1): left(self.reader, self.event_name, self.event).do() reader.commands["subprocess_run"] = _cmds.Run reader.bind(r"\C-x\C-r", "subprocess_run") What keyboard shortcuts are available?As you play with customizing keyboard shortcuts, you’ll likely notice that many key combinations result in strange and undesirable behavior when overridden.
For example, overriding Ctrl-J will also override the Enter key… at least it does in my terminal.
I’ll list the key combinations that seem unproblematic on my setup with Gnome Terminal in Ubuntu Linux.
Here are Control key shortcuts that seem to be complete unused in the Python REPL:
- Ctrl-N
- Ctrl-O
- Ctrl-P
- Ctrl-Q
- Ctrl-S
- Ctrl-V
Note that overriding Ctrl-H is often an alternative to the backspace key
Here are Alt/Meta key shortcuts that appear unused on my machine:
- Alt-A
- Alt-E
- Alt-G
- Alt-H
- Alt-I
- Alt-J
- Alt-K
- Alt-M
- Alt-N
- Alt-O
- Alt-P
- Alt-Q
- Alt-S
- Alt-V
- Alt-W
- Alt-X
- Alt-Z
You can add an Alt shortcut by using \M (for “meta”). So r"\M-a" would capture Alt-A just as r"\C-a" would capture Ctrl-A.
Here are keyboard shortcuts that can be customized but you might want to consider whether the current default behavior is worth losing:
- Alt-B: backward word (same as Ctrl-Left)
- Alt-C: capitalize word (does nothing on my machine…)
- Alt-D: kill word (delete to end of word)
- Alt-F: forward word (same as Ctrl-Right)
- Alt-L: downcase word (does nothing on my machine…)
- Alt-U: upcase word (does nothing on my machine…)
- Alt-Y: yank pop
- Ctrl-A: beginning of line (like the Home key)
- Ctrl-B: left (like the Left key)
- Ctrl-E: end of line (like the End key)
- Ctrl-F: right (like the Right key)
- Ctrl-G: cancel
- Ctrl-H: backspace (same as the Backspace key)
- Ctrl-K: kill line (delete to end of line)
- Ctrl-T: transpose characters
- Ctrl-U: line discard (delete to beginning of line)
- Ctrl-W: word discard (delete to beginning of word)
- Ctrl-Y: yank
- Alt-R: restore history (within history mode)
Find something fun while playing with the _pyrepl package’s inner-workings?
I’d love to hear about it! Comment below to share what you found.
Real Python: Beautiful Soup: Build a Web Scraper With Python
Web scraping is the automated process of extracting data from the internet. The Python libraries Requests and Beautiful Soup are powerful tools for the job. To effectively harvest the vast amount of data available online for your research, projects, or personal interests, you’ll need to become skilled at web scraping.
In this tutorial, you’ll learn how to:
- Inspect the HTML structure of your target site with your browser’s developer tools
- Decipher data encoded in URLs
- Use Requests and Beautiful Soup for scraping and parsing data from the internet
- Step through a web scraping pipeline from start to finish
- Build a script that fetches job offers from websites and displays relevant information in your console
If you like learning with hands-on examples and have a basic understanding of Python and HTML, then this tutorial is for you! Working through this project will give you the knowledge and tools you need to scrape any static website out there on the World Wide Web. You can download the project source code by clicking on the link below:
Get Your Code: Click here to download the free sample code that you’ll use to learn about web scraping in Python.
Take the Quiz: Test your knowledge with our interactive “Beautiful Soup: Build a Web Scraper With Python” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz
Beautiful Soup: Build a Web Scraper With PythonIn this quiz, you'll test your understanding of web scraping using Python. By working through this quiz, you'll revisit how to inspect the HTML structure of a target site, decipher data encoded in URLs, and use Requests and Beautiful Soup for scraping and parsing data from the Web.
What Is Web Scraping?Web scraping is the process of gathering information from the internet. Even copying and pasting the lyrics of your favorite song can be considered a form of web scraping! However, the term “web scraping” usually refers to a process that involves automation. While some websites don’t like it when automatic scrapers gather their data, which can lead to legal issues, others don’t mind it.
If you’re scraping a page respectfully for educational purposes, then you’re unlikely to have any problems. Still, it’s a good idea to do some research on your own to make sure you’re not violating any Terms of Service before you start a large-scale web scraping project.
Reasons for Automated Web ScrapingSay that you like to surf—both in the ocean and online—and you’re looking for employment. It’s clear that you’re not interested in just any job. With a surfer’s mindset, you’re waiting for the perfect opportunity to roll your way!
You know about a job site that offers precisely the kinds of jobs you want. Unfortunately, a new position only pops up once in a blue moon, and the site doesn’t provide an email notification service. You consider checking up on it every day, but that doesn’t sound like the most fun and productive way to spend your time. You’d rather be outside surfing real-life waves!
Thankfully, Python offers a way to apply your surfer’s mindset. Instead of having to check the job site every day, you can use Python to help automate the repetitive parts of your job search. With automated web scraping, you can write the code once, and it’ll get the information that you need many times and from many pages.
Note: In contrast, when you try to get information manually, you might spend a lot of time clicking, scrolling, and searching, especially if you need large amounts of data from websites that are regularly updated with new content. Manual web scraping can take a lot of time and be highly repetitive and error-prone.
There’s so much information on the internet, with new information constantly being added. You’ll probably be interested in some of that data, and much of it is out there for the taking. Whether you’re actually on the job hunt or just want to automatically download all the lyrics of your favorite artist, automated web scraping can help you accomplish your goals.
Challenges of Web ScrapingThe internet has grown organically out of many sources. It combines many different technologies, styles, and personalities, and it continues to grow every day. In other words, the internet is a hot mess! Because of this, you’ll run into some challenges when scraping the web:
-
Variety: Every website is different. While you’ll encounter general structures that repeat themselves, each website is unique and will need personal treatment if you want to extract the relevant information.
-
Durability: Websites constantly change. Say you’ve built a shiny new web scraper that automatically cherry-picks what you want from your resource of interest. The first time you run your script, it works flawlessly. But when you run the same script a while later, you run into a discouraging and lengthy stack of tracebacks!
Unstable scripts are a realistic scenario because many websites are in active development. If a site’s structure changes, then your scraper might not be able to navigate the sitemap correctly or find the relevant information. The good news is that changes to websites are often small and incremental, so you’ll likely be able to update your scraper with minimal adjustments.
Still, keep in mind that the internet is dynamic and keeps on changing. Therefore, the scrapers you build will probably require maintenance. You can set up continuous integration to run scraping tests periodically to ensure that your main script doesn’t break without your knowledge.
An Alternative to Web Scraping: APIsSome website providers offer application programming interfaces (APIs) that allow you to access their data in a predefined manner. With APIs, you can avoid parsing HTML. Instead, you can access the data directly using formats like JSON and XML. HTML is primarily a way to visually present content to users.
When you use an API, the data collection process is generally more stable than it is through web scraping. That’s because developers create APIs to be consumed by programs rather than by human eyes.
The front-end presentation of a site might change often, but a change in the website’s design doesn’t affect its API structure. The structure of an API is usually more permanent, which means it’s a more reliable source of the site’s data.
However, APIs can change as well. The challenges of both variety and durability apply to APIs just as they do to websites. Additionally, it’s much harder to inspect the structure of an API by yourself if the provided documentation lacks quality.
Read the full article at https://realpython.com/beautiful-soup-web-scraper-python/ »[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Real Python: Quiz: Beautiful Soup: Build a Web Scraper With Python
In this quiz, you’ll test your understanding of web scraping with Python, Requests, and Beautiful Soup.
By working through this quiz, you’ll revisit how to inspect the HTML structure of your target site with your browser’s developer tools, decipher data encoded in URLs, use Requests and Beautiful Soup for scraping and parsing data from the Web, and gain an understanding of what a web scraping pipeline looks like.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Python Bytes: #407 Back to the future, destination 3.14
Zato Blog: Salesforce API integrations and connected apps
This instalment in a series of articles about API integrations with Salesforce covers connected apps - how to create them and how to obtain their credentials needed to exchange REST messages with Salesforce.
In Salesforce's terminology, a connected app is, essentially, an API client. It has credentials, a set of permissions, and it works on behalf of a user in an automated manner.
In particular, the kind of a connected app that I am going to create below is one that can be used in backend, server-side integrations that operate without any direct input from end users or administrators, i.e. the app is created once, its permissions and credentials are set once, and then it is able to work uninterrupted in the background, on server side.
Server-side systems are quite unlike other kinds of apps, such as mobile ones, that assume there is a human operator involved - they have their own work characteristics, related yet different, and I am not going to cover them here.
Note that permission types and their scopes are a separate, broad subject and they will described in a separate how-to article.
Finally, I assume that you are either an administrator in a Salesforce organization or that you are preparing information for another person with similar grants in Salesforce.
Conceptually, there is nothing particularly unusual about Salesforce connected apps, it is just its own mini-world of jargon and, at the end of the day, it simply enables you to invoke APIs that Salesforce is built on. It is just that knowing where to click, what to choose and how to navigate the user interface can be a daunting challenge that this article hopes to make easier to overcome.
The stepsFor an automated, server-side connected app to make use of Salesforce APIs, the requirements are:
- Having access to username/password credentials
- Creating a connected app
- Granting permissions to the app (not covered in this article)
- Obtaining a customer key and customer secret for the app
You will note that there are four credentials in total:
- Username
- Password
- Customer key
- Customer secret
Also, depending on what chapter of the Salesforce documentation you are reading, you will note that the customer key can be also known as "client_id" whereas another name for the customer secret is "client_secret". These two pairs mean the same.
Access to username/password credentialsFor starters, you need to have an account in Salesforce, a combination of username + password that you can log in with and on whose behalf the connected app will be created:
Creating a connected appOnce you are logged in, go to Setup in the top right-hand corner:
In the search box, look up "app manager":
Next, click the "New Connected App" button to the right:
Fill out the basic details such as "Connect App Name" and make sure that you select "Enable OAuth Settings". Then, given that in this document we are not dealing with the subject of permissions at all, grant full access to the connected app and finally click "Save" at the bottom of the page.
Obtaining a customer key and customer secretWe have a connected app but we still do not know what its customer key and secret are. To reveal it, go to the "App Manager" once more, either via the search box or using the menu on the left hand side.
Find your app in the list and click "View" in the list of actions. Observe that it is "View", not "Edit" or "Manage", where you can check what the credentials are:
The customer key and secret van be now revealed in the "API (Enable OAuth Settings)" section:
This concludes the process - you have a connected app and all the credentials needed now.
TestingSeeing as this document is part of a series of how-tos in the context of Zato, if you would like to integrate with Salesforce in Python, at this point you will be able to follow the steps in another where everything is detailed separately.
Just as a quick teaser, it would look akin to the below.
... # Salesforce REST API endpoint to invoke path = '/sobjects/Campaign/' # Build the request to Salesforce based on what we received request = { 'Name': input.name, 'Segment__c': input.segment, } # Create a reference to our connection definition .. salesforce = self.cloud.salesforce['My Salesforce Connection'] # .. obtain a client to Salesforce .. with salesforce.conn.client() as client: # type: SalesforceClient # .. create the campaign now. response = client.post(path, request) ...On a much lower level, however, if you would just like to quickly test out whether you configured the connected app correctly, you can invoke from command line a Salesforce REST endpoint that will return an OAuth token, as below.
Note that, as I mentioned it previously, client_id is the same as customer key and client_secret is the same as customer secret.
curl https://example.my.salesforce.com/services/oauth2/token \ -H "X-PrettyPrint: 1" \ --header 'Content-Type: application/x-www-form-urlencoded' \ --data-urlencode 'grant_type=password' \ --data-urlencode 'username=hello@example.com' \ --data-urlencode 'password=my.password' \ --data-urlencode 'client_id=my.customer.key' \ --data-urlencode 'client_secret=my.client.secret'The result will be, for instance:
{ "access_token" : "008e0000000PTzLPb!4Vzm91PeIWJo.IbPzoEZf2ygEM.6cavCt0YwAGSM", "instance_url" : "https://example.my.salesforce.com", "id" : "https://login.salesforce.com/id/008e0000000PTzLPb/0081fSUkuxPDrir000j1", "token_type" : "Bearer", "issued_at" : "1649064143961", "signature" : "dwb6rwNIzl76kZq8lQswsTyjW2uwvTnh=" }Above, we have an OAuth bearer token on output - this can be used in subsequent, business REST calls to Salesforce but how to do it exactly in practice is left for another article.
Next steps:➤ Read about how to use Python to build and integrate enterprise APIs that your tests will cover
➤ Python API integration tutorial
➤ Python Integration platform as a Service (iPaaS)
➤ What is an Enterprise Service Bus (ESB)? What is SOA?
Real Python: Quiz: How to Reset a pandas DataFrame Index
In this quiz, you’ll test your understanding of how to reset a pandas DataFrame index.
By working through the questions, you’ll review your knowledge of indexing and also expand on what you learned in the tutorial.
You’ll need to do some research outside of the tutorial to answer all the questions. Embrace this challenge and let it take you on a learning journey.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Real Python: The Real Python Podcast – Episode #225: Python Getting Faster and Leaner & Ideas for Django Projects
What changes are happening under the hood in the latest versions of Python? How are these updates laying the groundwork for a faster Python in the coming years? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Real Python: Quiz: The Python Standard REPL: Try Out Code and Ideas Quickly
In this quiz, you’ll test your understanding of The Python Standard REPL: Try Out Code and Ideas Quickly.
The Python REPL allows you to run Python code interactively, which is useful for testing new ideas, exploring libraries, refactoring and debugging code, and trying out examples.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Python Software Foundation: Announcing Python Software Foundation Fellow Members for Q2 2024! 🎉
The PSF is pleased to announce its second batch of PSF Fellows for 2024! Let us welcome the new PSF Fellows for Q2! The following people continue to do amazing things for the Python community:
Leonard Richardson
Winnie Ke
Thank you for your continued contributions. We have added you to our Fellow roster.
The above members help support the Python ecosystem by being phenomenal leaders, sustaining the growth of the Python scientific community, maintaining virtual Python communities, maintaining Python libraries, creating educational material, organizing Python events and conferences, starting Python communities in local regions, and overall being great mentors in our community. Each of them continues to help make Python more accessible around the world. To learn more about the new Fellow members, check out their links above.
Let's continue recognizing Pythonistas all over the world for their impact on our community. The criteria for Fellow members is available online: https://www.python.org/psf/fellows/. If you would like to nominate someone to be a PSF Fellow, please send a description of their Python accomplishments and their email address to psf-fellow at python.org. Quarter 3 nominations are currently in review. We are accepting nominations for Quarter 4 through November 20th, 2024.
Are you a PSF Fellow and want to help the Work Group review nominations? Contact us at psf-fellow at python.org.
Talk Python to Me: #482: Pre-commit Hooks for Python Devs
PyPy: A DSL for Peephole Transformation Rules of Integer Operations in the PyPy JIT
As is probably apparent from the sequence of blog posts about the topic in the last year, I have been thinking about and working on integer optimizations in the JIT compiler a lot. This work was mainly motivated by Pydrofoil, where integer operations matter a lot more than for your typical Python program.
In this post I'll describe my most recent change, which is a new small domain specific language that I implemented to specify peephole optimizations on integer operations in the JIT. It uses pattern matching to specify how (sequences of) integer operations should be simplified and optimized. The rules are then compiled to RPython code that then becomes part of the JIT's optimization passes.
To make it less likely to introduce incorrect optimizations into the JIT, the rules are automatically proven correct with Z3 as part of the build process (for a more hands-on intro to how that works you can look at the knownbits post). In this blog post I want to motivate why I introduced the DSL and give an introduction to how it works.
MotivationThis summer, after I wrote my scripts to mine JIT traces for missed optimization opportunities, I started implementing a few of the integer peephole rewrite that the script identified. Unfortunately, doing so led to the problem that the way we express these rewrites up to now is very imperative and verbose. Here's a snippet of RPython code that shows some rewrites for integer multiplication (look at the comments to see what the different parts actually do). You don't need to understand the code in detail, but basically it's in very imperative style and there's quite a lot of boilerplate.
def optimize_INT_MUL(self, op): arg0 = get_box_replacement(op.getarg(0)) b0 = self.getintbound(arg0) arg1 = get_box_replacement(op.getarg(1)) b1 = self.getintbound(arg1) if b0.known_eq_const(1): # 1 * x == x self.make_equal_to(op, arg1) elif b1.known_eq_const(1): # x * 1 == x self.make_equal_to(op, arg0) elif b0.known_eq_const(0) or b1.known_eq_const(0): # 0 * x == x * 0 == 0 self.make_constant_int(op, 0) else: for lhs, rhs in [(arg0, arg1), (arg1, arg0)]: lh_info = self.getintbound(lhs) if lh_info.is_constant(): x = lh_info.get_constant_int() if x & (x - 1) == 0: # x * (2 ** c) == x << c new_rhs = ConstInt(highest_bit(lh_info.get_constant_int())) op = self.replace_op_with(op, rop.INT_LSHIFT, args=[rhs, new_rhs]) self.optimizer.send_extra_operation(op) return elif x == -1: # x * -1 == -x op = self.replace_op_with(op, rop.INT_NEG, args=[rhs]) self.optimizer.send_extra_operation(op) return else: # x * (1 << y) == x << y shiftop = self.optimizer.as_operation(get_box_replacement(lhs), rop.INT_LSHIFT) if shiftop is None: continue if not shiftop.getarg(0).is_constant() or shiftop.getarg(0).getint() != 1: continue shiftvar = get_box_replacement(shiftop.getarg(1)) shiftbound = self.getintbound(shiftvar) if shiftbound.known_nonnegative() and shiftbound.known_lt_const(LONG_BIT): op = self.replace_op_with( op, rop.INT_LSHIFT, args=[rhs, shiftvar]) self.optimizer.send_extra_operation(op) return return self.emit(op)Adding more rules to these functions is very tedious and gets super confusing when the functions get bigger. In addition I am always worried about making mistakes when writing this kind of code, and there is no feedback at all about which of these rules are actually applied a lot in real programs.
Therefore I decided to write a small domain specific language with the goal of expressing these rules in a more declarative way. In the rest of the post I'll describe the DSL (most of that description is adapted from the documentation about it that I wrote).
The Peephole Rule DSL Simple transformation rulesThe rules in the DSL specify how integer operation can be transformed into cheaper other integer operations. A rule always consists of a name, a pattern, and a target. Here's a simple rule:
add_zero: int_add(x, 0) => xThe name of the rule is add_zero. It matches operations in the trace of the form int_add(x, 0), where x will match anything and 0 will match only the constant zero. After the => arrow is the target of the rewrite, i.e. what the operation is rewritten to, in this case x.
The rule language has a list of which of the operations are commutative, so add_zero will also optimize int_add(0, x) to x.
Variables in the pattern can repeat:
sub_x_x: int_sub(x, x) => 0This rule matches against int_sub operations where the two arguments are the same (either the same box, or the same constant).
Here's a rule with a more complicated pattern:
sub_add: int_sub(int_add(x, y), y) => xThis pattern matches int_sub operations, where the first argument was produced by an int_add operation. In addition, one of the arguments of the addition has to be the same as the second argument of the subtraction.
The constants MININT, MAXINT and LONG_BIT (which is either 32 or 64, depending on which platform the JIT is built for) can be used in rules, they behave like writing numbers but allow bit-width-independent formulations:
is_true_and_minint: int_is_true(int_and(x, MININT)) => int_lt(x, 0)It is also possible to have a pattern where some arguments needs to be a constant, without specifying which constant. Those patterns look like this:
sub_add_consts: int_sub(int_add(x, C1), C2) # incomplete # more goes here => int_sub(x, C)Variables in the pattern that start with a C match against constants only. However, in this current form the rule is incomplete, because the variable C that is being used in the target operation is not defined anywhere. We will see how to compute it in the next section.
Computing constants and other intermediate resultsSometimes it is necessary to compute intermediate results that are used in the target operation. To do that, there can be extra assignments between the rule head and the rule target.:
sub_add_consts: int_sub(int_add(x, C1), C2) # incomplete C = C1 + C1 => int_sub(x, C)The right hand side of such an assignment is a subset of Python syntax, supporting arithmetic using +, -, *, and certain helper functions. However, the syntax allows you to be explicit about unsignedness for some operations. E.g. >>u exists for unsigned right shifts (and I plan to add >u, >=u, <u, <=u for comparisons).
Here's an example of a rule that uses >>u:
urshift_lshift_x_c_c: uint_rshift(int_lshift(x, C), C) mask = (-1 << C) >>u C => int_and(x, mask) ChecksSome rewrites are only true under certain conditions. For example, int_eq(x, 1) can be rewritten to x, if x is known to store a boolean value. This can be expressed with checks:
eq_one: int_eq(x, 1) check x.is_bool() => xA check is followed by a boolean expression. The variables from the pattern can be used as IntBound instances in checks (and also in assignments) to find out what the abstract interpretation of the JIT knows about the value of a trace variable (IntBound is the name of the abstract domain that the JIT uses for integers, despite the fact that it also stores knownbits information nowadays).
Here's another example:
mul_lshift: int_mul(x, int_lshift(1, y)) check y.known_ge_const(0) and y.known_le_const(LONG_BIT) => int_lshift(x, y)It expresses that x * (1 << y) can be rewritten to x << y but checks that y is known to be between 0 and LONG_BIT.
Checks and assignments can be repeated and combined with each other:
mul_pow2_const: int_mul(x, C) check C > 0 and C & (C - 1) == 0 shift = highest_bit(C) => int_lshift(x, shift)In addition to calling methods on IntBound instances, it's also possible to access their attributes, like in this rule:
and_x_c_in_range: int_and(x, C) check x.lower >= 0 and x.upper <= C & ~(C + 1) => x Rule Ordering and LivenessThe generated optimizer code will give preference to applying rules that produce a constant or a variable as a rewrite result. Only if none of those match do rules that produce new result operations get applied. For example, the rules sub_x_x and sub_add are tried before trying sub_add_consts, because the former two rules optimize to a constant and a variable respectively, while the latter produces a new operation as the result.
The rule sub_add_consts has a possible problem, which is that if the intermediate result of the int_add operation in the rule head is used by some other operations, then the sub_add_consts rule does not actually reduce the number of operations (and might actually make things slightly worse due to increased register pressure). However, currently it would be extremely hard to take that kind of information into account in the optimization pass of the JIT, so we optimistically apply the rules anyway.
Checking rule coverageEvery rewrite rule should have at least one unit test where it triggers. To ensure this, the unit test file that mainly checks integer optimizations in the JIT has an assert at the end of a test run, that every rule fired at least once.
Printing rule statisticsThe JIT can print statistics about which rule fired how often in the jit-intbounds-stats logging category, using the PYPYLOG mechanism. For example, to print the category to stdout at the end of program execution, run PyPy like this:
PYPYLOG=jit-intbounds-stats:- pypy ...The output of that will look something like this:
int_add add_reassoc_consts 2514 add_zero 107008 int_sub sub_zero 31519 sub_from_zero 523 sub_x_x 3153 sub_add_consts 159 sub_add 55 sub_sub_x_c_c 1752 sub_sub_c_x_c 0 sub_xor_x_y_y 0 sub_or_x_y_y 0 int_mul mul_zero 0 mul_one 110 mul_minus_one 0 mul_pow2_const 1456 mul_lshift 0 ... Termination and ConfluenceRight now there are unfortunately no checks that the rules actually rewrite operations towards "simpler" forms. There is no cost model, and also nothing that prevents you from writing a rule like this:
neg_complication: int_neg(x) # leads to infinite rewrites => int_mul(-1, x)Doing this would lead to endless rewrites if there is also another rule that turns multiplication with -1 into negation.
There is also no checking for confluence (yet?), i.e. the property that all rewrites starting from the same input trace always lead to the same output trace, no matter in which order the rules are applied.
ProofsIt is very easy to write a peephole rule that is not correct in all corner cases. Therefore all the rules are proven correct with Z3 before compiled into actual JIT code, by default. When the proof fails, a (hopefully minimal) counterexample is printed. The counterexample consists of values for all the inputs that fulfil the checks, values for the intermediate expressions, and then two different values for the source and the target operations.
E.g. if we try to add the incorrect rule:
mul_is_add: int_mul(a, b) => int_add(a, b)We get the following counterexample as output:
Could not prove correctness of rule 'mul_is_add' in line 1 counterexample given by Z3: counterexample values: a: 0 b: 1 operation int_mul(a, b) with Z3 formula a*b has counterexample result vale: 0 BUT target expression: int_add(a, b) with Z3 formula a + b has counterexample value: 1If we add conditions, they are taken into account and the counterexample will fulfil the conditions:
mul_is_add: int_mul(a, b) check a.known_gt_const(1) and b.known_gt_const(2) => int_add(a, b)This leads to the following counterexample:
Could not prove correctness of rule 'mul_is_add' in line 46 counterexample given by Z3: counterexample values: a: 2 b: 3 operation int_mul(a, b) with Z3 formula a*b has counterexample result vale: 6 BUT target expression: int_add(a, b) with Z3 formula a + b has counterexample value: 5Some IntBound methods cannot be used in Z3 proofs because their control flow is too complex. If that is the case, they can have Z3-equivalent formulations defined (in every case this is done, it's a potential proof hole if the Z3 friendly reformulation and the real implementation differ from each other, therefore extra care is required to make very sure they are equivalent).
It's possible to skip the proof of individual rules entirely by adding SORRY_Z3 to its body (but we should try not to do that too often):
eq_different_knownbits: int_eq(x, y) SORRY_Z3 check x.known_ne(y) => 0 Checking for satisfiabilityIn addition to checking whether the rule yields a correct optimization, we also check whether the rule can ever apply. This ensures that there are some runtime values that would fulfil all the checks in a rule. Here's an example of a rule violating this:
never_applies: int_is_true(x) check x.known_lt_const(0) and x.known_gt_const(0) # impossible condition, always False => xRight now the error messages if this goes wrong are not completely easy to understand. I hope to be able to improve this later:
Rule 'never_applies' cannot ever apply in line 1 Z3 did not manage to find values for variables x such that the following condition becomes True: And(x <= x_upper, x_lower <= x, If(x_upper < 0, x_lower > 0, x_upper < 0)) Implementation NotesThe implementation of the DSL is done in a relatively ad-hoc manner. It is parsed using rply, there's a small type checker that tries to find common problems in how the rules are written. Z3 is used via the Python API, like in the previous blog posts that are using it. The pattern matching RPython code is generated using an approach inspired by Luc Maranget's paper Compiling Pattern Matching to Good Decision Trees. See this blog post for an approachable introduction.
ConclusionNow that I've described the DSL, here are the rules that are equivalent to the imperative code in the motivation section:
mul_zero: int_mul(x, 0) => 0 mul_one: int_mul(x, 1) => x mul_minus_one: int_mul(x, -1) => int_neg(x) mul_pow2_const: int_mul(x, C) check C > 0 and C & (C - 1) == 0 shift = highest_bit(C) => int_lshift(x, shift) mul_lshift: int_mul(x, int_lshift(1, y)) check y.known_ge_const(0) and y.known_le_const(LONG_BIT) => int_lshift(x, y)The current status of the DSL is that it got merged to PyPy's main branch. I rewrote a part of the integer rewrites into the DSL, but some are still in the old imperative style (mostly for complicated reasons, the easily ported ones are all done). Since I've only been porting optimizations that had existed prior to the existence of the DSL, performance numbers of benchmarks didn't change.
There are a number of features that are still missing and some possible extensions that I plan to work on in the future:
All the integer operations that the DSL handles so far are the variants that do not check for overflow (or where overflow was proven to be impossible to happen). In regular Python code the overflow-checking variants int_add_ovf etc are much more common, but the DSL doesn't support them yet. I plan to fix this, but don't completely understand how the correctness proofs for them should be done correctly.
A related problem is that I don't understand what it means for a rewrite to be correct if some of the operations are only defined for a subset of the input values. E.g. division isn't defined if the divisor is zero. In theory, a division operation in the trace should always be preceded by a check that the divisor isn't zero. But sometimes other optimization move the check around and the connection to the division gets lost or muddled. What optimizations can we still safely perform on the division? There's lots of prior work on this question, but I still don't understand what the correct approach in our context would be.
Ordering comparisons like int_lt, int_le and their unsigned variants are not ported to the DSL yet. Comparisons are an area where the JIT is not super good yet at optimizing away operations. This is a pretty big topic and I've started a project with Nico Rittinghaus to try to improve the situation a bit more generally.
A more advanced direction of work would be to implement a simplified form of e-graphs (or ae-graphs). The JIT has like half of an e-graph data structure already, and we probably can't afford a full one in terms of compile time costs, but maybe we can have two thirds or something?
Thank you to Max Bernstein and Martin Berger for super helpful feedback on drafts of the post!
The Python Show: 48 - Writing About Python with David Mertz
In this episode of the Python Show Podcast, David Mertz is our guest. David is a prolific writer about the Python programming language. From his extremely popular IPM Developerworks articles to his multiple books on the Python language, David has been a part of the Python community for decades.
We ended up chatting about:
The history of Python
Book writing
Conference speaking
The PSF
and more!
David Mertz’s website
PyDev of the Week: David Mertz on Mouse vs Python
David Mertz on InformIT / Pearson
Real Python: Python Thread Safety: Using a Lock and Other Techniques
Python threading allows you to run parts of your code concurrently, making the code more efficient. However, when you introduce threading to your code without knowing about thread safety, you may run into issues such as race conditions. You solve these with tools like locks, semaphores, events, conditions, and barriers.
By the end of this tutorial, you’ll be able to identify safety issues and prevent them by using the synchronization primitives in Python’s threading module to make your code thread-safe.
In this tutorial, you’ll learn:
- What thread safety is
- What race conditions are and how to avoid them
- How to identify thread safety issues in your code
- What different synchronization primitives exist in the threading module
- How to use synchronization primitives to make your code thread-safe
To get the most out of this tutorial, you’ll need to have basic experience working with multithreaded code using Python’s threading module and ThreadPoolExecutor.
Get Your Code: Click here to download the free sample code that you’ll use to learn about thread safety techniques in Python.
Take the Quiz: Test your knowledge with our interactive “Python Thread Safety: Using a Lock and Other Techniques” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz
Python Thread Safety: Using a Lock and Other TechniquesIn this quiz, you'll test your understanding of Python thread safety. You'll revisit the concepts of race conditions, locks, and other synchronization primitives in the threading module. By working through this quiz, you'll reinforce your knowledge about how to make your Python code thread-safe.
Threading in PythonIn this section, you’ll get a general overview of how Python handles threading. Before discussing threading in Python, it’s important to revisit two related terms that you may have heard about in this context:
- Concurrency: The ability of a system to handle multiple tasks by allowing their execution to overlap in time but not necessarily happen simultaneously.
- Parallelism: The simultaneous execution of multiple tasks that run at the same time to leverage multiple processing units, typically multiple CPU cores.
Python’s threading is a concurrency framework that allows you to spin up multiple threads that run concurrently, each executing pieces of code. This improves the efficiency and responsiveness of your application. When running multiple threads, the Python interpreter switches between them, handing the control of execution over to each thread.
By running the script below, you can observe the creation of four threads:
Python threading_example.py import threading import time from concurrent.futures import ThreadPoolExecutor def threaded_function(): for number in range(3): print(f"Printing from {threading.current_thread().name}. {number=}") time.sleep(0.1) with ThreadPoolExecutor(max_workers=4, thread_name_prefix="Worker") as executor: for _ in range(4): executor.submit(threaded_function) Copied!In this example, threaded_function prints the values zero to two that your for loop assigns to the loop variable number. Using a ThreadPoolExecutor, four threads are created to execute the threaded function. ThreadPoolExecutor is configured to run a maximum of four threads concurrently with max_workers=4, and each worker thread is named with a “Worker” prefix, as in thread_name_prefix="Worker".
In print(), the .name attribute on threading.current_thread() is used to get the name of the current thread. This will help you identify which thread is executed each time. A call to sleep() is added inside the threaded function to increase the likelihood of a context switch.
You’ll learn what a context switch is in just a moment. First, run the script and take a look at the output:
Shell $ python threading_example.py Printing from Worker_0. number=0 Printing from Worker_1. number=0 Printing from Worker_2. number=0 Printing from Worker_3. number=0 Printing from Worker_0. number=1 Printing from Worker_2. number=1 Printing from Worker_1. number=1 Printing from Worker_3. number=1 Printing from Worker_0. number=2 Printing from Worker_2. number=2 Printing from Worker_1. number=2 Printing from Worker_3. number=2 Copied!Each line in the output represents a print() call from a worker thread, identified by Worker_0, Worker_1, Worker_2, and Worker_3. The number that follows the worker thread name shows the current iteration of the loop each thread is executing. Each thread takes turns executing the threaded_function, and the execution happens in a concurrent rather than sequential manner.
For example, after Worker_0 prints number=0, it’s not immediately followed by Worker_0 printing number=1. Instead, you see outputs from Worker_1, Worker_2, and Worker_3 printing number=0 before Worker_0 proceeds to number=1. You’ll notice from these interleaved outputs that multiple threads are running at the same time, taking turns to execute their part of the code.
This happens because the Python interpreter performs a context switch. This means that Python pauses the execution state of the current thread and passes control to another thread. When the context switches, Python saves the current execution state so that it can resume later. By switching the control of execution at specific intervals, multiple threads can execute code concurrently.
You can check the context switch interval of your Python interpreter by typing the following in the REPL:
Python >>> import sys >>> sys.getswitchinterval() 0.005 Copied!The output of calling the getswitchinterval() is a number in seconds that represents the context switch interval of your Python interpreter. In this case, it’s 0.005 seconds or five milliseconds. You can think of the switch interval as how often the Python interpreter checks if it should switch to another thread.
An interval of five milliseconds doesn’t mean that threads switch exactly every five milliseconds, but rather that the interpreter considers switching to another thread at these intervals.
The switch interval is defined in the Python docs as follows:
Read the full article at https://realpython.com/python-thread-lock/ »[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]