Feeds
Russell Coker: Creating a Micro Users’ Group
FOSDEM had a great lecture, Building an Open Source Community One Friend at a Time [1]. I recommend that everyone who is involved in the FOSS community watch this lecture to get some ideas.
For some time I’ve been periodically inviting a few friends to visit for lunch, chat about Linux, maybe do some coding, and watch some anime between coding. It seems that I have accidentally created a micro users’ group.
LUGs were really big in the mid to late 90s and still quite vibrant in the early 2000s. But they seem to have decreased in popularity even before COVID-19, and since COVID-19 a lot of people have stopped attending large meetings to avoid health risks. I think that a large part of the decline of users’ groups has been due to the success of YouTube. Being able to choose from thousands of hours of lectures about computers on YouTube is a disincentive to spending the time and effort needed to attend a meeting whose content is probably not your first choice of topic. A formal meeting where someone you don’t know has arranged a lecture might not cover a topic that’s really interesting to you. Having lunch with a couple of friends and watching a YouTube video that one of your friends assures you is really good is something more people will find appealing.
In recent times homeschooling [2] has become more widely known. The same factors that allow learning about computers at home also make homeschooling easier. The difference between the traditional LUG model of having everyone meet at a fixed time for a lecture and a micro LUG of a small group of people having an informal meeting is similar to the difference between traditional schools and homeschooling.
I encourage everyone to create their own micro LUG. All you have to do is choose a suitable time and place and invite some people who are interested. Have a BBQ in a park if the weather is good, meet at a cafe or restaurant, or invite people to visit you for lunch on a weekend.
Related posts:
- Creating a Micro Conference The TEDxVolcano The TED conference franchise has been extended to...
- BLUG This weekend I went to the Ballarat install-fest, mini-conf, and...
- Recruiting at a LUG Meeting I’m at the main meeting of Linux Users of Victoria...
Trey Hunner: PyCon 2024 Reflection
I traveled back home from PyCon US 2024 last week. This is my reflection on my time at PyCon.
Attempting to eat vegan
Since 2020, I’ve been gradually eating more plant-based and a few months ago I decided to take PyCon as an opportunity to attempt exclusively vegan eating outside my own home. As I noted on Mastodon, it was a challenge and I failed every day at least once but I found the experience worthwhile. Our food system is very dairy-oriented.
Staying hydrated and fed
One of the first things I did before heading to the convention center was walk to Target and buy snacks and drinks. When at PyCon, I prefer to spend 30 minutes and $20 to have a backup plan for last minute hydration and calories (even if not the greatest calories). I never quite know when I might sleep through breakfast, find lunch lacking, or wish I’d eaten more dinner.
A tutorial, an orientation, a lightning talk, and open spaces
My responsibilities at PyCon this year included teaching a tutorial and helping run the Newcomer’s Orientation with Kojo and Sumana.
Yngve and Marie offered to act as teaching assistants during my tutorial and I was very grateful for their help! Rodrigo and Krishna also offered to TA just before my tutorial started and I was extra grateful to have even more help than I’d expected. The attendees were mostly better prepared than I expected they would be, which was also great. It’s always great to spend less time on setup and more time exploring Python together.
The newcomer’s orientation the next day went well. We kept it fairly brief and were able to address about 10 minutes of audience questions before the opening reception started.
Once my PyCon responsibilities were complete, I invented a few more (light) responsibilities for myself. 😅 I signed up to give a lightning talk on how to give a lightning talk. They slotted it as the first talk of the first lightning talk session on Friday night. I kept this talk pretty much the same as the one I presented at DjangoCon 2016. I could have made the transitions fancier, but I decided to embrace the idea of simplicity with the hope that audience members might think “look if that first speaker can give such a simple and succinct presentation, maybe I can too.”
On Saturday I ran an open space on Python Learning. Some of you showed up because you’re on my mailing list or you’re paying Python Morsels subscribers. Many folks showed up because the topic was interesting, either as a learner or as a teacher. I really enjoyed the round-table-style conversation we had.
I also ran a Cabo Card game open space during lunch on Sunday on the 4th floor rooftop. Cabo is my usual conference ice breaker game and I played it at least a few nights in The Westin lobby as well.
Seeing conference friends, old and new
For me, PyCon is largely about having conversations. The talks and tutorials are great for starting me thinking about an idea. The hallway track, open spaces, and meals are great for continuing conversations about those ideas (or other ideas).
My first morning in Pittsburgh, I chatted with Naomi Ceder and Reuven Lerner. I’m glad I ran into them before the conference kicked off because (as often happens at PyCon) I only very briefly saw either of them during the rest of PyCon!
After my tutorial that afternoon, I did dinner with Marie, Yngve, and Rodrigo at Rosewater Mediterranean (good vegan options, assuming you enjoy falafel and various sauces). As sometimes happens at PyCon, another PyCon attendee, Sachin, joined our table because we noticed him eating on his own at a table near us and invited him to join us.
On Saturday, Melanie, David, Jay, and I had a sort of mini San Diego Python study group reunion dinner before inviting folks to join us for Cabo and Knucklebones one night. The 4 of us originally met each other (along with Carol and other wonderful Python folks) at the San Diego Python study group about 10 years ago.
I had some wonderful conversations about ways to improve the Python documentation over dinner (at Nicky’s Thai) on Sunday night with so many docs-concerned folks who I highly respect. I’m really excited that Python has the documentation editorial board and I’m hopeful that that board, with the help of many other community members, will usher in big improvements to the documentation in the coming years.
I also met a number of Internet acquaintances IRL for the first time at PyCon. I met Tereza and Jessica, who I know from our work in the PSF Code of Conduct workgroup. I met Steve Lott, who I originally knew as a prolific question-answerer. I also met Hugo, a CPython core dev, the Python 3.14 & 3.15 release manager, and a social media user (which is how I’ve primarily interacted with him because the Internet is occasionally lovely). I was also very excited to meet many Python Morsels members as well as folks who know me through my weekly Python tips newsletter.
I was grateful to chat with Hynek and Al about creating talks, YouTube videos, and other online content. I also enjoyed chatting with Glyph a bit about our experiences consulting and training and (in hindsight) wished I’d planned an open space for either consultants or trainers, both of which have been held at PyCon before but it just takes someone to stick it on the open space board.
Many folks I only saw very briefly (I said a quick hi and bye to Andrew over lunch during the sprints) and some I didn’t see at all (Frank was at PyCon but we never ran into each other). Some I essentially saw through playing a few rounds of Cabo (Thomas and Ethan among many others). We also ran into at least 4 other PyCon attendees in the airport on Tuesday afternoon, including Bob and Julian, who it’s always a pleasure to see.
A Mastodon-oriented PyCon
On Thursday night I had the feeling that the number of Mastodon posts I saw on the #PyConUS hashtag was greater than the number of Twitter posts. I (very unscientifically) counted up the number of posts I was seeing on each and found that my perception was correct: Mastodon seemed to slightly overtake Twitter at PyCon this year.
Over dinner on Wednesday, I tried to convince Marie, Yngve, and Rodrigo to get Mastodon accounts just to follow the hashtag during PyCon. I succeeded: Marie and Yngve and Rodrigo!
Mastodon will never be the social media platform. Its decentralized nature is too much of a barrier for many folks. However, it does seem to be used by enough somewhat nerdy Python folks to now be one of the most used social media platforms for PyCon posting.
The talks
I ended up spending little time in the talks during PyCon. This wasn’t on purpose. I just happened to attend many open spaces, take personal breaks, and end up in hallway conversations often. I did see many of the lightning talks live, as well as Jay, Simon, and Sumana’s keynotes (all of them were exceptional) and the opening and closing remarks. I also watched a few talks from my hotel room while taking breaks.
While I’m often a bit light on my talk load at PyCon, I do recommend folks attend a good handful of live talks during PyCon, as Jon and others recommend. I wish I had seen more talks live. I also wish I had attended a few open spaces that I missed.
At any one time, I know that I’m always missing about 90% of what’s scheduled during PyCon (if you include the talks and the open spaces). That’s assuming I don’t ditch the conference entirely for a few hours and walk across a bridge or ride a funicular (neither of which I did, as I stuck around the venue the whole time this year). I am glad I saw, did, and talked about everything I did, but there’s always something I wish I’d seen/done!
The sprints
Thanks to the documentation dinner, I had a couple documentation-related ideas in mind on the first day of sprints. But I’m also really excited about the new Python REPL coming in Python 3.13 (in case you can’t tell from how much I talk about it), so I sprinted on that instead. Łukasz assigned me the task of researching keyboard shortcuts that the new REPL is missing (compared to the current one on Linux and Mac) so I spent some time researching that. I got to see the REPL running on Anthony’s laptop on Windows and I am so excited that Windows support will be included before 3.13.0 lands! 🎉
Partly inspired by Carol Willing’s PyCon preview message, I also thanked Pablo, Łukasz, and Lysandros in-person for all their work on the new Python REPL. 🤗
Until next year
I’ll be keynoting at PyOhio this year.
Besides PyOhio, I’m not sure whether I’ll make it to another conference until PyCon US next year. I’d love to attend all of them, but I do have work and personal goals that need accomplishing too!
I hope to see you at PyCon US 2025! In the meantime, if you’re wishing we’d exchanged contact details or met in-person, please feel free to stay in touch through Mastodon, LinkedIn, my weekly emails, YouTube, or Twitter.
ImageX: Countless Benefits of Interactive Calculators and One Drupal Module to Easily Add Them to Forms
Authored by Nadiia Nykolaichuk, João Paulo Constantino, Ana Carolina, and Gabriel Passarelli.
PyCoder’s Weekly: Issue #631 (May 28, 2024)
#631 – MAY 28, 2024
View in Browser »
In this video course, you’ll learn the basics of GUI programming with Tkinter, the de facto Python GUI framework. Master GUI programming concepts such as widgets, geometry managers, and event handlers. Then, put it all together by building two applications: a temperature converter and a text editor.
REAL PYTHON course
This article from the developer of pyastgrep introduces you to the tool which can now be used as a library. The post talks about how to use it and what kind of linting it does best.
LUKE PLANT
Stop wasting 30% of your team’s sprint on maintaining legacy codebases. Automatically migrate and keep up-to-date on Python versions, so that you can focus on being productive while staying secure, without the risk of breaking changes - Get a code assessment today →
ACTIVESTATE sponsor
Django 5.1 has gone alpha so the list of features targeting this release has more or less solidified. This article introduces you to what is coming in Django 5.1.
JEFF TRIPLETT
This quiz is designed to push your knowledge of pivot tables a little bit further. You won’t find all the answers by reading the tutorial, so you’ll need to do some investigating on your own. By finding all the answers, you’re sure to learn some other interesting things along the way.
REAL PYTHON
Python Enhancement Proposal 649: Deferred Evaluation of Annotations Using Descriptors has been re-targeted to the Python 3.14 release.
PYTHON.ORG
This is part 5 of a deep dive into writing automated tests, but also works well as an independent article. This post talks about the taxonomy of testing, like the differences between unit and integration tests, and how nobody can quite agree on a definition of either.
BITECODE
In this tutorial, you’ll get to know some of the most commonly used built-in exceptions in Python. You’ll learn when these exceptions can appear in your code and how to handle them. Finally, you’ll learn how to raise some of these exceptions in your code.
REAL PYTHON
This article is a deep dive on the hiring and firing practices in the software field, and unlike most articles focuses on senior engineering roles. It isn’t a “first job” post, but a “how the decision process works” article.
ED CREWE
Streamlit is a wonderful tool for building dashboards with its peculiar execution model, but using asyncio data sources with it can be a real pain. This article is about how to correctly use those two technologies together.
HANDMADESOFTWARE • Shared by Thorin Schiffer
EuroPython happens in Prague July 8-14 and as the conference approaches more and more is happening. This posting from their May newsletter highlights the keynotes and other announcements.
EUROPYTHON
This guide admits to being “yet another”, but unlike most that are out there, spends less time discussing the cosmetic aspects of a good commit message and more time on the content.
SIMON TATHAM
The Python Software Foundation securing this sponsorship affects the entire Python ecosystem, most notably the security and reliability of the Python Package Index (PyPI).
SOCKET.DEV • Shared by Sarah Gooding
Sumana gave the closing keynote address at PyCon US this year and this posting shares all the links and references from the talk.
SUMANA HARIHARESWARA
Learn to use the Python calendar module to create and customize calendars in plain text, HTML or directly in your terminal.
REAL PYTHON
This post is a collection of accessibility resources mostly for web sites, but some tools can be used elsewhere as well.
SARAH ABDEREMANE
GITHUB.COM/APPSILON • Shared by Appsilon
Oven: Explore Python Packages
tkforge: Drag & Drop in Figma to Create a Python GUI
tach: Enforce a Modular, Decoupled Package Architecture
Events
Weekly Real Python Office Hours Q&A (Virtual) May 29, 2024
REALPYTHON.COM
May 30, 2024
MEETUP.COM
June 1 to June 3, 2024
NOKIDBEHIND.ORG
June 1 to June 2, 2024
DJANGOGIRLS.ORG
June 1, 2024
MEETUP.COM
June 3, 2024
J.MP
June 5 to June 10, 2024
DJANGOCON.EU
June 7 to June 10, 2024
PYCON.CO
Happy Pythoning!
This was PyCoder’s Weekly Issue #631.
[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]
Ned Batchelder: One way to fix Python circular imports
In Python, a circular import is when two files each try to import the other, causing a failure when a module isn’t fully initialized. The best way to fix this situation is to organize your code in layers so that the importing relationships naturally flow in just one direction. But sometimes it works to simply change the style of import statement you use. I’ll show you.
Let’s say you have these files:
1  # one.py
2  from two import func_two
3
4  def func_one():
5      func_two()
1  # two.py
2  from one import func_one
3
4  def do_work():
5      func_one()
6
7  def func_two():
8      print("Hello, world!")
1  # main.py
2  from two import do_work
3  do_work()
If we run main.py, we get this:
% python main.py
Traceback (most recent call last):
File "main.py", line 2, in <module>
from two import do_work
File "two.py", line 2, in <module>
from one import func_one
File "one.py", line 2, in <module>
from two import func_two
ImportError: cannot import name 'func_two' from partially initialized
module 'two' (most likely due to a circular import) (two.py)
When Python imports a module, it executes the file line by line. Every global in the file (that is, every top-level name, including functions and classes) becomes an attribute on the module object being constructed. In two.py, we import from one.py at line 2. At that moment, the two module has been created, but it has no attributes yet because nothing has been defined yet. It will eventually have do_work and func_two, but we haven’t executed those def statements yet, so they don’t exist. Like a function call, when the import statement runs, it begins executing the imported file, and doesn’t come back to the current file until the import is done.
The import of one.py starts, and its line 2 tries to get a name from the two module. As we just said, the two module exists, but has no names defined yet. That gives us the error.
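This mid-import state can be observed directly. The sketch below (my own illustration, using a hypothetical module named "demo") builds a module object by hand, registers it, and then executes its body, checking that the module exists in sys.modules before its functions do:

```python
# Standalone sketch (not from the post): a module object is registered
# before its body finishes executing, so mid-import it exists in
# sys.modules but lacks names defined further down in the file.
import sys
import types

body = """
import sys
# Mid-import: the module object already exists...
assert "demo" in sys.modules
# ...but `func`, defined below, is not an attribute yet.
assert not hasattr(sys.modules["demo"], "func")

def func():
    return 42
"""

module = types.ModuleType("demo")
sys.modules["demo"] = module   # registered before the body runs
exec(body, module.__dict__)    # execute the module body line by line

print(hasattr(module, "func"))  # True: the def has now executed
```

This is essentially what the import machinery does: register first, then execute, which is why a circular `from … import name` can look up a name that hasn't been defined yet.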
Instead of importing names from modules, we can import whole modules instead. All we do is change the form of the imports, and how we reference the functions from the imported modules, like this:
1  # one.py
2  import two            # was: from two import func_two
3
4  def func_one():
5      two.func_two()    # was: func_two()
1  # two.py
2  import one            # was: from one import func_one
3
4  def do_work():
5      one.func_one()    # was: func_one()
6
7  def func_two():
8      print("Hello, world!")
1  # main.py
2  from two import do_work
3  do_work()
Running the fixed code, we get this:
% python main.py
Hello, world!
It works because two.py imports one at line 2, and then one.py imports two at its line 2. That works just fine, because the two module exists. It’s still empty like it was before the fix, but now we aren’t trying to find a name in it during the import. Once all of the imports are done, the one and two modules both have all their names defined, and we can access them from inside our functions.
The key idea here is that “from two import func_two” tries to find func_two during the import, before it exists. Deferring the name lookup to the body of the function by using “import two” lets all of the modules get themselves fully initialized before we try to use them, avoiding the circular import error.
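A related workaround (a sketch of my own, not the fix the post recommends) is to defer the import statement itself by moving it into the function body; the lookup then happens at call time, after both modules are fully initialized. Here the same circular pair of files is written to a temporary directory and run as a real program:

```python
# Sketch of the "import inside the function" workaround for the same
# circular pair (my own illustration, not from the post).
import subprocess
import sys
import tempfile
from pathlib import Path

files = {
    "one.py": (
        "def func_one():\n"
        "    from two import func_two  # deferred to call time\n"
        "    func_two()\n"
    ),
    "two.py": (
        "def do_work():\n"
        "    from one import func_one  # deferred to call time\n"
        "    func_one()\n"
        "\n"
        "def func_two():\n"
        "    print('Hello, world!')\n"
    ),
    "main.py": "from two import do_work\ndo_work()\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in files.items():
        Path(tmp, name).write_text(text)
    result = subprocess.run(
        [sys.executable, "main.py"],
        cwd=tmp, capture_output=True, text=True,
    )

print(result.stdout.strip())  # Hello, world!
```

Function-local imports work for the same reason whole-module imports do, but they hide a module's dependencies from readers, so the whole-module style shown above is usually the nicer stopgap.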
As I mentioned at the top, the best way to fix circular imports is to structure your code so that modules don’t have mutual dependencies like this. But that isn’t always easy, and this can buy you a little time to get your code working again.
Evolving Web: Evolving Web Wins Pantheon Award for Social Impact
We are thrilled to announce that Evolving Web has been honored with the Social Impact Award in the Inaugural Pantheon Partner Awards for our work on the Planned Parenthood Direct website.
The winners were announced at the Pantheon Partner dinner, held during DrupalCon Portland on May 6, 2024. Congratulations to the other winners who took to the stage with us:
- Elevated Third – Partner of the Year Award
- WebMD Ignite – Innovation Award
- HoundER – Rookie of the Year Award
- Forum One – Customer First Award
- Danny Pfeiffer – Friends of Pantheon Partners Award
Pantheon’s Partner Awards recognize the outstanding contributions of digital agencies that drive positive change. We’re proud to be acknowledged for our role in the Planned Parenthood Direct project, which supports reproductive rights and enhances access to reproductive and sexual healthcare. Our work on the project demonstrates our commitment to creating impact through user-centric digital experiences.
A Mission-Driven Collaboration
In the U.S., reproductive and sexual health care services vary from state to state. Planned Parenthood Direct (PPD) aims to provide trusted care from anywhere by offering “on-the-go” services. We collaborated with PPD to build a secure, mobile-first website that informs users of available services in their state. The site also encourages users to download the PPD app, which they can use to order birth control.
Designing for Impact and Inclusion
Our team undertook the challenge of creating a highly informative, accessible website that appeals to a younger audience.
- We created dedicated pages for each state, ensuring they’re easy for PPD to update and optimized for search engines.
- We created a new visual brand identity that incorporates bold design principles for a youthful, reassuring, and non-stigmatizing user experience.
- Our mobile-first approach ensured that the site meets the needs of an audience who prefer mobile devices.
- We also followed accessibility best practices to ensure a user-friendly experience for all, including users with disabilities.
Security was a paramount concern, given the political climate surrounding reproductive rights. We ensured a highly secure online experience using a decoupled architecture with Next.js for the front-end and Drupal 10 for the back-end. Hosting on Pantheon added additional layers of security, including HTTPS certificates and DDoS protection.
Our work on the Planned Parenthood Direct website included the development of 17 custom components and 14 content types in Layout Builder. This empowers PPD’s content editors to create flexible, engaging, and visually appealing layouts. The result is streamlined content creation and management, allowing PPD to maintain and grow their website effectively.
Outstanding Results & Continued Commitment
The new Planned Parenthood Direct website has been instrumental in continuing PPD’s mission to support human rights and ensure access to sexual and reproductive healthcare.
A big thank you to Pantheon for recognizing our efforts, and to Planned Parenthood Direct for trusting us with this important project. We’re honoured to have partnered with you both.
As we celebrate this award, we’re reminded of the importance of our work and the impact it has on communities. We look forward to future opportunities to make a difference.
Partner with us to turn your vision into a powerful digital experience that drives change.
Go Deh: Recreating the CVM algorithm for estimating distinct elements gives problems
Someone at work posted a link to this Quanta Magazine article. It describes a novel, and seemingly straightforward, way to estimate the number of distinct elements in a data stream.
Quanta describes the algorithm, and as an example gives "counting the number of distinct words in Hamlet".
Following Quanta
I looked at the description and decided to follow their text. They carefully described each round of the algorithm, which I coded up, and then I looked for the generalizations and implemented a loop over all items in the stream.
It did not work! I got silly numbers. I could download Hamlet, split it into words (around 32,000), run len(set(words)) to get the exact number of distinct words (around 7,000), then run it through the algorithm and get a stupid result with tens of digits for the estimated number of distinct words.
I re-checked my implementation of the Quanta-described algorithm and couldn't see any mistake, but I had originally noticed a link to the original paper. I did not follow it at first as original papers can be heavily into maths notation and I prefer reading algorithms described in code/pseudocode.
I decided to take a look at the original.
The CVM Original Paper
I scanned the paper.
I read the paper.
I looked at Algorithm 1 as a probable candidate to decipher into Python, but the description was cryptic. Here’s that description taken from the paper:
AI to the rescue!?
I had a brainwave💡let’s chuck it at two AIs and see what they do. I had Gemini and I had Copilot to hand and asked them each to express Algorithm 1 as Python. Gemini did something, and Copilot finally did something, but I first had to open the page in Microsoft Edge.
There followed hours of me reading and cross-comparing between the algorithm and the AIs’ code. If I did not understand where something came from, I would ask the generating AI; if I found an error I would first (and second, and ...) try to get the AI to make a fix I suggested.
At this stage I was also trying to get a feel for how the AIs could help me (now way past what I thought the algorithm should be, just to see what it would take to get those AIs to cross the T’s and dot the I’s on a good solution).
Not a good use of time! I now know that asking questions to update one of the 20 to 30 lines of the Python function might fix that line, but unfix another line that had been fixed before. Code from the AI does not have line numbers, making it difficult to state what needs changing, and where. They can suggest type hints and create the beginnings of docstrings, but, for example, they pulled out the wrong authors for the name of the algorithm.
In line 1 of the algorithm, the initialisation of thresh is clearly shown, I thought, but both AIs had difficulty getting the Python right. Eventually I cut-n-pasted the text into each AI, where they confidently said “Of course...”, made a change, and then I had to re-check for any other changes.
I first created this function:
def F0_Estimator(stream: Collection[Any], epsilon: float, delta: float) -> float:
    """ ... """
    p = 1
    X = set()
    m = len(stream)
    thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))

    for item in stream:
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            X = {x_item for x_item in X if random.random() < 0.5}
            p /= 2
    return len(X) / p
I tested it with Hamlet data and it made OK estimates.
Elated, I took a break.
Hacker News
The next evening I decided to do a search to see if anyone else was talking about the algorithm and found a thread on Hacker News that was right up my street. People were discussing those same problems found in the Quanta article - and getting similar ginormous answers. They had one of the original authors of the paper making comments! And others had created code from the actual paper and said it was also easier than the Quanta description.
The author mentioned that no less than Donald Knuth had taken an interest in their algorithm, and had noted that the expression starting `X = ...`, four lines from the end, could theoretically make no change to X; the solution was to encase the assignment in a while loop that only exits once len(X) < thresh.
Code update
I decided to add that change:
def F0_Estimator(stream: Collection[Any], epsilon: float, delta: float) -> float:
    """
    Estimates the number of distinct elements in the input stream.

    This function implements the CVM algorithm for the problem of
    estimating the number of distinct elements in a stream of data.
    The stream object must support an initial call to __len__

    Parameters:
        stream (Collection[Any]): The input stream as a collection of
            hashable items.
        epsilon (float): The desired relative error in the estimate.
            It must be in the range (0, 1).
        delta (float): The desired probability of the estimate being
            within the relative error. It must be in the range (0, 1).

    Returns:
        float: An estimate of the number of distinct elements in the
            input stream.
    """
    p = 1
    X = set()
    m = len(stream)
    thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))

    for item in stream:
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            while len(X) == thresh:  # Force a change
                X = {x_item for x_item in X if random.random() < 0.5}  # Random, so could do nothing
                p /= 2
    return len(X) / p
In the code above, the variable thresh (threshold), named after Algorithm 1, is used in the Quanta article to describe the maximum storage available to keep items from the stream that have been seen before. You must know the length of the stream, m, along with epsilon and delta, to calculate thresh.
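To get a feel for how big thresh is, here is a quick standalone check with example numbers of my own choosing (a Hamlet-sized stream of roughly 32,000 words, epsilon = 0.1, delta = 0.05):

```python
# Back-of-envelope check of the thresh formula (my own example values).
import math

m = 32_000     # roughly the word count of Hamlet
epsilon = 0.1  # 10% relative error
delta = 0.05   # i.e. 95% confidence

thresh = math.ceil(12 / (epsilon ** 2) * math.log(8 * m / delta))
print(thresh)  # 18539 -- more storage than items in the stream!
```

The theoretical guarantee is clearly conservative: for tight epsilon it can demand more storage than the stream itself, while the testing code later in the post shows much smaller thresholds still giving usable estimates.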
If you were to have just the stream and thresh as the arguments, you could return both the estimate of the number of distinct items in the stream and a count of the total number of elements in the stream.
Epsilon could be calculated from the numbers we now know.
def F0_Estimator2(stream: Iterable[Any], thresh: int) -> tuple[float, int]:
    """
    Estimates the number of distinct elements in the input stream.

    This function implements the CVM algorithm for the problem of
    estimating the number of distinct elements in a stream of data.
    The stream object does NOT have to support a call to __len__

    Parameters:
        stream (Iterable[Any]): The input stream as an iterable of
            hashable items.
        thresh (int): The max threshold of stream items used in the
            estimation.

    Returns:
        tuple[float, int]: An estimate of the number of distinct elements
            in the input stream, and the count of the number of items in
            the stream.
    """
    p = 1
    X = set()
    m = 0  # Count of items in stream

    for item in stream:
        m += 1
        X.discard(item)
        if random.random() < p:
            X.add(item)
        if len(X) == thresh:
            while len(X) == thresh:  # Force a change
                X = {x_item for x_item in X if random.random() < 0.5}  # Random, so could do nothing
                p /= 2
    return len(X) / p, m
def F0_epsilon(
        thresh: int,
        m: int,
        delta: float=0.05,  # 0.05 is 95%
        ) -> float:
    """
    Calculate the relative error in the estimate from F0_Estimator2(...)

    Parameters:
        thresh (int): The thresh value used in the call TO F0_Estimator2.
        m (int): The count of items in the stream FROM F0_Estimator2.
        delta (float): The desired probability of the estimate being within
            the relative error. It must be in the range (0, 1) and is usually
            0.05 to 0.01, (95% to 99% certainty).

    Returns:
        float: The calculated relative error in the estimate
    """
    return math.sqrt(12 / thresh * math.log(8 * m / delta))

Testing

def stream_gen(k: int=30_000, r: int=7_000) -> list[int]:
    "Create a randomised list of k ints of up to r different values."
    return random.choices(range(r), k=k)
def stream_stats(s: list[Any]) -> tuple[int, int]:
    length, distinct = len(s), len(set(s))
    return length, distinct
# %%
print("CVM ALGORITHM ESTIMATION OF NUMBER OF UNIQUE VALUES IN A STREAM")

stream_size = 2**18
reps = 5
target_uniques = 1
while target_uniques < stream_size:
    the_stream = stream_gen(stream_size+1, target_uniques)
    target_uniques *= 4
    size, unique = stream_stats(the_stream)
    print(f"\n  Actual:\n    {size = :_}, {unique = :_}\n  Estimations:")
    delta = 0.05
    threshhold = 2
    print(f"  All runs using {delta = :.2f} and with estimate averaged from {reps} runs:")
    while threshhold < size:
        estimate, esize = F0_Estimator2(the_stream.copy(), threshhold)
        estimate = sum([estimate] +
                       [F0_Estimator2(the_stream.copy(), threshhold)[0]
                        for _ in range(reps - 1)]) / reps
        estimate = int(estimate + 0.5)
        epsilon = F0_epsilon(threshhold, esize, delta)
        print(f"    With {threshhold = :7_} -> "
              f"{estimate = :_}, +/-{epsilon*100:.0f}%"
              + (f" {esize = :_}" if esize != size else ""))
        threshhold *= 8
The algorithm generates an estimate based on random sampling, so I run it multiple times for the same input and report the mean estimate from those runs.
Sample output

CVM ALGORITHM ESTIMATION OF NUMBER OF UNIQUE VALUES IN A STREAM
  Actual:
    size = 262_145, unique = 1
  Estimations:
  All runs using delta = 0.05 and with estimate averaged from 5 runs:
    With threshhold =       2 -> estimate = 1, +/-1026%
    With threshhold =      16 -> estimate = 1, +/-363%
    With threshhold =     128 -> estimate = 1, +/-128%
    With threshhold =   1_024 -> estimate = 1, +/-45%
    With threshhold =   8_192 -> estimate = 1, +/-16%
    With threshhold =  65_536 -> estimate = 1, +/-6%

  Actual:
    ...

  Actual:
    size = 262_145, unique = 1_024
  Estimations:
  All runs using delta = 0.05 and with estimate averaged from 5 runs:
    With threshhold =       2 -> estimate = 16_384, +/-1026%
    With threshhold =      16 -> estimate = 768, +/-363%
    With threshhold =     128 -> estimate = 1_101, +/-128%
    With threshhold =   1_024 -> estimate = 1_018, +/-45%
    With threshhold =   8_192 -> estimate = 1_024, +/-16%
    With threshhold =  65_536 -> estimate = 1_024, +/-6%

  Actual:
    size = 262_145, unique = 4_096
  Estimations:
  All runs using delta = 0.05 and with estimate averaged from 5 runs:
    With threshhold =       2 -> estimate = 13_107, +/-1026%
    With threshhold =      16 -> estimate = 3_686, +/-363%
    With threshhold =     128 -> estimate = 3_814, +/-128%
    With threshhold =   1_024 -> estimate = 4_083, +/-45%
    With threshhold =   8_192 -> estimate = 4_096, +/-16%
    With threshhold =  65_536 -> estimate = 4_096, +/-6%

  Actual:
    size = 262_145, unique = 16_384
  Estimations:
  All runs using delta = 0.05 and with estimate averaged from 5 runs:
    With threshhold =       2 -> estimate = 0, +/-1026%
    With threshhold =      16 -> estimate = 15_155, +/-363%
    With threshhold =     128 -> estimate = 16_179, +/-128%
    With threshhold =   1_024 -> estimate = 16_986, +/-45%
    With threshhold =   8_192 -> estimate = 16_211, +/-16%
    With threshhold =  65_536 -> estimate = 16_384, +/-6%

  Actual:
    size = 262_145, unique = 64_347
  Estimations:
  All runs using delta = 0.05 and with estimate averaged from 5 runs:
    With threshhold =       2 -> estimate = 26_214, +/-1026%
    With threshhold =      16 -> estimate = 73_728, +/-363%
    With threshhold =     128 -> estimate = 61_030, +/-128%
    With threshhold =   1_024 -> estimate = 64_422, +/-45%
    With threshhold =   8_192 -> estimate = 64_760, +/-16%
    With threshhold =  65_536 -> estimate = 64_347, +/-6%
Looks good!
Wikipedia

Another day, and I decided to start writing this blog post. I searched again and found the Wikipedia article on what it calls the Count-distinct problem.
Looking through it, I found that it had this incorrect description of the CVM algorithm:
The (or a?) problem with the Wikipedia entry is that it shows

...within the while loop. You need an enclosing if |B| >= s around the while loop, with the assignment to p outside the while loop but inside this new if statement.
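A sketch of that corrected control flow, in Python rather than the Wikipedia pseudocode (B, s and p follow the entry's names; the function name is mine):

```python
import random

def prune_buffer(B: set, s: int, p: float) -> tuple[set, float]:
    """Corrected structure: the while loop repeats the random halving
    until the buffer actually shrinks below s, and p is halved once,
    inside the enclosing if but outside the while."""
    if len(B) >= s:
        while len(B) >= s:  # one halving pass may, by chance, remove nothing
            B = {b for b in B if random.random() < 0.5}
        p /= 2  # halved exactly once per overflow
    return B, p

random.seed(0)
B, p = prune_buffer(set(range(8)), 8, 1.0)
print(len(B), p)  # len(B) is now guaranteed < 8, and p == 0.5
```

If the buffer is not full, both B and p pass through unchanged, which is the behaviour the incorrect version loses.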
It's tough!

Both Quanta Magazine and whoever added the algorithm to Wikipedia got the algorithm wrong.
I've written around two hundred tasks on the site Rosettacode.org over more than a decade. Others had to read my descriptions and create code in their chosen languages to implement those tasks. I have learnt from the feedback I got on talk pages to hone that craft. Details matter. Examples matter. Constructive feedback matters.
END.
Real Python: Efficient Iterations With Python Iterators and Iterables
Python’s iterators and iterables are two different but related tools that come in handy when you need to iterate over a data stream or container. Iterators power and control the iteration process, while iterables typically hold data that you want to iterate over one value at a time.
Iterators and iterables are fundamental components of Python programming, and you’ll have to deal with them in almost all your programs. Learning how they work and how to create them is key for you as a Python developer.
In this video course, you’ll learn how to:
- Create iterators using the iterator protocol in Python
- Understand the differences between iterators and iterables
- Work with iterators and iterables in your Python code
- Use generator functions and the yield statement to create generator iterators
- Build your own iterables using different techniques, such as the iterable protocol
- Use the asyncio module and the await and async keywords to create asynchronous iterators
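As a minimal sketch of the first two bullets (using a hypothetical Countdown class, not an example from the course): an iterator implements both __iter__ and __next__, while a generator function gets both for free via yield:

```python
class Countdown:
    """A minimal iterator: implements both __iter__ and __next__."""

    def __init__(self, start: int):
        self.current = start

    def __iter__(self):
        return self  # an iterator is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that iteration is over
        value = self.current
        self.current -= 1
        return value


# The equivalent generator function: calling it returns a generator
# iterator, with __iter__ and __next__ provided automatically.
def countdown(start: int):
    while start > 0:
        yield start
        start -= 1


print(list(Countdown(3)))  # [3, 2, 1]
print(list(countdown(3)))  # [3, 2, 1]
```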
Python Software Foundation: Thinking about running for the Python Software Foundation Board of Directors? Let’s talk!
PSF Board elections are a chance for the community to choose representatives to help the PSF create a vision for and build the future of the Python community. This year there are 3 seats open on the PSF board. Check out who is currently on the PSF Board. (Débora Azevedo, Kwon-Han Bae, and Tania Allard are at the end of their current terms.)
Office Hours Details

This year, the PSF Board is running Office Hours so you can connect with current members to ask questions and learn more about what being a part of the Board entails. There will be two Office Hour sessions:
- June 11th, 4 PM UTC
- June 18th, 12 PM UTC
Make sure to check what time that is for you. We welcome you to join the PSF Discord and navigate to the #psf-elections channel to participate in Office Hours. The server is moderated by PSF Staff and locked between office hours sessions. If you’re new to Discord, check out some Discord Basics to help you get started.
The Board is made up of people who care about the Python community, who want to see it flourish and grow, and who have a few hours a month to attend regular meetings, serve on committees, participate in conversations, and promote the Python community. Check out our Life as Python Software Foundation Director video to learn more about what being a part of the PSF Board entails. We also invite you to review our Annual Impact Report for 2023 to learn more about the PSF mission and what we do.
You can nominate yourself or someone else. We encourage you to reach out to people before you nominate them to ensure they are enthusiastic about the potential of joining the Board. Nominations open on Tuesday, June 11th, 2:00 PM UTC, so you have a few weeks to research the role and craft a nomination statement. The nomination period ends on June 25th, 2:00 PM UTC.
Robin Wilson: How to install the Python triangle package on an Apple Silicon Mac
I was recently trying to set up RasterVision on my Apple Silicon Mac (specifically a M1 MacBook Pro, but I’m pretty sure this applies to any Apple Silicon Mac). It all went fine until it came time to install the triangle package, when I got an error. The error output is fairly long, but the key part is the end part here:
triangle/core.c:196:12: fatal error: 'longintrepr.h' file not found
  #include "longintrepr.h"
           ^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

It took me quite a bit of searching to find the answer (Google just isn't very good at giving relevant results these days), but actually it turns out to be very simple. The latest version of triangle on PyPI doesn't work on Apple Silicon, but the code in the Github repository does work, so you can install directly from Github with this command:
pip install git+https://github.com/drufat/triangle.git

and it should all work fine.
Once you’ve done this, install rastervision again and it should recognise that the triangle package is already installed and not try to install it again.
Call for Papers – Qt World Summit 2025 in Munich
Qt World Summit is back and bigger than ever! We are looking for speakers, collaborators, and industry thought leaders to share their expertise and thoughts at the upcoming Qt World Summit on May 6-7th, 2025 in Munich, Germany.
*Please note we are looking for live talks only.
Open Source AI Definition – Weekly update May 27
- @juliaferraioli and the AWS team have reopened the debate regarding access to training data. This comes in a new forum which mirrors concerns raised in a previous one. They argue that to achieve modifiability, an AI system must ship the original training dataset used to train it. Full transparency and reproducibility require the release of all datasets used to train, validate, test, and benchmark. For Ferraioli, data is considered equivalent to source code for AI systems, therefore its inclusion should not be optional. In a message signed by the AWS Open Source team, she proposed that original training datasets or synthetic data with justification for non-release be required to meet the Open Source AI standard.
- @stefano added some reminders as we reopen this debate. These are the points to keep in mind:
- Abandon the mental map that makes you look for the source of AI (or ML) as that map has been driving us in circles. Instead, we’re looking for the “preferred form to make modifications to the system”
- In most jurisdictions around the world, the law makes it illegal to distribute data, because of copyright, privacy and other regulations. It's also not clear how the law treats datasets, and it's constantly changing
- The text of draft 0.0.8 is deliberately vague regarding "Data information", so that it can resist the test of time and technology changes.
- When criticizing the draft, please provide specific examples in your question, and avoid arguing in the abstract.
- @danish_contractor argues that the current draft is likely to disincentivize openness, because the community views models that include usage restrictions to prevent harms (such as BLOOM or StarCoder) less favorably, despite their being more transparent and reproducible, and thus more "open", than models like Mistral.
- @Pam Chestek clarified that Open Source has two angles: the rights to use, study, modify and share, coupled with those rights being unrestricted. Both are equally important.
- This debate echoes earlier ones on recognizing open components of an AI system.
- The FAQ page is starting to take shape and we would appreciate more feedback. So far, we have preliminary answers to these questions:
- Why is the original training dataset not required?
- Why the grant of freedoms is to its users?
- What are the model parameters?
- Are model parameters copyrightable?
- What does “Available under OSD-compliant license” mean?
- What does “Available under OSD-conformant terms” mean?
- Why does the Open Source AI Definition include a list of components while the Open Source Definition for software doesn't say anything about documentation, roadmaps and other useful things?
- Why is there no mention of safety and risk limitations in the Open Source AI Definition?
- @vamiller has submitted, on behalf of the LLM360 team, a review of their models. In his view, draft v0.0.8 reflects the principles of Open Source applied to AI. He asks about the ODC-By license, arguing that it is compatible with OSI's principles but is a data-only license.
- The next town hall meeting will take place on May 31st at 3:00 pm – 4:00 pm UTC. We encourage all who can participate to attend. This week, we will delve deeper into the issues regarding access (or not) to training data.
Specbee: How CKEditor 5 is transforming content authoring experience in Drupal 10
Malayalam open font design competition announced
Rachana Institute of Typography, in association with KaChaTaThaPa Foundation and Sayahna Foundation, is launching a Malayalam font design competition for students, professionals, and amateur designers.
Selected fonts will be published under Open Font License for free use.
It is not necessary to know details of font technology; skills to design characters would suffice.
Timelines, regulations, prizes and more details are available at the below URLs.
English: https://sayahna.net/fcomp-en
Malayalam: https://sayahna.net/fcomp-ml
Interested participants may register at https://sayahna.net/fcomp
Last day for registration is 30th June 2024.
Sahil Dhiman: A Late, Late Debconf23 Post
After much procrastination, I have finally gotten around to completing my DebConf23 (DC23), Kochi blog post. I lost the original etherpad I had started before DebConf23 for jotting things down, so I started afresh with whatever I could remember, months after the actual conference ended. Things might only be as accurate as my memory.
DebConf23, the 24th annual Debian Conference, happened in Infopark, Kochi, India from 10th September to 17th September 2023. It was preceded by DebCamp from 3rd September to 9th September 2023.
The first formal bid to host DebConf in India was made during DebConf18 in Hsinchu, Taiwan by Raju Dev, but it didn't come our way. At the next DebConf, DebConf19 in Curitiba, Brazil, with help and support from Sruthi, Utkarsh and the whole team, India got the opportunity to host DebConf22, which eventually became DebConf23 for the reasons you all know.
I initially met the local team on the sidelines of DebConf20, which was also my first DebConf. DC20 introduced me to how things work in Debian. Having recently switched to Debian, the video team's call-for-volunteers email pulled me in. Things stuck, and I kept hanging out and helping the local Indian DC team with various stuff. We did manage to organize multiple events leading up to DebConf23, including MiniDebConf India 2021 Online, MiniDebConf Palakkad 2022, MiniDebConf Tamil Nadu 2023 and DebUtsav Kochi 2023, which gave us quite a bit of experience and practice. Many local organizers from these conferences later joined various DebConf teams during the conference to help out.
For DebConf23, I was originally part of the publicity team because that was my usual thing, but after a team redistribution exercise, Sruthi and Praveen moved me to the sponsorship team, as we didn't have to do much publicity anyhow and sponsorship was one of those things I could get involved in remotely. The sponsorship team had to take care of raising funds by reaching out to sponsors, managing invoices and fulfillment. Praveen joined the sponsorship team as well. We had help from the international sponsorship team, Anisa, Daniel and various TOs, who took care of reaching out to international orgs, while we took care of reaching out to Indian organizations for sponsorship. It was a really proud moment when my present employer, Unmukti (makers of hopbox), came aboard as a Bronze sponsor. Fundraising, though, seemed to be hit hard by the tech industry slowdown and layoffs; many of our yesteryear sponsors couldn't sponsor.
We had biweekly local team meetings, which turned weekly as we neared the event. This was in addition to the bi-weekly global team meeting.
Pathu, DebConf23 mascot

To describe the venue: the conference happened in Infopark, Kochi, with the main conference hall being Athulya Hall, and food, accommodation and two smaller halls in the Four Points hotel, right outside Infopark. We got Athulya Hall as part of venue sponsorship from Infopark. The distance between the two was around 300 meters. Halls were named Anamudi, Kuthiran and Ponmudi after hills and mountain areas in the host state of Kerala. Other than Anamudi, which was the main hall, I couldn't remember the names of the halls; I still can't. Four Points was big and expensive, and we had, as expected, cost overruns. Due to how DebConf functions, an Indian university wasn't suitable to host a conference of this scale.
Four Points' Infinity Pool at night

I landed in Kochi on the first day of DebCamp, 3rd September. As usual, I met Abraham first, and the better part of the next hour was spent on meet and greet. It was my first in-person DebConf, so I met many old friends and new folks. I got a room to myself. Abraham lived nearby and hadn't taken the accommodation, so I asked him to join; he finally joined from the second day onwards. All through the conference, room 928 became infamous for various reasons, and I had various roommates for company. During the DebCamp days, we would get up for breakfast, go back to sleep, and only get active past lunch for hacking and helping in the hack lab for the day, followed by fun late-night discussions and parties.
Nilesh, Chirag and Apple at DC23

The team even managed to get a press conference arranged, and we got an opportunity to go to the Press Club, Ernakulam. Sruthi and Jonathan gave the speech and answered questions from journalists, and the event got media coverage as a result.
Ernakulam Press Club

Every night, the team used to have 9 PM meetings for retrospection and planning for the next day, which were always dotted with new problems. Every day, we used to hijack the Silent Hacklab for the meeting and gently ask the only people there at the time to give us space.
DebConf itself is a well-oiled machine. The network was brought up from scratch. The video team's recording, audio mixing, live-streaming, editing and transcoding infrastructure was built on site. A gaming rig served as router and gateway. We got a dual internet connection: a 1 Gbps sponsored leased line from Kerala Vision and a paid backup 100 Mbps connection from a different provider. IPv6 was added through HE's Tunnelbroker. Overall the network worked fine, as we additionally had hotel Wi-Fi, so the conference network wasn't stretched much. I must highlight that DebConf is my only conference where almost everything, and every piece of software, is developed in-house for the conference and modified according to need on the fly. Even event recording cameras, audio checks, direction, recording and editing are all done with in-house software by volunteer-attendees (in some cases remote ones as well), all trained on the sidelines of the conference. The core recording and mixing equipment is owned by Debian and travels to each venue. The rest is sourced locally.
Gaming rig which served as the DC23 gateway router

It was fun seeing how almost all the things were coordinated over text on IRC. If a talk/event was missing a talkmeister, a director or a camera person, a quick text on the #debconf channel would be enough for someone to volunteer. The video team had a dedicated support channel for each conference venue for any issues and was quick to respond and fix stuff.
Network information. Screengrab from the closing ceremony

It rained for the initial days, which gave us cool weather. The swag team had decided to hand out umbrellas in the swag kit, which turned out to be quite useful. The swag kit was praised for quality and selection; many thanks to Anupa, Sruthi and others. It was fun wearing the different-coloured T-shirts, all designed by Abraham: red for volunteers, light green for the video team, green for the core team i.e. staff, and yellow for conference attendees.
With highvoltage

We were already acclimatized by the time DebConf really started, as we had been talking, hacking and hanging out for the last 7 days, but the rush really started with the start of DebConf. More people joined on the first and second day of the conference. As has been the tradition, an opening talk was prepared by Sruthi and the local team (and I highly recommend getting more insights into the process). DebConf day 1 also saw the job fair, where Canonical and FOSSEE, IIT Bombay had stalls for community interactions, which, judging by the crowd, turned out to be quite a hit.
For me, association with DebConf (and Debian) started through volunteering with the video team, so I was going to continue doing that this conference as well. I usually volunteer for talks/events I'm interested in anyway. Handling the camera, talkmeister-ing and direction are fun activities, though I didn't do sound this time around; sound seemed difficult, and I didn't want to spoil someone's stream and recording. Talk attendance varied a lot: for the Bits from the DPL talk the hall was full, but for some talks there were barely enough people to handle the volunteering tasks. That's what usually happens, though. DebConf is more of a place to come together and collaborate, so talk attendance is sometimes an afterthought.
Audience in highvoltage's Bits from the DPL talk

I didn't submit any talk proposals this time around, as just being on the orga team was too much work already, and I knew talk preparation would get delayed to the last moment and I would have to rush through it.
Enrico's talk

From day 2 onward, more sponsor stalls appeared in the hallway area: Hopbox by Unmukti, MostlyHarmless and Deeproot (a joint stall), and FOSSEE. The MostlyHarmless stall had nice mechanical keyboards and other fun gadgets. Whenever I got the time, I would go and do some typing racing to enjoy the nice, clicky keyboards.
As DebConf tradition dictates, we had a Cheese and Wine party. Everyone brought cheese and other delicacies from their region. Then there was yummy Sadya, a traditional vegetarian Malayali lunch served on banana leaves. There were loads of different dishes served, the names of most of which I couldn't pronounce or recollect properly, but everything was super delicious.
Day four was the day trip, and I chose to go to Athirappilly Waterfalls and the jungle safari. Pictures describe the beauty better than words could, though the journey was a bit long.
Athirappilly Falls
Tea Gardens
Late that day, we heard the news of Abraham going missing. We lost Abraham. He had worked really hard through the years for Debian and for making this conference happen. Talks were cancelled for the next day, and Jonathan addressed everyone. We went to Abraham's home the next day to meet his family; the team had arranged buses to Abraham's place. It is unfortunate that I only got the opportunity to visit his place after he was gone.
Days went by slowly after that. The last day was marked by a small conference dinner. Some people had already left. All through that day and the next, we kept saying goodbye to friends with whom we had spent almost a fortnight together.
Group photo with all DebConf T-shirts, chronologically

This was my 2nd trip to Kochi. Vistara Airways' UK886 has become the default flight now. I almost learned how to travel in and around Kochi by Metro, Water Metro, airport shuttle and auto. Things are quite accessible in Kochi, but the metro is a bit expensive compared to Delhi. I left Kochi on the 19th. My flight out was due to leave around 8 PM, so I had the whole day and nothing to do. A direct option would have taken less than an hour, but as I had time, I chose to take the long way to the airport. I took an auto rickshaw to Kakkanad Water Metro station, then the water metro to Vyttila Water Metro station. Vyttila serves as an intermobility hub which connects water metro, metro and bus in one place. I switched to the Metro there, riding from Vyttila Metro station to Aluva Metro station. There, I had lunch and then boarded the airport feeder bus to reach Kochi Airport. All in all, I did auto rickshaw > water metro > metro > feeder bus to reach the airport. It was fun and scenic. I must say, public transport and intermodal integration are quite good, and one can transition seamlessly from one mode to the next.
Kochi Water Metro
Scenes from the Kochi Water Metro
DebConf23 served its purpose of getting existing Debian people together, as well as getting new people interested and contributing to Debian. People who came are still contributing to Debian, and that’s amazing.
Streaming video stats. Screengrab from the closing ceremony

The conference wasn't without its fair share of trouble. There were multiple money transfer woes, and being in India didn't help. Many thanks to the multiple organizations who were proactive in helping out. On top of this, there was conference visa uncertainty and other issues which troubled the visa team a lot.
Kudos to everyone who made this possible. I'm surely going to miss some names, so thank you all; you know how much you have done to make this event possible.
Now, DebConf24 is scheduled for Busan, South Korea, and work is already in full swing. As usual, I'm helping with the fundraising part and plan to attend too. Let's see if I can make it or not.
DebConf23 Group Photo. Click to enlarge.
Credit: Aigars Mahinovs
In the end, we kept saying that no DebConf at this scale would come back to India for the next 10 or 20 years. It's too much trouble, to be frank. It was probably a peak that we might not reach again. I would be happy to be proven wrong though :)
Talking Drupal: Talking Drupal #452 - Starshot & Experience Builder
Today we are talking about web design and development, from a group of people with one thing in common… We love Drupal. This is episode #452 Starshot & Experience Builder.
For show notes visit: www.talkingDrupal.com/452
Topics- What is Starshot
- What is Experience builder
- How will Starshot build on Drupal Core
- Will Experience builder be added to Core
- Listener thejimbirch:
- When will people hear about their pledge
- Listener brook_heaton:
- Will experience builder be compatible with layout builder
- Will Experience builder allow people to style content
- Listener Matthieu Scarset
- Who is Starshot trying to compete with
- Listener Andy Blum
- Does the DA or other major hosting companies plan to set up cheap, easy hosted Drupal
- Listener Ryan Szarma
- Who does this initiative serve in the business community
- How can people get involved
Lauri Eskola - lauriii
Hosts
Nic Laflin - nLighteneddevelopment.com nicxvan
John Picozzi - epam.com johnpicozzi
Matthew Grasmick - grasmash
MOTW Correspondent
Martin Anderson-Clutz - mandclu.com mandclu
- Brief description:
- Have you ever wanted to have your modules create content when they’re installed? There’s a module for that.
- Module name/project name:
- Default Content (default_content)
- Brief history
- How old: created in Oct 2015 by prolific contributor Lee Rowlands (larowlan) though the most recent releases are by Sascha Grossenbacher (Berdir), also a maintainer of many popular Drupal modules
- Versions available: 2.0.0-alpha2, which works with Drupal 9 and 10
- Maintainership
- Security coverage: opted in, but needs a stable release
- Test coverage
- Documentation
- Number of open issues: 105 open issues, 29 of which are bugs against the current branch
- Usage stats:
- Almost 20,000 sites
- Module features and usage
- Provides a way for modules to include default content, in the same way that many modules already include default configuration
- The module exports content as YAML files, and your module can specify the content that should be exported by listing the UUIDs in the info.yml file
- It also provides a number of drush commands, to export a single entity, to export an entity and all of its dependencies, or to bulk export all of the content referenced in a module’s .info.yml file
- There is also a companion project to export default content using an action within a view, which also makes me think it could probably be automated with something like ECA if you needed that
- Exported content should be kept in a content directory in your module, where it will be imported during install on any site that has the default_content module installed
- I thought this would be a good module to cover today because Drupal core’s recipe system also includes support for default content, so when you install a recipe it will similarly import any YAML-encoded content in the recipe. In fact, I used this module for the first time exporting taxonomy terms I wanted a recipe to create as default values for a taxonomy it creates. Since Recipes will be a big part of Starshot, I expect default_content to be getting a lot of use in the coming months
ADCI Solutions: Field mapping when integrating Drupal with Salesforce
Real Python: How to Create Pivot Tables With pandas
A pivot table is a data analysis tool that allows you to take columns of raw data from a pandas DataFrame, summarize them, and then analyze the summary data to reveal its insights.
Pivot tables allow you to perform common aggregate statistical calculations such as sums, counts, averages, and so on. Often, the information a pivot table produces reveals trends and other observations your original raw data hides.
Pivot tables were originally implemented in early spreadsheet packages and are still a commonly used feature of the latest ones. They can also be found in modern database applications and in programming languages. In this tutorial, you’ll learn how to implement a pivot table in Python using pandas’ DataFrame.pivot_table() method.
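As a quick taste of what the tutorial covers, here is a minimal pivot_table() sketch on a tiny made-up DataFrame (illustrative data only, not the tutorial's sales dataset):

```python
import pandas as pd

# Hypothetical mini dataset, loosely shaped like the tutorial's sales data.
df = pd.DataFrame({
    "sales_region": ["North", "North", "South", "South"],
    "order_type": ["Retail", "Wholesale", "Retail", "Wholesale"],
    "sale_price": [10.0, 20.0, 30.0, 40.0],
})

# Summarize sale_price by region (rows) and order type (columns).
pivot = df.pivot_table(
    values="sale_price",
    index="sales_region",
    columns="order_type",
    aggfunc="sum",
)
print(pivot)
```

Swapping aggfunc for "mean", "count" and so on gives the other aggregate statistics mentioned above.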
Before you start, you should familiarize yourself with what a pandas DataFrame looks like and how you can create one. Knowing the difference between a DataFrame and a pandas Series will also prove useful.
In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment you wish.
The other thing you’ll need for this tutorial is, of course, data. You’ll use the Sales Data Presentation - Dashboards data, which is freely available for you to use under the Apache 2.0 License. The data has been made available for you in the sales_data.csv file that you can download by clicking the link below.
Get Your Code: Click here to download the free sample code you’ll use to create a pivot table with pandas.
This table provides an explanation of the data you’ll use throughout this tutorial:
Column Name      | Data Type (PyArrow) | Description
---------------- | ------------------- | -----------
order_number     | int64               | Order number (unique)
employee_id      | int64               | Employee's identifier (unique)
employee_name    | string              | Employee's full name
job_title        | string              | Employee's job title
sales_region     | string              | Sales region employee works within
order_date       | timestamp[ns]       | Date order was placed
order_type       | string              | Type of order (Retail or Wholesale)
customer_type    | string              | Type of customer (Business or Individual)
customer_name    | string              | Customer's full name
customer_state   | string              | Customer's state of residence
product_category | string              | Category of product (Bath Products, Gift Basket, Olive Oil)
product_number   | string              | Product identifier (unique)
product_name     | string              | Name of product
quantity         | int64               | Quantity ordered
unit_price       | double              | Selling price of one product
sale_price       | double              | Total sale price (unit_price × quantity)

As you can see, the table stores data for a fictional set of orders. Each row contains information about a single order. You'll become more familiar with the data as you work through the tutorial and try to solve the various challenge exercises contained within it.
Throughout this tutorial, you’ll use the pandas library to work with DataFrames, together with the newer PyArrow library. The PyArrow library provides pandas with its own optimized data types, which are faster and less memory-intensive than the traditional NumPy types pandas uses by default.
If you’re working at the command line, you can install both pandas and pyarrow using python -m pip install pandas pyarrow, perhaps within a virtual environment to avoid clashing with your existing environment. If you’re working within a Jupyter Notebook, you should use !python -m pip install pandas pyarrow. With the libraries in place, you can then read your data into a DataFrame:
Python

>>> import pandas as pd
>>> sales_data = pd.read_csv(
...     "sales_data.csv",
...     parse_dates=["order_date"],
...     dayfirst=True,
... ).convert_dtypes(dtype_backend="pyarrow")

First of all, you used import pandas as pd to make the library available within your code. To construct the DataFrame and read it into the sales_data variable, you used pandas’ read_csv() function. The first argument refers to the file being read, while parse_dates indicates that the order_date column’s data is intended to be read as the datetime64[ns] type. But there’s an issue that will prevent this from happening.
In your source file, the order dates are in dd/mm/yyyy format, so to tell read_csv() that the first part of each date represents a day, you also set the dayfirst parameter to True. This allows read_csv() to now read the order dates as datetime64[ns] types.
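You can see the effect of dayfirst on an ambiguous date by using pd.to_datetime(), which applies the same parsing logic as read_csv(). The sample date here is made up for illustration:

```python
import pandas as pd

# "02/01/2024" means 2 January 2024 in dd/mm/yyyy format.
ambiguous = "02/01/2024"

# By default, the first part is treated as the month.
print(pd.to_datetime(ambiguous))                 # 2024-02-01 00:00:00
# With dayfirst=True, the first part is treated as the day.
print(pd.to_datetime(ambiguous, dayfirst=True))  # 2024-01-02 00:00:00
```

Without dayfirst=True, every date whose day happens to be 12 or lower would silently be read with its day and month swapped.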
With order dates successfully read as datetime64[ns] types, the .convert_dtypes() method can then successfully convert them to a timestamp[ns][pyarrow] data type, and not the more general string[pyarrow] type it would have otherwise done. Although this may seem a bit circuitous, your efforts will allow you to analyze data by date should you need to do this.
If you want to take a look at the data, you can run sales_data.head(2). This will let you see the first two rows of your DataFrame. When using .head(), it’s preferable to do so in a Jupyter Notebook because all of the columns are shown. Many Python REPLs show only the first and last few columns unless you use pd.set_option("display.max_columns", None) before you run .head().
If you want to verify that PyArrow types are being used, sales_data.dtypes will confirm it for you. As you’ll see, each data type contains [pyarrow] in its name.
Note: If you’re experienced in data analysis, you’re no doubt aware of the need for data cleansing. This remains important when you work with pivot tables, and it’s equally important to make sure your input data is tidy.
Tidy data is organized as follows:
- Each row should contain a single record or observation.
- Each column should contain a single observable or variable.
- Each cell should contain an atomic value.
If you tidy your data in this way, as part of your data cleansing, you’ll also be able to analyze it better. For example, rather than store address details in a single address field, it’s usually better to split it down into house_number, street_name, city, and country component fields. This allows you to analyze it by individual streets, cities, or countries more easily.
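A quick sketch of that address example, using hypothetical comma-separated data, shows how a single untidy field can be split into atomic columns with .str.split():

```python
import pandas as pd

# Hypothetical untidy data: the whole address crammed into one field.
df = pd.DataFrame({
    "address": ["12,High Street,London,UK", "34,Main Road,Leeds,UK"],
})

# Split the single field into its atomic component columns.
df[["house_number", "street_name", "city", "country"]] = (
    df["address"].str.split(",", expand=True)
)

print(df["city"].tolist())  # ['London', 'Leeds']
```

With the components separated, grouping or filtering by city or country becomes a one-liner.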
In addition, you’ll also be able to use the data from individual columns more readily in calculations. For example, if you had columns room_length and room_width, they can be multiplied together to give you room area information. If both values are stored together in a single column in a format such as "10 x 5", the calculation becomes more awkward.
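The room example can be sketched with made-up numbers to show the difference. With tidy columns the calculation is direct, while the "10 x 5" string format needs parsing first:

```python
import pandas as pd

# Tidy: one value per column makes arithmetic trivial.
rooms = pd.DataFrame({"room_length": [10, 4], "room_width": [5, 3]})
rooms["room_area"] = rooms["room_length"] * rooms["room_width"]
print(rooms["room_area"].tolist())  # [50, 12]

# Untidy: dimensions stored as a single string must be split
# and converted before you can multiply them.
untidy = pd.Series(["10 x 5", "4 x 3"])
dims = untidy.str.split(" x ", expand=True).astype(int)
print((dims[0] * dims[1]).tolist())  # [50, 12]
```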
The data within the sales_data.csv file is already in a suitably clean and tidy format for you to use in this tutorial. However, not all raw data you acquire will be.
It’s now time to create your first pandas pivot table with Python. To do this, first you’ll learn the basics of using the DataFrame’s .pivot_table() method.
How to Create Your First Pivot Table With pandas

Now that your learning journey is underway, it’s time to progress toward your first learning milestone and complete the following task:
Calculate the total sales for each type of order for each region.
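One possible shape for this kind of summary, sketched on a tiny made-up stand-in for sales_data rather than the real file, uses the DataFrame’s .pivot_table() method with regions as rows and order types as columns:

```python
import pandas as pd

# Tiny stand-in for sales_data.csv (values are invented).
sales_data = pd.DataFrame({
    "sales_region": ["North", "North", "South", "South"],
    "order_type": ["Retail", "Wholesale", "Retail", "Retail"],
    "sale_price": [100.0, 250.0, 80.0, 20.0],
})

# Total sales for each order type within each region.
report = sales_data.pivot_table(
    values="sale_price",
    index="sales_region",
    columns="order_type",
    aggfunc="sum",
)
print(report)
```

Cells with no matching orders come out as NaN, which you can replace by passing fill_value=0 to .pivot_table().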
Read the full article at https://realpython.com/how-to-pandas-pivot-table/ »