Feeds
Darren Oh: How Drupal Forge began
parallel @ Savannah: GNU Parallel 20240822 ('Southport') released
GNU Parallel 20240822 ('Southport') has been released. It is available for download at: lbry://@GnuParallel:4
Quote of the month:
honestly the coolest software i've ever seen gotta be gnu parallel or
ffmpeg, nothing like them
-- @scootykins scoot
New in this release:
- --match Match input source with regexp to set replacement fields.
- {:%fmt} Use printf formatting of replacement strings.
- Bug fixes and man page updates.
News about GNU Parallel:
- Powerful GNU parallel, more than a loop https://www.linkedin.com/pulse/powerful-gnu-parallel-more-than-loop-zhenguo-zhang-18dxc
- How To Increase File Transfer Speed Using Parallel Rsync? https://contentbase.com/blog/increase-file-transfer-speed-parallel-rsync/
- Converting WebP Images to PNG Using parallel and dwebp https://bytefreaks.net/2024/07/27
- Turbocharging the Box CLI with GNU Parallel https://medium.com/box-developer-blog/turbocharging-the-box-cli-with-gnu-parallel-ee44c48811c0
GNU Parallel - For people who live life in the parallel lane.
If you like GNU Parallel record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.
GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.
If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.
GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.
For example you can run this to convert all jpeg files into png and gif files and have a progress bar:
parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif
Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:
find . -name '*.jpg' |
parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200
You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/
You can install GNU Parallel in just 10 seconds with:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.
When using programs that use GNU Parallel to process data for publication please cite:
O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.
If you like GNU Parallel:
- Give a demo at your local user group/team/colleagues
- Post the intro videos on Reddit/Diaspora*/forums/blogs/ Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
- Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
- Request or write a review for your favourite blog or magazine
- Request or build a package for your favourite distribution (if it is not already there)
- Invite me for your next conference
If you use programs that use GNU Parallel for research:
- Please cite GNU Parallel in you publications (use --citation)
If GNU Parallel saves you money:
- (Have your company) donate to FSF https://my.fsf.org/donate/
GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.
The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.
When using GNU SQL for a publication please cite:
O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.
GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.
DrupalEasy: DrupalEasy Podcast S17E2 - Janez Urevc - Gander
We talk with Janez Urevc from Tag1 Consulting about Gander, an open-source automated performance testing framework.
URLs mentioned- Gander (includes many helpful links!)
- 40-minute video presentation introducing Gander (including a demo)
- Drupal.org docs page for performance testing with Gander
- https://gander.tag1.io - dashboard that Tag1 hosts for the Drupal community
- Two-hour Gander workshop at DrupalCon Barcelona on September 24, 2024
Professional module development - 15 weeks, 90 hours, live, online course.
Drupal Career Online - 12 weeks, 77 hours, live online, beginner-focused course.
We're using the machine-driven Amazon Transcribe service to provide an audio transcript of this episode.
SubscribeSubscribe to our podcast on iTunes, iHeart, Amazon, YouTube, or Spotify.
If you'd like to leave us a voicemail, call 321-396-2340. Please keep in mind that we might play your voicemail during one of our future podcasts. Feel free to call in with suggestions, rants, questions, or corrections. If you'd rather just send us an email, please use our contact page.
CreditsPodcast edited by Amelia Anello.
Jonathan Dowland: Fediverse and feeds
It's clear that Twitter has been circling the drain for years, but things have been especially bad in recent times. I haven't quit (I have some sympathy with the viewpoint don't cede territory to fascists) but I try to read it much less, and I certainly post much less.
Especially at the moment, I really appreciate distractions.
Last time I wrote about Mastodon (by which I meant the Fediverse1), I was looking for a new instance to try. I settled on Debian's social instance2. I'm now trying to put any energy I might spend engaging on Twitter, into engaging in the Fediverse instead. (You can follow me via the handle @jon@dow.land, I think, which should repoint to my actual handle, @jmtd@pleroma.debian.social.)
There are other potential successors to Twitter: two big ones are Bluesky and Facebook-owned Threads. They are effectively cookie-cutter copies of the Twitter model, and so, we will repeat the same mistakes there. Sadly I see the majority of communities and sub-cultures I follow are migrating to one or the other of these.
The Fediverse (or the Mastodon-ish bits of it) should avoid the fate of Twitter. JWZ puts it better and more succinctly than I can.
The Fedi experience is, sadly, pretty clunky. So I want to try and write a bit from time to time with tips and tricks that might improve people's experiences.
First up, something I discovered only today about Mastodon instances. As JWZ noted, If you are worried about picking the "right" Mastodon instance, don't. Just spin the wheel.. You can spend too much time trying to guess a good answer to this. Better to just get started.
At the same time, individual instances are supposed to cater to specific niches. So it could be useful to sample the public posts from an entire instance. For example, to find people to follow, or decide to hop over to that instance yourself. You can't (I think) follow an entire instance from within yours, but, they usually have a public page which shows you the latest traffic.
For example, the infosec-themed instance infosec.exchange has one here: https://infosec.exchange/public/local
These pages don't provide RSS or Atom feeds3, sadly. I hope that's on the software's roadmap, and hasn't been spurned for ideological reasons. For now at least, OpenRSS provide RSS/Atom feeds for many Mastodon instances. For example, an RSS/Atom feed of the above: https://openrss.org/infosec.exchange/public/local
One can add these feeds to your Feed reader and over time get a flavour for the kind of discourse that takes place on given instances.
I think the OpenRSS have to manually add Mastodon instances to their service. I tried three instances and only one (infosec.exchange) worked. I'm not sure but I think trying an instance that doesn't work automatically puts it on OpenRSS's backlog.
- the Fediverse-versus-Mastodon nomenclature problem is just the the tip of the iceberg, in terms of adoption problems. Mastodon provides a twitter-like service that participates in the Fediverse. But it isn't correct to call the twitter-like service "Mastodon" because other softwares also participate in/provide that service. And it's not correct to call it "Fediverse" because that describes a bigger thing, with e.g. youtube clones also taking part. I'm not sure what the right term should be for "the twitter-like thing". Also, everything I wrote here is probably subtly wrong.↩
- Debian's instance actually runs Pleroma, an alternative to Mastodon. Why should it matter? I think it's healthy for there to be more than one implementation in an open ecosystem. However the experience can be janky, as the features don't perfectly align, some Mastodon features/APIs are not documented/standardised/etc.↩
- I have to remind myself that the concept of RSS/Atom feeds and Feedreaders might need explaining to a modern audience too. Perhaps in another blog post.↩
Twin Cities Drupal Camp: Introducing Lightning Talks
We've added a fun new event to our conference this year – Lightning Talks!
Some of you may be familiar with the concept, but we'll explain it here, as well as how we're planning to do it. (Note that this an in-person event for registered attendees only.)
What are Lightning TalksA lightning talk is a very short presentation lasting only a few minutes. Each speaker gets a maximum of 5 minutes to present on a topic of your choice. You will have access to the projector for slides.
Because these are very short and fast presentations (thus the "lightning" part), it's meant to be brief, snappy, and fun. We'll rotate quickly between speakers to keep things moving and entertaining.
These are not the same as the 45 minute sessions held during the rest of Camp. Speakers will need to focus on a single message or just a few quick key points.
Speakers can talk about serious and technical topics, or they can do something lighthearted and silly too. This is always a fun event!
When Will They Be Held?We're going to hold the lightning talks between 3-4pm on the first day of Camp (Thu, Sep 12), in the West Wing (the big room). Right before happy hour!
How Can I Participate?We've made a form to gather signups ahead of time. We'll only have time for 10-12 presenters, so we may not be able to include every submission. Sign up today!
Do I Have to Participate?Talk of group presentations may be triggering for some people, but don't worry! No one is required to participate. Having you in the audience is all we ask.
Additional Resources- What are Lightning Talks by the University of North Carolina
Python Anywhere: Belated announcement of latest updates
Here is a slightly delayed (and short) run-down of the new stuff that we deployed recently.
The main change for this update is that we have updated the underlying OS running PythonAnywhere to Ubuntu 22.04. This is an LTS release so it will be supported for some time to come. This will not affect user environments, but it is setting us up for a new user environment that should be coming soon.
We have also:
- Started the process of updating our file servers to be more robust
- Improved our alerting so that we are alerted to many new forms of failure on PythonAnywhere
- Made some improvements to the ASGI beta systems and their documentation
- Fixed a number of security issues
- Fixed various bugs
Real Python: Primer on Jinja Templating
Templates are an essential ingredient in full-stack web development. With Jinja, you can build rich templates that power the front end of your Python web applications.
But you don’t need to use a web framework to experience the capabilities of Jinja. When you want to create text files with programmatic content, Jinja can help you out.
In this tutorial, you’ll learn how to:
- Install the Jinja template engine
- Create your first Jinja template
- Render a Jinja template in Flask
- Use for loops and conditional statements with Jinja
- Nest Jinja templates
- Modify variables in Jinja with filters
- Use macros to add functionality to your front end
You’ll start by using Jinja on its own to cover the basics of Jinja templating. Later you’ll build a basic Flask web project with two pages and a navigation bar to leverage the full potential of Jinja.
Throughout the tutorial, you’ll build an example app that showcases some of Jinja’s wide range of features. To see what it’ll do, skip ahead to the final section.
You can also find the full source code of the web project by clicking on the link below:
Source Code: Click here to download the source code that you’ll use to explore Jinja’s capabilities.
Take the Quiz: Test your knowledge with our interactive “Primer on Jinja Templating” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz
Primer on Jinja TemplatingIn this quiz, you'll test your understanding of Jinja templating. Jinja is a powerful tool for building rich templates in Python web applications, and it can also be used to create text files with programmatic content.
This tutorial is for you if you want to learn more about the Jinja template language or if you’re getting started with Flask.
Get Started With JinjaJinja is not only a city in the Eastern Region of Uganda and a Japanese temple, but also a template engine. You commonly use template engines for web templates that receive dynamic content from the back end and render it as a static page in the front end.
But you can use Jinja without a web framework running in the background. That’s exactly what you’ll do in this section. Specifically, you’ll install Jinja and build your first templates.
Install JinjaBefore exploring any new package, it’s a good idea to create and activate a virtual environment. That way, you’re installing any project dependencies in your project’s virtual environment instead of system-wide.
Select your operating system below and use your platform-specific command to set up a virtual environment:
Windows PowerShell PS> python -m venv venv PS> .\venv\Scripts\activate (venv) PS> Copied! Shell $ python -m venv venv $ source venv/bin/activate (venv) $ Copied!With the above commands, you create and activate a virtual environment named venv by using Python’s built-in venv module. The parentheses (()) surrounding venv in front of the prompt indicate that you’ve successfully activated the virtual environment.
After you’ve created and activated your virtual environment, it’s time to install Jinja with pip:
Shell (venv) $ python -m pip install Jinja2 Copied!Don’t forget the 2 at the end of the package name. Otherwise, you’ll install an old version that isn’t compatible with Python 3.
It’s worth noting that although the current major version is actually greater than 2, the package that you’ll install is nevertheless called Jinja2. You can verify that you’ve installed a modern version of Jinja by running pip list:
Shell (venv) $ python -m pip list Package Version ---------- ------- Jinja2 3.x ... Copied!To make things even more confusing, after installing Jinja with an uppercase J, you have to import it with a lowercase j in Python. Try it out by opening the interactive Python interpreter and running the following commands:
Read the full article at https://realpython.com/primer-on-jinja-templating/ »[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Drupal Association blog: Why I'm a Ripplemaker.. by Nikki Flores
I first learned Drupal in 2008 and it was the backbone of one of my consulting service's very first sites. We used the Quiz module and built online certifications based on scoring. This feature was a huge hit for that client and carried them forward, such that they were able to get acquired and eventually retire.
Implementing sites on Drupal carried my partner and I through a decade of consulting as we built out websites for national and international public agencies, nonprofits, membership organizations, and e-commerce. Drupal fed my family through a variety of projects, paid for my healthcare, tuition, and our housing.
In my current role at Lullabot as an employee-owner, Drupal has continued to evolve. We use Drupal for our enterprise projects, for state and local governments as well as education and publishing clients: my current client, a state government, is converting their agencies into a unified platform.
From the community side, I organized the first DrupalCamp in Hawaii and am nearing the end of my elected term as a community board member for the Drupal Association, and continue to speak on panels, coordinate, organize, and present at DrupalCon and other events.
There is nothing that I do that is special: anyone else in the Drupal community can, and is welcome, to contribute, connect, and engage to the level that they have energy to do so, I recognize how fortunate I am to work at a company that invests in Drupal and also supports "internal time" so we continue to learn, grow, and develop our skills.
Because I have gained so much from Drupal, I'm very happy to encourage you to join me as a Ripple Maker—making a monthly contribution, and in my case, making an allocation to the Drupal Association as a beneficiary from my estate, helps me know that the core values of our community: collaboration, questioning and commenting, making items incrementally better, and continuing to encourage the next generation, will last.
Nikki Flores
Senior Technical Project Manager
Lullabot
Real Python: Quiz: Primer on Jinja Templating
In this quiz, you’ll test your understanding of Jinja templating. Jinja is a powerful tool for building rich templates in Python web applications, and it can also be used to create text files with programmatic content.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Droptica: 10 SEO Features a Modern CMS Should Have. Using Drupal as an Example
In this blog post, I'll introduce ten SEO features that every modern CMS should have and show you how easy it is to implement them in Drupal. So, if you have an existing website, you'll easily see what you're missing. And if you're just planning to build a new one, you'll get a ready-made list of features to copy to your web specifications and requirements. I invite you to read the article or watch an episode of the “Nowoczesny Drupal” series (the video is in Polish).
Drupal.org blog: GitLab CI templates will make Drupal 11 the default version to run
Whenever a new major version of Drupal is released, we update Drupal's GitLab CI testing templates to automatically update the versions being tested. Here's an outline of our plan:
Where we are nowDrupal 11 was released on August 6th. You can learn more about it on the Drupal 11 landing page.
This means that we are in the middle of a transition period where many sites and modules will want to be in Drupal 11, whereas some others might still want to stay in Drupal 10.
From a GitLab CI point of view, testing for both Drupal 10 and 11 simultaneously has been available for months, providing module maintainers with a great tool to test their code before Drupal 11 was launched.
This was available by setting one variable in the .gitlab-ci.yml like this:
variables: OPT_IN_TEST_NEXT_MAJOR: 1Many maintainers have leveraged this already and we can see many modules already claiming full Drupal 11 support within days of the release. To be more specific, as of August 20th, 2870 projects have no compatibility errors anymore, and 1720 have made Drupal 11 compatible releases.
Where we want to beWe are preparing to update the default testing configuration for the GitLab CI templates, but we want to make sure to continue to support maintainers who still need to test against Drupal 10 and 11. We've outlined the changes we'll be making and the timeline below.
As of today:
- Current version (default) is Drupal 10
- Next major version is Drupal 11
- Previous major version is Drupal 9
When we do the shift, this will change to:
- Current version (default) is Drupal 11
- Next major version will be Drupal 12 (when development starts) - see note below.
- Previous major version is Drupal 10
For modules that were testing Drupal 10 and Drupal 11 simultaneously, the change will be as easy as this:
variables: # OPT_IN_TEST_NEXT_MAJOR: 1 OPT_IN_TEST_PREVIOUS_MAJOR: 1Instead of opting in to test the next major, all you need to do is opt into the previous major.
Note: Drupal 12 development branch does not exist yet. Enabling this version might not do anything until this branch is created.
StepsWe are actively working on making the above switch in this issue: Update templates so 11.0 is the default/current branch.
We are going to be taking the following steps in the coming days / weeks.
Step 1: Make all modules start testing against Drupal 11We will set the default value for OPT_IN_TEST_NEXT_MAJOR to 1 temporarily, and release version 1.5.6 of the templates. This will automatically become the default for all Contrib.
Modules that have not yet tested their code against Drupal 11 will now see "Next Major" test jobs in their pipelines, in addition to the "current" Drupal 10 variant. These new jobs have allow_failure: true, so the overall result of the pipelines should not change. This should show a good sense of where the module is at in relation to Drupal 11. Maintainers can still override the variable to be 0 if they don't want this behavior.
The expected date for this change is: August 26th, 2024 (next Monday)
Step 2: Roll out the shift and make it available for ContribWhen the issue Update templates so 11.0 is the default/current branch and all its dependencies all sorted, we will deploy the changes and create a new release 1.6.0. This will be available to Contrib projects using "gitlab ref" main or 1.x-latest
The expected date for this change is: September 5th, 2024 (2 weeks from now)
Step 3: Make the shift default for all ContribThen we will make this new release be the default for all contrib projects automatically.
However, we have provided several alternatives for modules that don't want to do the shift at this point. Any of the following can be used:
- You can pin the version of the templates for your module to 1.5.6. This is the latest version released before the switch. Learn more about pinning the templates version in this page. Note that this means you will not get any updates to the templates for new features or bug fixes, until you un-pin the release.
- You can set OPT_IN_TEST_PREVIOUS_MAJOR to 1 and OPT_IN_TEST_CURRENT to 0 to continue testing Drupal 10 and not Drupal 11.
- You can configure your own variants as described on this page.
- You can tweak the key variables used when creating variants so they have the versions that you desire. Check the above link for that information.
For those wanting to do the shift, you will not need to do anything at all.
The expected date for this change is: September 12th, 2024 (3 weeks from now)
After the shift is madeOnwards and upwards, that means that Drupal 11 is the default version to be tested for all new issues, merge requests, and pipelines for all contrib projects, allowing us to keep the Drupal ecosystem up to date and relevant.
There are some issues that are not blockers for this change, but are related, so we encourage you to see the issue list before reporting anything new, but otherwise create a new issue if you discover a problem and don't find it in the queue.
PyCharm: How to Build Chatbots With LangChain
This is a guest post from Dido Grigorov, a deep learning engineer and Python programmer with 17 years of experience in the field.
Chatbots have evolved far beyond simple question-and-answer tools. With the power of large language models (LLMs), they can understand the context of conversations and generate human-like responses, making them invaluable for customer support applications and other types of virtual assistance.
LangChain, an open-source framework, streamlines the process of building these conversational chatbots by providing tools for seamless model integration, context management, and prompt engineering.
In this blog post, we’ll explore how LangChain works and how chatbots interact with LLMs. We’ll also guide you step by step through building a context-aware chatbot that delivers accurate, relevant responses using LangChain and GPT-3.
What are the chatbots in the realm of LLMs?Chatbots in the field of LLMs are cutting-edge software that simulate human-like conversations with users through text or voice interfaces. These chatbots exploit the advanced capabilities of LLMs, which are neural networks trained on huge amounts of text data which allows them to produce human-like responses to a wide range of input prompts.
One among all other matters is that LLM-based chatbots can take a conversation’s context into account when generating a response. This means they can keep coherence across several exchanges and can process complex queries to produce outputs that are in line with the users’ intentions. Additionally, these chatbots assess the emotional tone of a user’s input and adjust their responses to match the user’s sentiments.
Chatbots are highly adaptable and personalized. They learn from how users interact with them thus improving on their responses by adjusting them according to individual preferences and needs.
What is LangChain?LangChain is a framework that’s open-source developed for creating apps that use large language models (LLMs). It comes with tools and abstractions to better personalize the information produced from these models while maintaining accuracy and relevance.
One common term you can see when you read about LLMs is “prompt chains”. A prompt chain refers to a sequence of prompts or instructions used in the context of artificial intelligence and machine learning, with the purpose to guide the AI model through a multi-step process to generate more accurate, detailed, or refined outputs. This method can be employed for various tasks, such as writing, problem-solving, or generating code.
Developers can create new prompt chains using LangChain, which is one of the strongest sides of the framework. They can even modify existing prompt templates without needing to train the model again when using new datasets.
How does LangChain work?LangChain is a framework designed to simplify the development of applications that utilize language models. It offers a suite of tools that help developers efficiently build and manage applications that involve natural language processing (NLP) and Large Language Models. By defining the steps needed to achieve the desired outcome (this might be a chatbot, task automation, virtual assistant, customer support, and even more), developers can adapt language models flexibly to specific business contexts using LangChain.
Here’s a high-level overview of how LangChain works.
Model integrationLangChain supports various Language models including those from OpenAI, Hugging Face, Cohere, Anyscale, Azure Models, Databricks, Ollama, Llama, GPT4All, Spacy, Pinecone, AWS Bedrock, MistralAI, among others. Developers can easily switch between different models or use multiple models in one application. They can build custom-developed model integration solutions, which allow developers to take advantage of specific capabilities tailored to their specific applications.
ChainsThe core concept of LangChain is chains, which bring together different AI components for context-aware responses. A chain represents a set of automated actions between a user prompt and the final model output. There are two types of chains provided by LangChain:
- Sequential chains: These chains enable the output of a model or function to be used as an input for another one. This is particularly helpful in making multi-step processes that depend on each other.
- Parallel chains: It allows for simultaneous running of multiple tasks, with their outputs merged at the end. This makes it perfect for doing tasks that can be divided into subtasks that are completely independent.
LangChain facilitates the storage and retrieval of information across various interactions. This is essential where there is need for persistence of context such as with chat-bots or interactive agents. There are also two types of memory provided:
- Short-term memory – Helps keep track of recent sessions.
- Long-term memory – Allows retention of information from previous sessions enhancing system recall capability on past chats and user preferences.
LangChain provides many tools, but the most used ones are Prompt Engineering, Data Loaders and Evaluators. When it comes to Prompt Engineering, LangChain contains utilities to develop good prompts, which are very important in getting the best responses from language models.
If you want to load up files like csv, pdf or other format, Data Loaders are here to help you to load and pre-process different types of data hence making them usable in model interactions.
Evaluation is an essential part of working with machine learning models and large language models. That’s why LangChain provides Evaluators – tools used for testing language models and chains so that generated results meet the required criteria, which might include:
Datasets criteria:
- Manually curated examples: Start with high-quality, diverse inputs.
- Historical logs: Use real user data and feedback.
- Synthetic data: Generate examples based on initial data.
Types of evaluations:
- Human: Manual scoring and feedback.
- Heuristic: Rule-based functions, both reference-free and reference-based.
- LLM-as-judge: LLMs score outputs based on encoded criteria.
- Pairwise: Compare two outputs to pick the better one.
Application evaluations:
- Unit tests: Quick, heuristic-based checks.
- Regression testing: Measure performance changes over time.
- Back-testing: Re-run production data on new versions.
- Online evaluation: Evaluate in real-time, often for guardrails and classifications.
Agents
LangChain agents are essentially autonomous entities that leverage LLMs to interact with users, perform tasks, and make decisions based on natural language inputs.
Action-driven agents use language models to decide on optimal actions for predefined tasks. On the other side interactive agents or interactive applications such as chatbots make use of these agents, which also take into account user input and stored memory when responding to queries.
How do chatbots work with LLMs?LLMs underlying chatbots use Natural Language Understanding (NLU) and Natural Language Generation (NLG), which are made possible through pre-training of models on vast textual data.
Natural Language Understanding (NLU)- Context awareness: LLMs can understand the subtlety and allusions in a conversation, and they can keep track of the conversation from one turn to the next. This makes it possible for the chatbots to generate logical and contextually appropriate responses to the clients.
- Intent recognition: These models should be capable of understanding the user’s intent from their queries, whether the language is very specific or quite general. They can discern what the user wants to achieve and determine the best way to help them reach that goal.
- Sentiment analysis: Chatbots can determine the emotion of the user through the tone of language used and adapt to the user’s emotional state, which increases the engagement of the user.
- Response generation: When LLMs are asked questions, the responses they provide are correct both in terms of grammar and the context. This is because the responses that are produced by these models mimic human communication, due to the training of the models on vast amounts of natural language textual data.
- Creativity and flexibility: Apart from simple answers, LLM-based chatbots can tell a story, create a poem, or provide a detailed description of a specific technical issue and, therefore, can be considered to be very flexible in terms of the provided material.
- Learning from interactions: Chatbots make the interaction personalized because they have the ability to learn from the users’ behavior, as well as from their choices. It can be said that it is constantly learning, thereby making the chatbot more effective and precise in answering questions.
- Adaptation to different domains: The LLMs can be tuned to particular areas or specialties that allow the chatbots to perform as subject matter experts in customer relations, technical support, or the healthcare domain.
LLMs are capable of understanding and generating text in multiple languages, making them suitable for applications in diverse linguistic contexts.
Building your own chatbot with LangChain in five stepsThis project aims to build a chatbot that leverages GPT-3 to search for answers within documents. First, we scrape content from online articles, split them into small chunks, compute their embeddings, and store them in Deep Lake. Then, we use a user query to retrieve the most relevant chunks from Deep Lake, which are incorporated into a prompt for generating the final answer with the LLM.
It’s important to note that using LLMs carries a risk of generating hallucinations or false information. While this may be unacceptable for many customer support scenarios, the chatbot can still be valuable for assisting operators in drafting answers that they can verify before sending to users.
Next, we’ll explore how to manage conversations with GPT-3 and provide examples to demonstrate the effectiveness of this workflow
Step 1: Project creation, prerequisites, and required library installationFirst create your PyCharm project for the chatbot. Open up Pycharm and click on “new project”. Then give a name of your project.
Once ready with the project set up, generate your `OPENAI_API_KEY` on the OpenAI API Platform Website, once you are logged in (or sign up on the OpenAI website for that purpose). To do that go to the “API Keys” section on the left navigation menu and then click on the button “+Create new secret key”. Don’t forget to copy your key.
After that get your `ACTIVELOOP_TOKEN` by signing up on the Activeloop website. Once logged in, just click on the button “Create API Token” and you’ll be navigated to the token creation page. Copy this token as well.
Once you have both the token and the key, open your configuration settings in PyCharm, by clicking on the 3 dots button next to the run and debug buttons, and choose “Edit”. You should see the following window:
Now locate the field “Environment variables” and find the icon on the right side of the field. Then click there – you’ll see the following window:
And now by clicking the + button start adding your environmental variables and be careful with their names. They should be the same as mentioned above: `OPENAI_API_KEY` and `ACTIVELOOP_TOKEN`. When ready just click OK on the first window and then “Apply” and “OK” on the second one.
That’s a very big advantage of PyCharm and I very much love it, because it handles the environment variables for us automatically without the requirement for additional calls to them, allowing us to think more about the creative part of the code.
Note: ActiveLoop is a technology company that focuses on developing data infrastructure and tools for machine learning and artificial intelligence. The company aims to streamline the process of managing, storing, and processing large-scale datasets, particularly for deep learning and other AI applications.
DeepLake is an ActiveLoop’s flagship product. It provides efficient data storage, management, and access capabilities, optimized for large-scale datasets often used in AI.
Install the required librariesWe’ll use the `SeleniumURLLoader` class from LangChain, which relies on the `unstructured` and `selenium` Python libraries. Install these using pip. It is recommended to install the latest version, although the code has been specifically tested with version 0.7.7.
To do that use the following command in your PyCharm terminal:
pip install unstructured seleniumNow we need to install langchain, deeplake and openai. To do that just use this command in your terminal (same window you used for Selenium) and wait a bit until everything is successfully installed:
pip install langchain==0.0.208 deeplake openai==0.27.8 psutil tiktokenTo make sure all libraries are properly installed, just add the following lines needed for our chatbot app and click on the Run button:
from langchain.embeddings.openai import OpenAIEmbeddings from langchain.vectorstores import DeepLake from langchain.text_splitter import CharacterTextSplitter from langchain import OpenAI from langchain.document_loaders import SeleniumURLLoader from langchain import PromptTemplateAnother way to install your libraries is through the settings of PyCharm. Open them and go to the section Project -> Project Interpreter. Then locate the + button, search for your package and hit the button “Install Package”. Once ready, close it, and on the next window click “Apply” and then “OK”.
Step 2: Splitting content into chunks and computing their embeddingsAs previously mentioned, our chatbot will “communicate” with content coming out of online articles, that’s why I picked Digitaltrends.com as my source of data and selected 8 articles to start. All of them are organized into a Python list and assigned to a variable called “articles”.
articles = ['https://www.digitaltrends.com/computing/claude-sonnet-vs-gpt-4o-comparison/', 'https://www.digitaltrends.com/computing/apple-intelligence-proves-that-macbooks-need-something-more/', 'https://www.digitaltrends.com/computing/how-to-use-openai-chatgpt-text-generation-chatbot/', 'https://www.digitaltrends.com/computing/character-ai-how-to-use/', 'https://www.digitaltrends.com/computing/how-to-upload-pdf-to-chatgpt/']We load the documents from the provided URLs and split them into chunks using the `CharacterTextSplitter` with a chunk size of 1000 and no overlap:
# Use the selenium to load the documents loader = SeleniumURLLoader(urls=articles) docs_not_splitted = loader.load() # Split the documents into smaller chunks text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) docs = text_splitter.split_documents(docs_not_splitted)If you run the code till now you should receive the following output, if everything works well:
[Document(page_content="techcrunch\n\ntechcrunch\n\nWe, TechCrunch, are part of the Yahoo family of brandsThe sites and apps that we own and operate, including Yahoo and AOL, and our digital advertising service, Yahoo Advertising.Yahoo family of brands.\n\n When you use our sites and apps, we use \n\nCookiesCookies (including similar technologies such as web storage) allow the operators of websites and apps to store and read information from your device. Learn more in our cookie policy.cookies to:\n\nprovide our sites and apps to you\n\nauthenticate users, apply security measures, and prevent spam and abuse, and\n\nmeasure your use of our sites and apps\n\n If you click '", metadata={'source': ……………]
Next, we generate the embeddings using OpenAIEmbeddings and save them in a DeepLake vector store hosted in the cloud. Ideally, in a production environment, we could upload an entire website or course lesson to a DeepLake dataset, enabling searches across thousands or even millions of documents.
By leveraging a serverless Deep Lake dataset in the cloud, applications from various locations can seamlessly access a centralized dataset without the necessity of setting up a vector store on a dedicated machine.
Why do we need embeddings and documents in chunks?When building chatbots with Langchain, embeddings and chunking documents are essential for several reasons that relate to the efficiency, accuracy, and performance of the chatbot.
Embeddings are vector representations of text (words, sentences, paragraphs, or documents) that capture semantic meaning. They encapsulate the context and meaning of words in a numerical form. This allows the chatbot to understand and generate responses that are contextually appropriate by capturing nuances, synonyms, and relationships between words.
Thanks to the embeddings, the chatbot can also quickly identify and retrieve the most relevant responses or information from a knowledge base, because they allow matching user queries with the most semantically relevant chunks of information, even if the wording differs.
Chunking, on the other side, involves dividing large documents into smaller, manageable pieces or chunks. Smaller chunks are faster to process and analyze compared to large, monolithic documents. This results in quicker response times from the chatbot.
Document chunking helps also with the relevancy of the output, because when a user asks a question, it is often only in a specific part of a document. Chunking allows the system to pinpoint and retrieve just the relevant sections and the chatbot can provide more precise and accurate answers.
Now let’s get back to our application and let’s update the following code by including your Activeloop organization ID. Keep in mind that, by default, your organization ID is the same as your username.
# TODO: use your organization id here. (by default, org id is your username) my_activeloop_org_id = "didogrigorov" my_activeloop_dataset_name = "jetbrains_article_dataset" dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}" db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings) # add documents to our Deep Lake dataset db.add_documents(docs)Another great feature of PyCharm I love is the option TODO notes to be added directly in Python comments. Once you type TODO with capital letters, all notes go to a section of PyCharm where you can see them all:
# TODO: use your organization id here. (by default, org id is your username)You can click on them and PyCharm directly shows you where they are in your code. I find it very convenient for developers and use it all the time:
If you execute the code till now you should see the following output, if everything works normal:
To find the most similar chunks to a given query, we can utilize the similarity_search method provided by the Deep Lake vector store:
# Check the top relevant documents to a specific query query = "how to check disk usage in linux?" docs = db.similarity_search(query) print(docs[0].page_content) Step 3: Let’s build the prompt for GPT-3We will design a prompt template that integrates role-prompting, pertinent Knowledge Base data, and the user’s inquiry. This template establishes the chatbot’s persona as an outstanding customer support agent. It accepts two input variables: chunks_formatted, containing the pre-formatted excerpts from articles, and query, representing the customer’s question. The goal is to produce a precise response solely based on the given chunks, avoiding any fabricated or incorrect information.
Step 4: Building the chatbot functionalityTo generate a response, we begin by retrieving the top-k (e.g., top-3) chunks that are most similar to the user’s query. These chunks are then formatted into a prompt, which is sent to the GPT-3 model with a temperature setting of 0.
# user question query = "How to check disk usage in linux?" # retrieve relevant chunks docs = db.similarity_search(query) retrieved_chunks = [doc.page_content for doc in docs] # format the prompt chunks_formatted = "\n\n".join(retrieved_chunks) prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query) # generate answer llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0) answer = llm(prompt_formatted) print(answer)If everything works fine, your output should be:
To upload a PDF to ChatGPT, first log into the website and click the paperclip icon next to the text input field. Then, select the PDF from your local hard drive, Google Drive, or Microsoft OneDrive. Once attached, type your query or question into the prompt field and click the upload button. Give the system time to analyze the PDF and provide you with a response.
Step 5: Build conversational history # Create conversational memory memory = ConversationBufferMemory(memory_key="chat_history", input_key="input") # Define a prompt template that includes memory template = """You are an exceptional customer support chatbot that gently answers questions. {chat_history} You know the following context information. {chunks_formatted} Answer the following question from a customer. Use only information from the previous context information. Do not invent stuff. Question: {input} Answer:""" prompt = PromptTemplate( input_variables=["chat_history", "chunks_formatted", "input"], template=template, ) # Initialize the OpenAI model llm = OpenAI(openai_api_key="YOUR API KEY", model="gpt-3.5-turbo-instruct", temperature=0) # Create the LLMChain with memory chain = LLMChain( llm=llm, prompt=prompt, memory=memory ) # User query query = "What was the 5th point about on the question how to remove spotify account?" # Retrieve relevant chunks docs = db.similarity_search(query) retrieved_chunks = [doc.page_content for doc in docs] # Format the chunks for the prompt chunks_formatted = "\n\n".join(retrieved_chunks) # Prepare the input for the chain input_data = { "input": query, "chunks_formatted": chunks_formatted, "chat_history": memory.buffer } # Simulate a conversation response = chain.predict(**input_data) print(response)Let’s walk through the code in a more conversational manner.
To start with, we set up a conversational memory using `ConversationBufferMemory`. This allows our chatbot to remember the ongoing chat history, using `input_key=”input”` to manage the incoming user inputs.
Next, we design a prompt template. This template is like a script for the chatbot, including sections for chat history, the chunks of information we’ve gathered, and the current user question (input). This structure helps the chatbot know exactly what context it has and what question it needs to answer.
Then, we move on to initializing our language model chain, or `LLMChain`. Think of this as assembling the components: we take our prompt template, the language model, and the memory we set up earlier, and combine them into a single workflow.
When it’s time to handle a user query, we prepare the input. This involves creating a dictionary that includes the user’s question (`input`) and the relevant information chunks (`chunks_formatted`). This setup ensures that the chatbot has all the details it needs to craft a well-informed response.
Finally, we generate a response. We call the `chain.predict` method, passing in our prepared input data. The method processes this input through the workflow we’ve built, and out comes the chatbot’s answer, which we then display.
This approach allows our chatbot to maintain a smooth, informed conversation, remembering past interactions and providing relevant answers based on the context.
Another favorite trick with PyCharm that helped me a lot to build this functionality was the opportunity to put my cursor over a method, to hit the key “CTRL” and click on it.
In conclusionGPT-3 excels at creating conversational chatbots capable of answering specific questions based on contextual information provided in the prompt. However, ensuring the model generates answers solely based on this context can be challenging, as it often tends to hallucinate (i.e., generate new, potentially false information). The impact of such false information varies depending on the use case.
In summary, we developed a context-aware question-answering system using LangChain, following the provided code and strategies. The process included splitting documents into chunks, computing their embeddings, implementing a retriever to find similar chunks, crafting a prompt for GPT-3, and using the GPT-3 model for text generation. This approach showcases the potential of leveraging GPT-3 to create powerful and contextually accurate chatbots while also emphasizing the importance of being vigilant about the risk of generating false information.
About the author Dido GrigorovDido is a seasoned Deep Learning Engineer and Python programmer with an impressive 17 years of experience in the field. He is currently pursuing advanced studies at the prestigious Stanford University, where he is enrolled in a cutting-edge AI program, led by renowned experts such as Andrew Ng, Christopher Manning, Fei-Fei Li and Chelsea Finn, providing Dido with unparalleled insights and mentorship.
Dido’s passion for Artificial Intelligence is evident in his dedication to both work and experimentation. Over the years, he has developed a deep expertise in designing, implementing, and optimizing machine learning models. His proficiency in Python has enabled him to tackle complex problems and contribute to innovative AI solutions across various domains.
The Drop Times: Why Is It 'Drupal CMS' and Not 'Drupal': An Explainer
Drupal Starshot blog: Out-of-the-box functionality survey results
We recently posted a survey seeking community feedback on what features and contrib modules to include in Drupal CMS out of the box, in order to deliver on the vision of getting from install to launch really fast. We were looking for features and modules that align with the Drupal Starshot strategy and consider the primary persona, which is ambitious marketers.
The survey got 60 submissions, with a wide variety of suggestions. Many of these were already on our radar, and closely align with our existing initiatives and work tracks. But it also raised a lot of new and interesting ideas for the leadership team and track leads to consider. We will also likely be posting new work tracks in the next few weeks based on the results, since there are some great suggestions that are not yet covered.
The following is a summary of the survey results, which we are not treating as a 'vote' for any one feature, but it's a great way to validate our plans and determine what other areas to focus on.
FeaturesThere were 108 different feature suggestions, with many that overlapped. Of those that were suggested in more than one submission, all of these are already covered by an initiative or work track:
- Better page building tools: more intuitive layout builder; drag & drop components; ability to easily add lists to pages; theming tools in the UI; live preview (20) [Experience builder]
- SEO: Meta tags (specifically including content schema and social media sharing); SEO analysis tools (14) [SEO track]
- Form builder (7) [Contact form track]
- Perform content management actions in bulk (3) [Content publishing workflows track]
- Image resizing and cropping tools (3) [Media management track]
- Responsive images (3) [Media management track]
- Login with email (3) [Base recipe track]
- Anti-spam measures (2) [Contact form track]
- Better for search (2) [Advanced search recipe track]
- Ability to add sitewide alerts (2) [Base recipe track]
The remaining feature suggestions were suggested once each, but point to specific areas we could focus on.
Content management & workflows- Workspaces
- Content workflows
- Content scheduling
- Content cloning
- Simple content access control
- Deleted content recovery
- WYSIWYG editor
- Content import & export tools
- Inline entity creation
- Jobs content recipe
- Event calendar
- Two-factor authentication
- Configurable password policy
- Security compliance tools
- Asymmetric translations
- Capability to display the source content next to the translated content in the node edit form
- SVG support
- Bulk media upload
- Easy linking directly to media files
- AI alt tag generation
- A/B testing for content
- QR code generation
- Easy to configure social media links
- Social sharing capability
- Accessibility checker
- AI enabled content writing
- Admin menu search
- Infinite scrolling
- SMTP email support
- Entity relationship modeling tool
- Better cookie handling
- Login with social network accounts
- Integrated deployments
- Email rerouting for non-production environments
- New core theme with configurable CSS variables
- Advanced aggregation modernization
- Better exposure of metrics / telemetry
- Automatic Updates
- Project Browser
- Simplified Views UI
- Ability to define "site settings" without affecting configuration
- Safe revision pruning
- Better situational awareness of extensions
- Easy configuration management system
- Easier removal of modules and cleaning up of applied recipes
- Entity hierarchy module in core
- Referential integrity: https://www.drupal.org/project/drupal/issues/2723323
- Poster images for video media: https://www.drupal.org/project/drupal/issues/2954834
- Inline moderation notes for easier collaboration
- Improved file upload experience/widget
- Submission against including Twig Tweak module
- Manual curation tools such as entityqueue
As with the feature suggestions, some modules were suggested more than once, and are mostly covered by existing streams.
Whether a module will be included will depend on many things, but mainly, it should be required for some functionality that we are planning to deliver. Track leads will propose functionality that will be supported by contrib modules, and then the modules will be assessed for inclusion. We plan to publish further information about module selection and ongoing governance and maintenance as the project progresses.
- Metatag (6) [SEO track]
- Webform (5) [Contact form track]
- Admin Toolbar (3) [Superseded by Navigation module]
- Coffee (3) [Base recipe track]
- Paragraphs (3) [Superseded by Experience builder]
- Simple XML sitemap (3) [SEO track]
- Scheduler / Scheduled Publish (3) [Base recipe track]
- Security Kit (2)
- Captcha (2) [Contact form track]
- Editor Advanced link (2)
- Focal Point (2) [Media management track]
- Linkit (2) [Base recipe track]
- Pathauto (2) [Base recipe track]
- Google Tag / GoogleTag Manager (2) [Analytics track]
- Smart Date (2) [Event track]
- Workspaces (2) [Content publishing workflows track]
Based on this, we might create new tracks for WYSIWYG and security, if we don't feel that we can sufficiently cover these as part of the base recipe.
The other modules suggested were:
- Address
- Back To Top
- Better Exposed Filter
- Block Class
- CKEditor 5 Font Plugin
- CrowdSec
- Disable language
- Disclosure Menu
- DropzoneJS
- ECA: Event - Condition - Action
- Editoria11y Accessibility Checker
- Entity
- Entity Extra Field
- Estimated Read Time
- EU Cookie Compliance (GDPR Compliance)
- Field Permissions
- Fullcalendar View
- Honeypot
- Image Effects
- Inline responsive images
- Keysave
- Layout Builder Asymmetric Translation
- Linkchecker
- Media Alias Display
- Media Directories
- Multiple Fields Remove Button
- Override Node Options
- Quick Node Block
- Responsive Table Filter
- RobotsTxt
- Role Delegation
- Rules
- Search API
- Security Review
- Select (or other)
- Select 2
- Stage File Proxy
- Svg Image
- Drupal Symfony Mailer
- System Tags
- Two-factor Authentication (TFA)
- Token
- Token Filter
- Tour
- Trash
- UI Patterns
- View Unpublished
- Views Bulk Edit
- Views Bulk Operations (VBO)
- Views data export
- Views Load More
- Real-time SEO for Drupal
Russ Allbery: Review: These Burning Stars
Review: These Burning Stars, by Bethany Jacobs
Series: Kindom Trilogy #1 Publisher: Orbit Copyright: October 2023 ISBN: 0-316-46342-6 Format: Kindle Pages: 430These Burning Stars is a science fiction thriller with cyberpunk vibes. It is Bethany Jacobs's first novel and the first of an expected trilogy, and won the 2024 Philip K. Dick Award for the best SF paperback original published in the US.
Generation starships brought humanity to the three star systems of the Treble, where they've built a new and thriving culture of billions. The Treble is ruled by the Kindom, a tripartite government structure built around the worship of six gods and the aristocratic power of the First Families. The Clerisy handle religion, the Secretaries run the bureaucracy, and the Cloaksaan enforce the decisions of the other branches.
The Nightfoots are one of the First Families. They control sevite, the propellant required to move between the systems of the Treble now that the moon Jeve and the sole source of natural jevite has been destroyed. Esek Nightfoot is a cleric, theoretically following the rules of the Clerisy, but she has made a career of training cloaksaan. She is is mercurial, powerful, ruthless, ambitious, politically well-connected, and greatly feared. She is also obsessed with a person named Six: an orphan she first encountered at a training school who was too young to have a gender or a name but who was already one of the best fighters in the school. In the sort of manipulative challenge typical of Esek, she dangled the offer of a place as a student and challenged the child to learn enough to do something impressive. The subsequent twenty years of elusive taunts and mysterious gifts from the impossible-to-locate Six have driven Esek wild.
Cleric Chono was beside Esek for much of that time. One of Six's classmates and another of Esek's rescues, Chono is the rare student who became a cleric rather than a cloaksaan. She is pious, cautious, and careful, the opposite of Esek's mercurial rage, but it's impossible to spend that much time around the woman and not be affected and manipulated by her. As this story opens, Chono is summoned by the First Cleric to join Esek on an assignment: recover a data coin that was stolen from a pirate raid on the Nightfoot compound. He refuses to tell them what data is on it, only saying that he believes it could be used to undermine public trust in the Nightfoot family.
Jun is a hacker with considerably fewer connections to power or government and no desire to meet any of these people. She and her partner Liis make a dubiously legal living from smaller, quieter jobs. Buying a collection of stolen data coins for an archivist consortium is riskier than she prefers, but she's been tracking down rumors of this coin for months. The deal is worth a lot of money, enough to make a huge difference for her family.
This is the second book I've read recently with strong cyberpunk vibes, although These Burning Stars mixes them with political thriller. This is a messy world with complicated political and religious systems, a lot of contentious history, and vast inequality. The story is told in two interleaved time sequences: the present-day fight over the data coin and the information that it contains, and a sequence of flashbacks telling the history of Esek's relationship with Six and Chono. Jun's story is the most cyberpunk and the one I found the most enjoyable to read, but Chono is a good viewpoint character for Esek's vicious energy and abusive charisma.
Six is not a viewpoint character. For most of the book, they're present mostly in shadows, glimpses, and consequences, but they're the strongest character of the book. Both Esek and Six are larger than life, creatures of legend stuffed into mundane politics but too full of strong emotions, both good and bad, to play by any of the rules. Esek has the power base and access to the levers of government, but Six's quiet competence and mercilessly targeted morality may make them the more dangerous of the pair.
I found the twisty political thriller part of this book engrossing and very difficult to put down, but it was also a bit too much drama for me in places. Jacobs has some surprises in store, one of which I did not expect at all, and they're set up beautifully and well-done within the story, but Esek and Six become an emotional star that the other characters orbit around and are in danger of getting pulled into. Chono is an accomplished and powerful character in her own right, but she's also an abuse victim, and while those parts are realistic, I didn't entirely enjoy reading them. There is quiet competence here alongside the drama, but I think I wanted the balance of emotion to tip a bit more towards the competence.
There is one thing that Jacobs does with the end of the book that greatly impressed me. Unfortunately I can't even hint at it for fear of spoilers, but the ending is unsettling in a way that I found surprising and thought-provoking. I think what I can say is that this book respects the intelligence and skill of secondary characters in a way that I think is rare in a story with such overwhelming protagonists. I'm still thinking about that, and it's going to pull me right into the sequel.
This is not going to be to everyone's taste. Esek is a viewpoint character and she can be very nasty. There's a lot of violence and abuse, including one rather graphic fight scene that I thought dragged on much longer than it needed to. But it's a satisfying, complex story with a true variety of characters and some real surprises. I'm glad I read it.
Followed by On Vicious Worlds, not yet published as I write this.
Content warnings: emotional and physical abuse, graphic violence, off-screen rape and sexual abuse of minors.
Rating: 7 out of 10
Dirk Eddelbuettel: RcppMagicEnum 0.0.1 on CRAN: New Package!
Happy to announce a new package: RcppMagicEnum. It arrived on CRAN yesterday following the resumption of normal service following the CRAN summer break. RcppMagicEnum brings the magicenum library by Daniil Goncharov to R.
Modern C++ is powerful, but still lacks reflection. This may change with C++26 but until then this library can help. A simple example, also shown on the README is as follows (and can be called from R via Rcpp::sourceCpp() if the RcppMagicEnum package is installed):
// [[Rcpp::depends(RcppMagicEnum)]] #include <RcppMagicEnum> // define a simple enum class, it uses optional typing as well as optional assigned values enum class Color : int { RED = -10, BLUE = 0, GREEN = 10 }; // [[Rcpp::export]] void example() { // instantiate an enum value in variable 'val' auto val = Color::RED; // show the current value on stdout Rcpp::Rcout << "Name of enum: " << magic_enum::enum_name(val) << std::endl; Rcpp::Rcout << "Integer value of enum: " << magic_enum::enum_integer(val) << std::endl; } /*** R example() */It produces the following output (where the ‘meta-comment’ at the end ensure the included and created-by-sourcing function example() is also called):
> Rcpp::sourceCpp("miniex.cpp") > example() Name of enum: RED Integer value of enum: -10 >The plan to experiment some more with this and then see if we could possible make factor variables map to such enums and vice versa. Help and discussion input is always welcome, and could be submitted either on the rcpp-devel list or as an issue at the repo.
The short NEWS entry follows.
Changes in version 0.0.1 (2024-07-31)- Initial version and CRAN upload
If you like this or other open-source work I do, you can sponsor me at GitHub.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
Armin Ronacher: Rye and uv: August is Harvest Season for Python Packaging
It has been a few months since I wrote about Rye here last. You might remember that in February I passed over stewardship of my Rye packaging too to Astral. The folks over there have been super busy in building a lot of amazing tooling for Python packaging in the last few months. If you have been using Rye in the last few months you will have noticed that the underlying resolver and installer uv got a lot better and faster.
As of the most recent release, uv also gained a lot of functionality that previously required Rye such as manipulating pyproject.toml files, workspace support, local package references and script installation. It now also can manage Python installations for you so it's getting much closer.
If you are using Rye today, consider this blog post as a reminder that you should probably starting having a closer look at uv and give feedback to the Astral folks.
I gave a talk just recently in Prague at EuroPython about my current view of the Python packaging, the lessons I learned when creating Rye and one of the things I mentioned there is that the goal of a packaging tool has to be that it will dominate the space. The tool that absolutely everybody uses has to be the best tool: it's the thing any new person to Python gets to see when they start their programming journey. After that talk a lot of people walked up to me and had a lot of questions about that in particular.
Python in the last two years has become an incredibly hot and popular platform for many new developers. That has in part been fueled by all the investments and interest that went into AI and ML. I really want everybody who gets to learn and experience Python not to remember it as an old language with bad tooling, but as an amazing language with a stellar developer experience. Unfortunately that's not the case today because there is so much choice, so many tools that are not quite compatible, and by the inconsistency everywhere. I have seen people walk down one tool, just to re-emerge moving their entire stack to conda and back because they hit some wall.
Domination is a goal because it means that most investment will go into one stack. I can only re-iterate my wish and desire that Rye (and with it a lot of other tools in the space) should cease to exist once the dominating tool has been established. For me uv is poised to be that tool. It's not quite there today yet for all cases, but it will be in no time, and now is the moment to step up as a community and start to start to rally around it. That doesn't mean that this tool will be the tool forever. Things come and go and maybe there is a future for some other tool.
But today I'm looking forward to the moment when there will be a final release of Rye that is no remaining functionality other than to just largely alias to uv, that retires Rye specific functionality and migrates you over to uv.
However I only have the power to retire one tool, and that won't be enough. Today we are using so many other package managing solutions for Python and we should be advertising fewer. I understand how much time and effort went into many of those, and everybody's contributions are absolutely appreciated. Software like Rye and uv were built on the advancements of the ecosystem underneath it. They leverage years and years of work that went into migrating the Python ecosystems from setup.py files to eggs and finally wheels. From not having a metadata standard to having one. From coupled to decoupled build systems. Much of what makes Rye so enjoyable were individuals that worked towards making redistributable and downloadable Python binaries a possibility. There was a lot of work that was put into building out an amazing ecosystem of Rust crates and Python libraries needed to make these tools work. All of that brought us to that point where we are today.
But it is my believe that we need to take the next step and be willing to say as a community that some tools are no longer recommended. Maybe not today, but that moment will come quicker than we think. I remember a time when many of us who maintained Python libraries pointed new developers to using ez_setup.py and easy_install in our onboarding guides. Years later we removed the mentions of ez_setup.py from our guides to replace them with pip. Some of us have pointed developers at pip-tools, at poetry or PDM. Many projects today even show 5 different installation guides because of that wild variety of tools available because they no longer feel like they can recommend one.
If you maintain an important Python project I would ask you to give uv a try and ask yourself if you would consider pointing people towards it. I think that this is our best shot in the community at finding ourselves in a much better position than we have ever been.
Have a look at the blog post that Charlie from Astral wrote about what uv can do today. It's a true accomplishment worth celebrating and enjoying.
Postscriptum: there is an elephant in the room which is that Astral is a VC funded company. What does that mean for the future of these tools? Here is my take on this: for the community having someone pour money into it can create some challenges. For the PSF and the core Python project this is something that should be considered. However having seen the code and what uv is doing, even in the worst possible future this is a very forkable and maintainable thing. I believe that even in case Astral shuts down or were to do something incredibly dodgy licensing wise, the community would be better off than before uv existed.
Trey Hunner: 10-Week Hands-On Python Course
Ever wished you could take an Intro to Python training with me, but you don’t work for a company with a generous training budget? I’m running a Python-learning program just for this situation.
Python High Five is a 10-week Python jumpstart program that starts this September.
Set aside the time to learn ⌚One of the biggest problems for folks starting to learn Python is setting aside the time. And even if you do manage to set aside the time, you’ll often hit a roadblock where you feel confused.
Python High Five is a way to keep a daily learning habit and to get help when find yourself stuck.
This program is based around daily practice. Monday through Friday you’ll pick 30 minutes from your schedule, at any time that works you. During those 30 minutes, you’ll watch a 5 minute video, work on the day’s exercise, and reflect on your progress.
The most effective learning is hands-on 🖐️Python High Five is all about learning through writing Python code. Each week we’ll dive deeper into Python, building upon what we’ve learned so far.
When you find yourself stuck you can get help through an asynchronous group chat and weekly office hour sessions. In addition to our weekly office hours together, I’ll check the chat each day, respond to questions, and provide guidance.
Proven learning techniques behind the scenes 📝The daily check-ins allow for daily accountability. The group chat also provides both a community of peers to rely on, and guidance from an experienced Python trainer (me).
We’ll also be using proven learning techniques behind the scenes:
- Retrieval practice: you don’t learn by putting information into your head, but by trying to take it out; for Python learning, that means writing code.
- Spaced repetition: cramming is less effective than learning spaced out over time, which is why we’ll spend 30 minutes each weekday instead of spending a few hours every week.
- Interleaving: each day’s exercise isn’t predictably themed because a bit of unpredictability can be really improve learning outcomes.
- Elaboration: your daily check-in isn’t just about reflection: it’s also a helpful learning tool!
Plus, we’ll be working through curriculum I’ve been developing and iterating on for many years. I have taught these topics in many different settings to folks from many different backgrounds.
Form a daily learning habit 🔁Any 10-week program will be just the start of a Python learning habit. You’ll need to keep up your Python after Python High Five ends, either by promptly applying your skills to a new project or diving deeper into Python with continued daily practice.
That’s why I’m offering an 80% discount for High Five attendees on one year of Python Morsels, which is my skill-building service designed to help deepen your Python skills every week. You can see more details on that here.
Ready to start your Python journey? ⛰️Are you ready to start your Python journey with a solid foundation?
Read more about Python High Five and decide whether this is for you.
Keep in mind that while the program begins on September 9, enrollment closes on August 31. So check the FAQs and if you have additional questions, be sure to email me soon!
Python Morsels: Checking for an empty list in Python
Python programmers typically check for empty lists by relying on truthiness.
Table of contents
- Checking the length of a list
- Evaluating the truthiness of a list
- Comparing for equality with an empty list
- Truthiness checks are non-emptiness checks on lists
One way to check whether a list is empty is to check the length of that list. If the length is 0, the list must be empty:
>>> numbers = [] >>> if len(numbers) == 0: ... print("The list is empty.") ... The list is empty.Or if we wanted to check for non-empty lists, we could make sure that the length is greater than 0:
>>> if len(numbers) > 0: ... print("The list is NOT empty.") ...But this is actually not the most typical way to check for an empty list in Python.
Evaluating the truthiness of a listMany Python users prefer to …
Read the full article: https://www.pythonmorsels.com/checking-for-an-empty-list-in-python/PyCoder’s Weekly: Issue #643 (Aug. 20, 2024)
#643 – AUGUST 20, 2024
View in Browser »
The Scrapy crawl stat logs are useful for tracking and monitoring the performance of a spider. If you want to keep them longer rather than just see the console printout, you can have them written to a database.
XIEGERTS.COM • Shared by Stephen
In this video course, you’ll learn how to use Python to communicate with REST APIs. You’ll learn about REST architecture and how to use the requests library to get data from a REST API. You’ll also explore different Python tools you can use to build REST APIs.
REAL PYTHON course
Tired of tediously send files and trying to use general-purpose collaboration tools? Posit Connect makes it easy to share, collaborate, and get feedback on your data science work including Jupyter notebooks, Plotly dashboards, Streamlit, Quarto, Shiny or other interactive analytics applications →
POSIT sponsor
argparse, the standard library module that Django uses for parsing command line options, supports sub-commands. These are pretty neat for providing an expansive API without hundreds of individual commands. This article shows you how to write your own.
ADAM JOHNSON
When crawling websites with Scrapy you’ll quickly come across all sorts of scenarios that require you to get creative or interact with the page that you’re trying to scrape. One of these scenarios is when you need to crawl an infinite scroll page. This type of website page loads more content as you scroll down the page like a social media feed.
STEPHEN SIEGERT • Shared by Stephen Siegert
SQL injection is the process of tricking a database into doing unintended things by modifying the input values to a query. Boolean-based blind injection is a subset that reveals structural information about the database. These can be hard to craft by hand, this article shows you how to automate the process to help do penetration testing.
TREBLEDJ
Experience near-human accuracy, low-latency performance, and advanced Speech AI capabilities with AssemblyAI’s Speech-to-Text API. Sign up today and receive free API credits—No credit card required. Get $50 Credit →
ASSEMBLY AI sponsor
If you use Python’s print() function to get information about the flow of your programs, then logging is the natural next step for you. This tutorial will guide you through creating your first logs and show you ways to curate them to grow with your projects.
REAL PYTHON
How costly it is to call functions and builtins in your python code? Does inlining help? How have the recent CPython releases improved performance in these areas? This article dives deep on function performance.
ABHINAV UPADHYAY
On *nix systems with GDB installed, you can attach to already running processes. This article shows you how to combine that with Python’s PDB debugger to then add breakpoints to Python in a running script.
DOMINIK CZARNOTA
This is a deeper dive into types and Pydantic around how to build “correct by construction” design patterns. Building your objects so that validation becomes a single call.
WILLIAM WOODRUFF
PyTorch vs Tensorflow: Which one should you use? Learn about these two popular deep learning libraries and how to choose the best one for your project.
REAL PYTHON
Have you ever wanted to create a plot or graph in your terminal? Learn how with the textual-plotext package.
MIKE DRISCOLL
A list of things to check when something works on your computer but not on someone else’s.
MATHEUS RICHARD
Juan talks about his love for the uv tool and how it has simplified Python packaging.
JUAN LUIS CANO RODRIGUEZ
GITHUB.COM/BNKC • Shared by lev ostatnigrosh
Events Weekly Real Python Office Hours Q&A (Virtual) August 21, 2024
REALPYTHON.COM
August 21 to August 23, 2024
PYCON.ORG.SO
August 23 to August 26, 2024
KIWIPYCON.NZ
August 24, 2024
MEETUP.COM
August 26 to August 31, 2024
EUROSCIPY.ORG
August 29 to September 1, 2024
PYCON.ORG
Happy Pythoning!
This was PyCoder’s Weekly Issue #643.
View in Browser »
[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]