Feeds

qtatech.com blog: Automating Drupal Site Deployments with CI/CD

Planet Drupal - Thu, 2024-08-22 04:13

Automating Drupal Site Deployments with CI/CD kanapatrick Thu, 08/22/2024 - 11:16

With the constant evolution of Drupal, particularly with recent versions like Drupal 10 and Drupal 11, automating deployments has become essential to leverage new features and maintain an agile and reliable development cycle. This article will guide you through the coding approaches and techniques to automate the deployment of Drupal sites using CI/CD.

Categories: FLOSS Project Planets

Implementing an Audio Mixer, Part 1

Planet KDE - Thu, 2024-08-22 03:00

Motivation

When using Qt Multimedia to play audio files, it’s common to use QMediaPlayer, as it supports a larger variety of formats than QSound and QSoundEffect. Consider a Qt application with several audio sources; for example, different notification sounds that may play simultaneously. We want to avoid cutting notification sounds off when a new one is triggered, and we don’t want to construct a queue for notification sounds, as sounds will play at the incorrect time. We instead want these sounds to overlap and play simultaneously.

Ideally, an application with audio has one output stream to the system mixer. This way in the mixer control, different applications can be set to different volume levels. However, a QMediaPlayer instance can only play one audio source at a time, so each notification would have to construct a new QMediaPlayer. Each player in turn opens its own stream to the system.

The result is a huge number of streams to the system mixer being opened and closed all the time, as well as QMediaPlayers constantly being constructed and destructed.

To resolve this, the application needs a mixer of its own. It will open a single stream to the system and combine all the audio into the one stream.

Before we can implement this, we first need to understand how PCM audio works.

PCM

As defined by Wikipedia:

Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps.

Here you can see how points are sampled in uniform intervals and quantized to the closest number that can be represented.

Image Source: Wikipedia

Description from Wikipedia: Sampling and quantization of a signal (red) for 4-bit LPCM over a time domain at specific frequency.

Think of a PCM stream as a humongous array of bytes. More specifically, it’s an array of samples, each of which is an integer or float value a certain number of bytes in size. The samples are discrete amplitude values from a waveform, stored contiguously. Think of each element as the y-value of a point along the wave, with the index representing an offset from x=0 at a uniform time interval.

Here is a graph of discretely sampled points along a sinusoidal waveform similar to the one above:

Image Source: Wikimedia Commons

Description from Wikimedia Commons: Image of a discrete time sinusoid

Let’s say we have an audio waveform that is a simple sine wave, like the above examples. Each point taken at discrete intervals along the curve here is a sample, and together they approximate a continuous waveform. The distance between the samples along the x-axis is a time delta; this is the sample period. The sample rate is the inverse of this, the number of samples that are played in one second. The typical standard sample rate for audio on CDs is 44100 Hz – we can’t really hear that this data is discrete (plus, the resultant sound wave from air movement is in fact a continuous waveform).
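To make the relationship between sample rate, sample period, and sample index concrete, here is a minimal sketch that samples a sine wave into normalized amplitudes. The function name sampleSine and its parameters are made up for this illustration and are not part of the original article or of Qt.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: sample a sine wave of frequencyHz at sampleRateHz.
// Returns `count` normalized amplitudes in the range [-1, 1].
std::vector<double> sampleSine(double frequencyHz, double sampleRateHz, std::size_t count)
{
    assert(sampleRateHz > 0.0);
    const double pi = std::acos(-1.0);
    const double samplePeriod = 1.0 / sampleRateHz; // seconds between two samples

    std::vector<double> samples(count);
    for (std::size_t i = 0; i < count; ++i) {
        const double t = static_cast<double>(i) * samplePeriod; // time of sample i
        samples[i] = std::sin(2.0 * pi * frequencyHz * t);
    }
    return samples;
}
```

With a 441 Hz tone at 44100 Hz, one full cycle of the wave spans exactly 100 samples, so the index alone tells you where in the cycle a sample sits.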

We also have to consider the y-axis here, which represents the amplitude of the waveform at each sampled point. In the image above, amplitude $A$ is normalized such that $A \in [-1, 1]$. In digital audio, there are a few different ways to represent amplitude. We can’t represent all real numbers on a computer, so the representation of the range of values varies in precision.

For example, let’s say we have two different representations of the wave above: 8-bit signed integer and 16-bit signed integer. The normalized value 1 from the image above maps to $(2^{8} \div 2) - 1 = 127$ with 8-bit representation and $(2^{16} \div 2) - 1 = 32767$ with 16-bit. Therefore, with 16-bit representation, we have 256 times as many possible values ($2^{16} \div 2^{8} = 256$) to represent the same range; it is more precise, but the required size to store each 16-bit sample is double that of 8-bit samples.
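The mapping from a normalized amplitude to an integer sample can be sketched as follows. This is only an illustration: the name quantize is hypothetical, and scaling by the type's maximum value is one common convention, assumed here rather than taken from the article.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: map a normalized amplitude in [-1, 1] to a signed
// integer sample of type T. Out-of-range input is clamped first.
// Assumption: we scale by max(), so 1.0 maps to max() and -1.0 to -max().
template <typename T>
T quantize(double amplitude)
{
    static_assert(std::numeric_limits<T>::is_integer && std::numeric_limits<T>::is_signed,
                  "T must be a signed integer type");
    assert(!std::isnan(amplitude));

    const double clamped = std::max(-1.0, std::min(1.0, amplitude));
    const double scaled = clamped * static_cast<double>(std::numeric_limits<T>::max());
    return static_cast<T>(std::lround(scaled)); // round to the nearest step
}
```

Calling quantize<std::int8_t>(1.0) and quantize<std::int16_t>(1.0) gives the two maxima discussed above, 127 and 32767.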

We call the chosen representation, and thus the size of each sample, the bitdepth. Some common bitdepths are 16-bit int, 24-bit int, and 32-bit float, but there are many others in existence.

Let’s consider a huge stream of 16-bit samples and a sample rate of 44100 Hz. We write samples to the audio device periodically with a fixed-size buffer; let’s say it is 4096 bytes. The device will play each sample in the buffer at the aforementioned rate. Since each sample is a contiguous 2-byte short, we can fit 2048 samples into the buffer at once. We need to write 44100 samples in one second, so the whole buffer will be written around 21.5 times per second.
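The buffer arithmetic above is easy to check in code. The struct and function names below are invented for this sketch; only the numbers come from the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch.
struct BufferTiming {
    std::int64_t samplesPerBuffer; // how many samples fit into one buffer
    double buffersPerSecond;       // how often the buffer must be refilled
};

// Hypothetical helper: derive both values from the buffer size, the sample
// size in bytes, and the sample rate in Hz.
BufferTiming bufferTiming(std::int64_t bufferBytes, std::int64_t bytesPerSample,
                          std::int64_t sampleRateHz)
{
    assert(bytesPerSample > 0 && bufferBytes >= bytesPerSample);
    const std::int64_t samplesPerBuffer = bufferBytes / bytesPerSample;
    const double buffersPerSecond =
        static_cast<double>(sampleRateHz) / static_cast<double>(samplesPerBuffer);
    return {samplesPerBuffer, buffersPerSecond};
}
```

For a 4096-byte buffer, 2-byte samples, and 44100 Hz, this yields 2048 samples per buffer and roughly 21.5 buffer writes per second, matching the figures in the paragraph.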

What if we have two different waveforms though, and what if one starts halfway through the other one? How do we mix them so that this buffer contains the data from both sources?

Waveform Superimposition

In the study of waves, you can superimpose two waves by adding them together. Let’s say we have two different discrete wave approximations, each represented by 20 signed 8-bit integer values. To superimpose them, for each index, add the values at that index. Some of these sums will exceed the limits of 8-bit representation, so we clamp them at the end to avoid signed integer overflow. This is known as hard clipping and is the phenomenon responsible for digital overdrive distortion.

x     Wave 1 (y_1)    Wave 2 (y_2)    Sum (y_1+y_2)    Clamped Sum
0     +60             −100            −40              −40
1     −120            +80             −40              −40
2     +40             +70             +110             +110
3     −110            −100            −210             −128
4     +50             −110            −60              −60
5     −100            +60             −40              −40
6     +70             +50             +120             +120
7     −120            −120            −240             −128
8     +80             −100            −20              −20
9     −80             +40             −40              −40
10    +90             +80             +170             +127
11    −100            −90             −190             −128
12    +60             −120            −60              −60
13    −120            +70             −50              −50
14    +80             −120            −40              −40
15    −110            +80             −30              −30
16    +90             −100            −10              −10
17    −110            +90             −20              −20
18    +100            −110            −10              −10
19    −120            −120            −240             −128
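The element-wise addition and clamping described above can be sketched for 8-bit samples like this. The function name superimpose is hypothetical, and standard fixed-width types are used so the snippet compiles without Qt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: superimpose two equally long 8-bit waves.
// Each pair is summed in a wider type (int) so the addition itself cannot
// overflow, then clamped to the 8-bit range (hard clipping).
std::vector<std::int8_t> superimpose(const std::vector<std::int8_t> &a,
                                     const std::vector<std::int8_t> &b)
{
    assert(a.size() == b.size());
    const int lo = std::numeric_limits<std::int8_t>::min(); // -128
    const int hi = std::numeric_limits<std::int8_t>::max(); //  127

    std::vector<std::int8_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int sum = static_cast<int>(a[i]) + static_cast<int>(b[i]);
        out[i] = static_cast<std::int8_t>(std::max(lo, std::min(hi, sum)));
    }
    return out;
}
```

Feeding it the Wave 1 and Wave 2 values for x = 3 (−110 and −100) produces −128, and the values for x = 10 (+90 and +80) produce +127: the two clipping cases.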

Now let’s implement this in C++. We’ll start small, and just combine two samples.

Note: we will use qint types here, but qint16 will be the same as int16_t and short on most systems, and similarly qint32 will correspond to int32_t and int.

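    #include <QtGlobal>
    #include <limits>

    // Sum two samples in 32-bit arithmetic, then clamp to the 16-bit range.
    qint16 combineSamples(qint32 samp1, qint32 samp2)
    {
        const auto sum = samp1 + samp2;
        if (std::numeric_limits<qint16>::max() < sum)
            return std::numeric_limits<qint16>::max();
        if (std::numeric_limits<qint16>::min() > sum)
            return std::numeric_limits<qint16>::min();
        return static_cast<qint16>(sum); // in range, so the narrowing is safe
    }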

This is quite a simple implementation. We use a function combineSamples and pass in two 16-bit values, but they will be converted to 32-bit as arguments and summed. This sum is clamped to the limits of 16-bit integer representation using std::numeric_limits in the <limits> header of the standard library. We then return the sum, at which point it is re-converted to a 16-bit value.

Combining Samples for an Arbitrary Number of Audio Streams

Now consider an arbitrary number of audio streams n. For each sample position, we must sum the samples of all n streams.

Let’s assume we have some sort of audio stream type (we’ll implement it later), and a list called mStreams containing pointers to instances of this stream type. We need to implement a function that loops through mStreams and makes calls to our combineSamples function, accumulating a sum into a new buffer.

Assume each stream in mStreams has a member function read(char *, qint64). We can copy one sample into a char * by passing it to read, along with a qint64 representing the size of a sample (bitdepth). Remember that our bitdepth is 16-bit integer, so this size is just sizeof(qint16).

Using read on all the streams in mStreams and calling combineSamples to accumulate a sum might look something like this:

    qint16 accumulatedSum = 0;
    for (auto *stream : mStreams) {
        // call stream->read(char *, qint64)
        // to read a sample from the stream into streamSample
        qint16 streamSample = 0;
        stream->read(reinterpret_cast<char *>(&streamSample), sizeof(qint16));

        // accumulate
        accumulatedSum = combineSamples(streamSample, accumulatedSum);
    }

The first pass will add samples from the first stream to zero, effectively copying it to accumulatedSum. When we move to another stream, the samples from the second stream will be added to those copied values from the first stream. This continues, so the call to combineSamples for a third stream would combine the third stream’s sample with the sum of the first two. We continue to add directly into the buffer until we have combined all the streams.

Combining All Samples for a Buffer

Now let’s use this concept to add all the samples for a buffer. We’ll make a function that takes a buffer char *data and its size qint64 maxSize. We’ll write our accumulated samples into this buffer, reading all samples from the streams and adding them using the method above.

The function signature looks like this:

void readData(char *data, qint64 maxSize);

Let’s keep the code tidy by using a constexpr variable for the bitdepth:

constexpr qint16 bitDepth = sizeof(qint16);

sizeof(qint16) is always evaluated at compile time, so this costs nothing at runtime; naming the value simply saves us from repeating the expression.

With the size of each sample and the size of the buffer, we can get the total number of samples to write:

const qint64 numSamples = maxSize / bitDepth; // qint64, so large buffers don't overflow the count

For each stream in mStreams we need to read each sample up to numSamples. As the sample index increments, a pointer to the buffer data needs to also be incremented, so we can write our results at the correct location in the buffer.

That looks like this:

    #include <cstring> // memset

    void readData(char *data, qint64 maxSize)
    {
        // start with 0 in the buffer
        memset(data, 0, maxSize);

        constexpr qint16 bitDepth = sizeof(qint16);
        // qint64 rather than qint16, so large buffers don't overflow the count
        const qint64 numSamples = maxSize / bitDepth;

        for (auto *stream : mStreams) {
            // this pointer will be incremented across the buffer
            auto *cursor = reinterpret_cast<qint16 *>(data);
            qint16 sample;

            for (qint64 i = 0; i < numSamples; ++i, ++cursor)
                if (stream->read(reinterpret_cast<char *>(&sample), bitDepth))
                    *cursor = combineSamples(sample, *cursor);
        }
    }

The idea here is that we can start playing new audio sources by adding new streams to mStreams. If we add a second stream halfway through a first stream playing, the next buffer for the first stream will be combined with the first buffer of this new stream. When we’re done playing a stream, we just drop it from the list.
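To see the whole idea working without Qt, here is a self-contained sketch under a few stated assumptions. MemoryStream, its atEnd() member, and mixInto are hypothetical stand-ins invented for this example (the article's real stream type comes in Part 2), and std::int16_t and friends replace the qint types. It is not the article's implementation, just one way the pieces could fit together.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for the article's stream type.
class MemoryStream
{
public:
    explicit MemoryStream(std::vector<std::int16_t> samples) : mSamples(std::move(samples)) {}

    // Copies whole samples (up to maxSize bytes) into data; returns bytes copied.
    std::int64_t read(char *data, std::int64_t maxSize)
    {
        const std::size_t wanted = static_cast<std::size_t>(maxSize) / sizeof(std::int16_t);
        const std::size_t count = std::min(wanted, mSamples.size() - mPos);
        if (count == 0)
            return 0;
        std::memcpy(data, mSamples.data() + mPos, count * sizeof(std::int16_t));
        mPos += count;
        return static_cast<std::int64_t>(count * sizeof(std::int16_t));
    }

    bool atEnd() const { return mPos == mSamples.size(); }

private:
    std::vector<std::int16_t> mSamples;
    std::size_t mPos = 0;
};

// Same idea as the article's combineSamples, using standard types.
std::int16_t combineSamples(std::int32_t a, std::int32_t b)
{
    const std::int32_t lo = std::numeric_limits<std::int16_t>::min();
    const std::int32_t hi = std::numeric_limits<std::int16_t>::max();
    return static_cast<std::int16_t>(std::max(lo, std::min(hi, a + b)));
}

// Mix one buffer's worth of samples from every stream into data, then drop
// the streams that have nothing left to play.
void mixInto(std::vector<MemoryStream *> &streams, char *data, std::int64_t maxSize)
{
    assert(data != nullptr && maxSize >= 0);
    std::memset(data, 0, static_cast<std::size_t>(maxSize)); // start from silence

    constexpr std::int64_t bytesPerSample = sizeof(std::int16_t);
    const std::int64_t numSamples = maxSize / bytesPerSample;

    for (MemoryStream *stream : streams) {
        char *cursor = data;
        for (std::int64_t i = 0; i < numSamples; ++i, cursor += bytesPerSample) {
            std::int16_t sample = 0;
            if (stream->read(reinterpret_cast<char *>(&sample), bytesPerSample) <= 0)
                break; // this stream ran out of samples
            std::int16_t current = 0;
            std::memcpy(&current, cursor, sizeof current);
            const std::int16_t mixed = combineSamples(sample, current);
            std::memcpy(cursor, &mixed, sizeof mixed);
        }
    }

    streams.erase(std::remove_if(streams.begin(), streams.end(),
                                 [](const MemoryStream *s) { return s->atEnd(); }),
                  streams.end());
}
```

Starting a new sound is then just a push_back onto the stream list between two calls to mixInto: the next buffer contains the continuation of the first stream summed with the beginning of the new one. Copying samples in and out of the buffer with memcpy is a design choice of this sketch that sidesteps alignment concerns with the char buffer.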

Next Steps

In Part 2, we’ll use Qt Multimedia to fully implement our mixer, connect to our audio device, and test it on some audio files.

The post Implementing an Audio Mixer, Part 1 appeared first on KDAB.

Promet Source: Drupal vs SharePoint for State and Local Government

Planet Drupal - Wed, 2024-08-21 21:09

Takeaway: Drupal outperforms SharePoint for government websites by offering greater flexibility, cost-effectiveness, and scalability. Its open-source nature enables rapid innovation and customization, which is crucial for meeting your citizens' needs and government requirements without vendor lock-in or escalating costs.

KDE ⚙️ Gear 24.08

Planet KDE - Wed, 2024-08-21 20:00

Manage

Many of the new features in Dolphin are designed to make it easier to access and manage files and folders that require administrative privileges. Visual cues, wizards to help install needed software, and menu options to elevate your privileges make it easier than ever to use Dolphin as a superuser.

New usability features include:

A new "Move to New Folder…" option that pops up when right-clicking a file, allowing you to create a folder and copy the file into it all in one go.
Double-clicking the view background now triggers the "Select All" action by default.

Filelight is a complementary application to Dolphin, and can be installed directly from Dolphin by clicking the down arrow in the lower right corner of the main window.

Filelight helps visualize how much space your files and folders are taking up. Version 24.08 comes with a friendlier home page, and the Windows version (available from the Microsoft Store) has been redesigned to improve its overall appearance.

Konsole 24.08 also comes with a brand new usability enhancement: if you need to bookmark something important in a long output, double-click the scroll bar to set a position marker. You can then quickly scroll back and locate it later.

Create

Kdenlive is KDE's professional video editor, and this new release is all about the curves.

You can now use the brand new keyframe curve editor to customize effects, and choose between new easing methods (Cubic in/out and Exponential in/out) for fades.

To make things easier, we've redesigned the effects stack widget and improved the Transform effect, which now lets you select clips directly from the monitor. It also comes with a new grid, and improved design and behavior for the handle.

Travel

Coming to Akademy 2024? Don't forget to install the updates for Itinerary and Kongress and make your journey easy.

Itinerary is KDE's travel assistant. It lets you plan and manage your trips, providing an overview of where you need to be and when. It keeps boarding passes, tickets, and health certificates all in one place, and the latest version adds more details, including seat information displayed directly in the timeline.

Once you arrive at your destination, it's time to fire up Kongress so you don't miss any of the sessions or activities. Kongress now makes things easier by providing indoor maps of the venue, so you not only know when and what's going on, but also where.

Both Itinerary and Kongress work on desktop and laptop computers and most mobile devices.

Communicate

NeoChat is KDE's client for the Matrix chat system — and KDE's official way of chatting. Version 24.08 increases your safety by allowing you to preemptively block invites from unknown users not in any rooms with you.

Tokodon not only helps you read and post on Mastodon, but also manage your own server. Speaking of which, the version being released today can notify you of sign-ups on your server for better user management.

For posting, you can easily attach images from the internet, quote other posts and pop out the text editor to comfortably compose your toot.

When reading, Tokodon 24.08 supports scrolling up whole screenfuls of posts using the PageUp and PageDown keys.

Develop

Whether you want to help KDE implement features and fix bugs, or develop the next killer app, KDE's advanced text editor Kate has you covered.

Kate 24.08 improves its document formatting plugin with better support for bash, d, fish, Nix config, opsi-script, QML and YAML files. In related news, the Language Server Protocol (LSP) feature adds support for the Gleam, PureScript, and Typst languages.

If you're working on a CMake-based project, the Project and Build plugin now allows you to open the build directory and get both files and targets.

Surf

Falkon is KDE's full-featured web browser. The new release implements many bug fixes and optimizations that make surfing the web with Falkon smoother, easier and safer.

A new feature in 24.08 allows you to customize things that affect privacy and functionality on a site-by-site basis. Say you don't mind JavaScript on one site because the authors are trustworthy and it actually provides useful functionality, but you want to block it elsewhere for security reasons. You can now configure this in Falkon's Settings.

And all this too…

Okular — KDE's eco-certified document reader — improves compatibility for fillable forms in PDF documents, gets a makeover for Windows, and adds a more usable zoom feature.
PlasmaTube — a player for watching online videos from popular sites on your desktop — adds an option to block sponsored sections in videos.
Elisa — our elegant music player — gets a "Play this next" feature, and allows resizing of the sidebar and playlist panes.

Full changelog here

Where to get KDE Apps

Although we fully support distributions that ship our software, KDE Gear 24.08 apps will also be available on these Linux app stores shortly:

Flathub
Snapcraft

If you'd like to help us get more KDE applications into the app stores, support more app stores and get the apps better integrated into our development process, come say hi in our All About the Apps chat room.

Kanopi Studios: All About Drupal 11

Planet Drupal - Wed, 2024-08-21 19:48

The next major Drupal version was just released — laying the foundation for its future. Drupal 11 was recently released on Drupal’s timeline. Unlike previous major versions, where releases needed to accommodate underlying technologies’ end of life like Symfony, Drupal 11 was released because it was the right time to solidify new features and free us […]

The post All About Drupal 11 appeared first on Kanopi Studios.

Darren Oh: How Drupal Forge began

Planet Drupal - Wed, 2024-08-21 18:38

How Drupal Forge began Darren Oh Wed, 08/21/2024 - 18:38

parallel @ Savannah: GNU Parallel 20240822 ('Southport') released

GNU Planet! - Wed, 2024-08-21 16:09

GNU Parallel 20240822 ('Southport') has been released. It is available for download at: lbry://@GnuParallel:4

Quote of the month:

honestly the coolest software i've ever seen gotta be gnu parallel or
ffmpeg, nothing like them
-- @scootykins scoot

New in this release:

--match Match input source with regexp to set replacement fields.
{:%fmt} Use printf formatting of replacement strings.
Bug fixes and man page updates.

News about GNU Parallel:

Powerful GNU parallel, more than a loop https://www.linkedin.com/pulse/powerful-gnu-parallel-more-than-loop-zhenguo-zhang-18dxc
How To Increase File Transfer Speed Using Parallel Rsync? https://contentbase.com/blog/increase-file-transfer-speed-parallel-rsync/
Converting WebP Images to PNG Using parallel and dwebp https://bytefreaks.net/2024/07/27
Turbocharging the Box CLI with GNU Parallel https://medium.com/box-developer-blog/turbocharging-the-box-cli-with-gnu-parallel-ee44c48811c0

GNU Parallel - For people who live life in the parallel lane.

If you like GNU Parallel record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.

About GNU Parallel

GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.

If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.

GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.

For example you can run this to convert all jpeg files into png and gif files and have a progress bar:

parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif

Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:

find . -name '*.jpg' |
    parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200

You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/

You can install GNU Parallel in just 10 seconds with:

    $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
       fetch -o - http://pi.dk/3 ) > install.sh
    $ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
    12345678 883c667e 01eed62f 975ad28b 6d50e22a
    $ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
    cc21b4c9 43fd03e9 3ae1ae49 e28573c0
    $ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
    79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
    fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
    $ bash install.sh

Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

When using programs that use GNU Parallel to process data for publication please cite:

O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.

If you like GNU Parallel:

Give a demo at your local user group/team/colleagues
Post the intro videos on Reddit/Diaspora*/forums/blogs/Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
Request or write a review for your favourite blog or magazine
Request or build a package for your favourite distribution (if it is not already there)
Invite me for your next conference

If you use programs that use GNU Parallel for research:

Please cite GNU Parallel in your publications (use --citation)

If GNU Parallel saves you money:

(Have your company) donate to FSF https://my.fsf.org/donate/

About GNU SQL

GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.

The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.

When using GNU SQL for a publication please cite:

O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.

About GNU Niceload

GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.

DrupalEasy: DrupalEasy Podcast S17E2 - Janez Urevc - Gander

Planet Drupal - Wed, 2024-08-21 15:16

We talk with Janez Urevc from Tag1 Consulting about Gander, an open-source automated performance testing framework.

URLs mentioned

Gander (includes many helpful links!)
40-minute video presentation introducing Gander (including a demo)
Drupal.org docs page for performance testing with Gander
https://gander.tag1.io - dashboard that Tag1 hosts for the Drupal community
Two-hour Gander workshop at DrupalCon Barcelona on September 24, 2024

DrupalEasy News

Professional module development - 15 weeks, 90 hours, live, online course.
Drupal Career Online - 12 weeks, 77 hours, live online, beginner-focused course.

Audio transcript

We're using the machine-driven Amazon Transcribe service to provide an audio transcript of this episode.

Subscribe to our podcast on iTunes, iHeart, Amazon, YouTube, or Spotify.

If you'd like to leave us a voicemail, call 321-396-2340. Please keep in mind that we might play your voicemail during one of our future podcasts. Feel free to call in with suggestions, rants, questions, or corrections. If you'd rather just send us an email, please use our contact page.

Credits

Podcast edited by Amelia Anello.

Jonathan Dowland: Fediverse and feeds

Planet Debian - Wed, 2024-08-21 12:06

It's clear that Twitter has been circling the drain for years, but things have been especially bad in recent times. I haven't quit (I have some sympathy with the viewpoint "don't cede territory to fascists") but I try to read it much less, and I certainly post much less.

Especially at the moment, I really appreciate distractions.

Last time I wrote about Mastodon (by which I meant the Fediverse[1]), I was looking for a new instance to try. I settled on Debian's social instance[2]. I'm now trying to put any energy I might spend engaging on Twitter into engaging in the Fediverse instead. (You can follow me via the handle @jon@dow.land, I think, which should repoint to my actual handle, @jmtd@pleroma.debian.social.)

There are other potential successors to Twitter: two big ones are Bluesky and Facebook-owned Threads. They are effectively cookie-cutter copies of the Twitter model, and so, we will repeat the same mistakes there. Sadly I see the majority of communities and sub-cultures I follow are migrating to one or the other of these.

The Fediverse (or the Mastodon-ish bits of it) should avoid the fate of Twitter. JWZ puts it better and more succinctly than I can.

The Fedi experience is, sadly, pretty clunky. So I want to try and write a bit from time to time with tips and tricks that might improve people's experiences.

First up, something I discovered only today about Mastodon instances. As JWZ noted, "If you are worried about picking the 'right' Mastodon instance, don't. Just spin the wheel." You can spend too much time trying to guess a good answer to this. Better to just get started.

At the same time, individual instances are supposed to cater to specific niches. So it could be useful to sample the public posts from an entire instance. For example, to find people to follow, or decide to hop over to that instance yourself. You can't (I think) follow an entire instance from within yours, but, they usually have a public page which shows you the latest traffic.

For example, the infosec-themed instance infosec.exchange has one here: https://infosec.exchange/public/local

These pages don't provide RSS or Atom feeds[3], sadly. I hope that's on the software's roadmap, and hasn't been spurned for ideological reasons. For now at least, OpenRSS provides RSS/Atom feeds for many Mastodon instances. For example, an RSS/Atom feed of the above: https://openrss.org/infosec.exchange/public/local

You can add these feeds to your feed reader and over time get a flavour for the kind of discourse that takes place on given instances.

I think OpenRSS has to manually add Mastodon instances to its service. I tried three instances and only one (infosec.exchange) worked. I'm not sure, but I think trying an instance that doesn't work automatically puts it on OpenRSS's backlog.

[1] The Fediverse-versus-Mastodon nomenclature problem is just the tip of the iceberg, in terms of adoption problems. Mastodon provides a Twitter-like service that participates in the Fediverse. But it isn't correct to call the Twitter-like service "Mastodon" because other software also participates in/provides that service. And it's not correct to call it "Fediverse" because that describes a bigger thing, with e.g. YouTube clones also taking part. I'm not sure what the right term should be for "the Twitter-like thing". Also, everything I wrote here is probably subtly wrong.
[2] Debian's instance actually runs Pleroma, an alternative to Mastodon. Why should it matter? I think it's healthy for there to be more than one implementation in an open ecosystem. However the experience can be janky, as the features don't perfectly align, some Mastodon features/APIs are not documented/standardised/etc.
[3] I have to remind myself that the concept of RSS/Atom feeds and feed readers might need explaining to a modern audience too. Perhaps in another blog post.

Twin Cities Drupal Camp: Introducing Lightning Talks

Planet Drupal - Wed, 2024-08-21 11:57

Introducing Lightning Talks Published Date Tuesday, August 27th, 2024 - 11:00 am minneapolisdan Wed, 08/21/2024 - 11:57

We've added a fun new event to our conference this year – Lightning Talks!

Some of you may be familiar with the concept, but we'll explain it here, as well as how we're planning to do it. (Note that this is an in-person event for registered attendees only.)

What are Lightning Talks

A lightning talk is a very short presentation lasting only a few minutes. Each speaker gets a maximum of 5 minutes to present on a topic of their choice. Speakers will have access to the projector for slides.

Because these are very short and fast presentations (thus the "lightning" part), they're meant to be brief, snappy, and fun. We'll rotate quickly between speakers to keep things moving and entertaining.

These are not the same as the 45 minute sessions held during the rest of Camp. Speakers will need to focus on a single message or just a few quick key points.

Speakers can talk about serious and technical topics, or they can do something lighthearted and silly too. This is always a fun event!

When Will They Be Held?

We're going to hold the lightning talks between 3-4pm on the first day of Camp (Thu, Sep 12), in the West Wing (the big room). Right before happy hour!

How Can I Participate?

We've made a form to gather signups ahead of time. We'll only have time for 10-12 presenters, so we may not be able to include every submission. Sign up today!

Do I Have to Participate?

Talk of group presentations may be triggering for some people, but don't worry! No one is required to participate. Having you in the audience is all we ask.

Additional Resources

What are Lightning Talks by the University of North Carolina

Posted In Drupal Planet

Python Anywhere: Belated announcement of latest updates

Planet Python - Wed, 2024-08-21 10:03

Here is a slightly delayed (and short) run-down of the new stuff that we deployed recently.

The main change for this update is that we have updated the underlying OS running PythonAnywhere to Ubuntu 22.04. This is an LTS release so it will be supported for some time to come. This will not affect user environments, but it is setting us up for a new user environment that should be coming soon.

We have also:

Started the process of updating our file servers to be more robust
Improved our alerting so that we are alerted to many new forms of failure on PythonAnywhere
Made some improvements to the ASGI beta systems and their documentation
Fixed a number of security issues
Fixed various bugs

Real Python: Primer on Jinja Templating

Planet Python - Wed, 2024-08-21 10:00

Templates are an essential ingredient in full-stack web development. With Jinja, you can build rich templates that power the front end of your Python web applications.

But you don’t need to use a web framework to experience the capabilities of Jinja. When you want to create text files with programmatic content, Jinja can help you out.

In this tutorial, you’ll learn how to:

Install the Jinja template engine
Create your first Jinja template
Render a Jinja template in Flask
Use for loops and conditional statements with Jinja
Nest Jinja templates
Modify variables in Jinja with filters
Use macros to add functionality to your front end

You’ll start by using Jinja on its own to cover the basics of Jinja templating. Later you’ll build a basic Flask web project with two pages and a navigation bar to leverage the full potential of Jinja.

Throughout the tutorial, you’ll build an example app that showcases some of Jinja’s wide range of features. To see what it’ll do, skip ahead to the final section.

You can also find the full source code of the web project by clicking on the link below:

Source Code: Click here to download the source code that you’ll use to explore Jinja’s capabilities.

Take the Quiz: Test your knowledge with our interactive “Primer on Jinja Templating” quiz. You’ll receive a score upon completion to help you track your learning progress:

This tutorial is for you if you want to learn more about the Jinja template language or if you’re getting started with Flask.

Get Started With Jinja

Jinja is not only a city in the Eastern Region of Uganda and a Japanese temple, but also a template engine. You commonly use template engines for web templates that receive dynamic content from the back end and render it as a static page in the front end.

But you can use Jinja without a web framework running in the background. That’s exactly what you’ll do in this section. Specifically, you’ll install Jinja and build your first templates.

Install Jinja

Before exploring any new package, it’s a good idea to create and activate a virtual environment. That way, you’re installing any project dependencies in your project’s virtual environment instead of system-wide.

Select your operating system below and use your platform-specific command to set up a virtual environment:

Windows PowerShell
PS> python -m venv venv
PS> .\venv\Scripts\activate
(venv) PS>

Shell
$ python -m venv venv
$ source venv/bin/activate
(venv) $

With the above commands, you create and activate a virtual environment named venv by using Python’s built-in venv module. The parentheses (()) surrounding venv in front of the prompt indicate that you’ve successfully activated the virtual environment.

After you’ve created and activated your virtual environment, it’s time to install Jinja with pip:

Shell
(venv) $ python -m pip install Jinja2

Don’t forget the 2 at the end of the package name. Otherwise, you’ll install an old version that isn’t compatible with Python 3.

It’s worth noting that although the current major version is actually greater than 2, the package that you’ll install is nevertheless called Jinja2. You can verify that you’ve installed a modern version of Jinja by running pip list:

Shell
(venv) $ python -m pip list
Package    Version
---------- -------
Jinja2     3.x
...

To make things even more confusing, after installing Jinja with an uppercase J, you have to import it with a lowercase j in Python. Try it out by opening the interactive Python interpreter and running the following commands:

Read the full article at https://realpython.com/primer-on-jinja-templating/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Drupal Association blog: Why I'm a Ripplemaker.. by Nikki Flores

Planet Drupal - Wed, 2024-08-21 10:00

I first learned Drupal in 2008 and it was the backbone of one of my consulting service's very first sites. We used the Quiz module and built online certifications based on scoring. This feature was a huge hit for that client and carried them forward, such that they were able to get acquired and eventually retire.

Implementing sites on Drupal carried my partner and me through a decade of consulting as we built websites for national and international public agencies, nonprofits, membership organizations, and e-commerce. Drupal fed my family through a variety of projects and paid for my healthcare, tuition, and our housing.

In my current role at Lullabot as an employee-owner, Drupal has continued to evolve. We use Drupal for our enterprise projects, for state and local governments as well as education and publishing clients: my current client, a state government, is converting their agencies into a unified platform.

From the community side, I organized the first DrupalCamp in Hawaii and am nearing the end of my elected term as a community board member for the Drupal Association, and continue to speak on panels, coordinate, organize, and present at DrupalCon and other events.

There is nothing that I do that is special: anyone else in the Drupal community can, and is welcome to, contribute, connect, and engage to the level that they have energy to do so. I recognize how fortunate I am to work at a company that invests in Drupal and also supports "internal time" so we continue to learn, grow, and develop our skills.

Because I have gained so much from Drupal, I'm very happy to encourage you to join me as a Ripple Maker. Making a monthly contribution and, in my case, naming the Drupal Association as a beneficiary of my estate helps me know that the core values of our community (collaboration, questioning and commenting, making items incrementally better, and continuing to encourage the next generation) will last.

Donate now

Nikki Flores
Senior Technical Project Manager
Lullabot

Categories: FLOSS Project Planets

Real Python: Quiz: Primer on Jinja Templating

Planet Python - Wed, 2024-08-21 08:00

In this quiz, you’ll test your understanding of Jinja templating. Jinja is a powerful tool for building rich templates in Python web applications, and it can also be used to create text files with programmatic content.

Categories: FLOSS Project Planets

Droptica: 10 SEO Features a Modern CMS Should Have. Using Drupal as an Example

Planet Drupal - Wed, 2024-08-21 06:17

In this blog post, I'll introduce ten SEO features that every modern CMS should have and show you how easy it is to implement them in Drupal. So, if you have an existing website, you'll easily see what you're missing. And if you're just planning to build a new one, you'll get a ready-made list of features to copy to your web specifications and requirements. I invite you to read the article or watch an episode of the “Nowoczesny Drupal” series (the video is in Polish).

Categories: FLOSS Project Planets

Drupal.org blog: GitLab CI templates will make Drupal 11 the default version to run

Planet Drupal - Wed, 2024-08-21 06:06

Whenever a new major version of Drupal is released, we update Drupal's GitLab CI testing templates to automatically update the versions being tested. Here's an outline of our plan:

Where we are now

Drupal 11 was released on August 6th. You can learn more about it on the Drupal 11 landing page.

This means that we are in the middle of a transition period where many sites and modules will want to be in Drupal 11, whereas some others might still want to stay in Drupal 10.

From a GitLab CI point of view, testing for both Drupal 10 and 11 simultaneously has been available for months, providing module maintainers with a great tool to test their code before Drupal 11 was launched.

This was available by setting one variable in the .gitlab-ci.yml like this:

variables:
  OPT_IN_TEST_NEXT_MAJOR: 1

Many maintainers have leveraged this already and we can see many modules already claiming full Drupal 11 support within days of the release. To be more specific, as of August 20th, 2870 projects have no compatibility errors anymore, and 1720 have made Drupal 11 compatible releases.

Where we want to be

We are preparing to update the default testing configuration for the GitLab CI templates, but we want to make sure to continue to support maintainers who still need to test against Drupal 10 and 11. We've outlined the changes we'll be making and the timeline below.

As of today:

Current version (default) is Drupal 10
Next major version is Drupal 11
Previous major version is Drupal 9

When we do the shift, this will change to:

Current version (default) is Drupal 11
Next major version will be Drupal 12 (when development starts) - see note below.
Previous major version is Drupal 10

For modules that were testing Drupal 10 and Drupal 11 simultaneously, the change will be as easy as this:

variables:
  # OPT_IN_TEST_NEXT_MAJOR: 1
  OPT_IN_TEST_PREVIOUS_MAJOR: 1

Instead of opting in to test the next major, all you need to do is opt into the previous major.

Note: Drupal 12 development branch does not exist yet. Enabling this version might not do anything until this branch is created.

Steps

We are actively working on making the above switch in this issue: Update templates so 11.0 is the default/current branch.

We are going to be taking the following steps in the coming days / weeks.

Step 1: Make all modules start testing against Drupal 11

We will set the default value for OPT_IN_TEST_NEXT_MAJOR to 1 temporarily, and release version 1.5.6 of the templates. This will automatically become the default for all Contrib.

Modules that have not yet tested their code against Drupal 11 will now see "Next Major" test jobs in their pipelines, in addition to the "current" Drupal 10 variant. These new jobs have allow_failure: true, so the overall result of the pipelines should not change. This should give a good sense of where the module stands in relation to Drupal 11. Maintainers can still override the variable to be 0 if they don't want this behavior.

The expected date for this change is: August 26th, 2024 (next Monday)

Step 2: Roll out the shift and make it available for Contrib

When the issue Update templates so 11.0 is the default/current branch and all its dependencies are sorted, we will deploy the changes and create a new release 1.6.0. This will be available to Contrib projects using "gitlab ref" main or 1.x-latest.

The expected date for this change is: September 5th, 2024 (2 weeks from now)

Step 3: Make the shift default for all Contrib

Then we will make this new release be the default for all contrib projects automatically.

However, we have provided several alternatives for modules that don't want to do the shift at this point. Any of the following can be used:

You can pin the version of the templates for your module to 1.5.6. This is the latest version released before the switch. Learn more about pinning the templates version in this page. Note that this means you will not get any updates to the templates for new features or bug fixes, until you un-pin the release.
You can set OPT_IN_TEST_PREVIOUS_MAJOR to 1 and OPT_IN_TEST_CURRENT to 0 to continue testing Drupal 10 and not Drupal 11.
You can configure your own variants as described on this page.
You can tweak the key variables used when creating variants so they have the versions that you desire. Check the above link for that information.

For those wanting to do the shift, you will not need to do anything at all.

The expected date for this change is: September 12th, 2024 (3 weeks from now)

After the shift is made

Once the shift is made, Drupal 11 will be the default version tested for all new issues, merge requests, and pipelines for all contrib projects, allowing us to keep the Drupal ecosystem up to date and relevant.

Some issues are related to this change without being blockers, so we encourage you to check the issue list before reporting anything new. If you discover a problem and don't find it in the queue, please create a new issue.

Categories: FLOSS Project Planets

PyCharm: How to Build Chatbots With LangChain

Planet Python - Wed, 2024-08-21 06:06

This is a guest post from Dido Grigorov, a deep learning engineer and Python programmer with 17 years of experience in the field.

Chatbots have evolved far beyond simple question-and-answer tools. With the power of large language models (LLMs), they can understand the context of conversations and generate human-like responses, making them invaluable for customer support applications and other types of virtual assistance.

LangChain, an open-source framework, streamlines the process of building these conversational chatbots by providing tools for seamless model integration, context management, and prompt engineering.

In this blog post, we’ll explore how LangChain works and how chatbots interact with LLMs. We’ll also guide you step by step through building a context-aware chatbot that delivers accurate, relevant responses using LangChain and GPT-3.

What are chatbots in the realm of LLMs?

Chatbots in the field of LLMs are software that simulates human-like conversations with users through text or voice interfaces. These chatbots exploit the advanced capabilities of LLMs, which are neural networks trained on huge amounts of text data. This training allows them to produce human-like responses to a wide range of input prompts.

A key capability is that LLM-based chatbots can take a conversation’s context into account when generating a response. This means they can stay coherent across several exchanges and can process complex queries to produce outputs that are in line with the users’ intentions. Additionally, these chatbots can assess the emotional tone of a user’s input and adjust their responses to match the user’s sentiments.

Chatbots are highly adaptable and can be personalized. They learn from how users interact with them, improving their responses by adjusting them to individual preferences and needs.

What is LangChain?

LangChain is an open-source framework for creating apps that use large language models (LLMs). It comes with tools and abstractions to better personalize the information produced by these models while maintaining accuracy and relevance.

One common term you will see when you read about LLMs is “prompt chains”. A prompt chain is a sequence of prompts or instructions that guides an AI model through a multi-step process to generate more accurate, detailed, or refined outputs. This method can be employed for various tasks, such as writing, problem-solving, or generating code.

Developers can create new prompt chains using LangChain, which is one of the framework’s main strengths. They can even modify existing prompt templates without needing to train the model again when using new datasets.

How does LangChain work?

LangChain is a framework designed to simplify the development of applications that utilize language models. It offers a suite of tools that help developers efficiently build and manage applications that involve natural language processing (NLP) and Large Language Models. By defining the steps needed to achieve the desired outcome (this might be a chatbot, task automation, virtual assistant, customer support, and even more), developers can adapt language models flexibly to specific business contexts using LangChain.

Here’s a high-level overview of how LangChain works.

Model integration

LangChain supports various language models, including those from OpenAI, Hugging Face, Cohere, Anyscale, Azure Models, Databricks, Ollama, Llama, GPT4All, Spacy, Pinecone, AWS Bedrock, and MistralAI. Developers can easily switch between different models or use multiple models in one application. They can also build custom model integrations to take advantage of capabilities tailored to their specific applications.

Chains

The core concept of LangChain is chains, which bring together different AI components for context-aware responses. A chain represents a set of automated actions between a user prompt and the final model output. There are two types of chains provided by LangChain:

Sequential chains: These chains enable the output of one model or function to be used as the input for another. This is particularly helpful for multi-step processes in which each step depends on the previous one.
Parallel chains: These chains run multiple tasks simultaneously and merge their outputs at the end. This makes them well suited to work that can be divided into completely independent subtasks.
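
The two chain types can be illustrated with a minimal sketch in plain Python. This is not LangChain's API: the helpers run_sequential and run_parallel, and the stand-in steps normalize, summarize, and count_words, are hypothetical names used only to show the data flow, with ordinary functions taking the place of model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_sequential(steps: List[Callable[[str], str]], user_input: str) -> str:
    """Sequential chain: feed the output of each step into the next one."""
    value = user_input
    for step in steps:
        value = step(value)
    return value


def run_parallel(tasks: Dict[str, Callable[[str], str]], user_input: str) -> Dict[str, str]:
    """Parallel chain: run independent tasks on the same input, then merge outputs."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(task, user_input) for name, task in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for model calls.
def normalize(text: str) -> str:
    return " ".join(text.split()).lower()


def summarize(text: str) -> str:
    return text[:20]


def count_words(text: str) -> str:
    return str(len(text.split()))


sequential_result = run_sequential([normalize, summarize], "  How do I   RESET my password?  ")
parallel_result = run_parallel({"summary": summarize, "words": count_words}, "reset my password")
print(sequential_result)
print(parallel_result)
```

In the sequential case the order of the steps matters; in the parallel case the tasks never see each other's output, which is why they can safely run at the same time.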

Memory

LangChain facilitates the storage and retrieval of information across various interactions. This is essential where context needs to persist, such as with chatbots or interactive agents. Two types of memory are provided:

Short-term memory – Helps keep track of recent sessions.
Long-term memory – Retains information from previous sessions, improving the system’s recall of past chats and user preferences.
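
The distinction between the two kinds of memory can be sketched in a few lines of plain Python. The classes ShortTermMemory and LongTermMemory below are hypothetical and are not LangChain classes; the sketch assumes that short-term memory means a bounded window of recent exchanges and that long-term memory means facts stored per user.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class ShortTermMemory:
    """Keeps only the most recent exchanges; older ones are dropped."""

    def __init__(self, max_turns: int = 3) -> None:
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, user: str, bot: str) -> None:
        self.turns.append((user, bot))

    def as_prompt(self) -> str:
        """Render the remembered exchanges as text for the next prompt."""
        return "\n".join(f"User: {u}\nBot: {b}" for u, b in self.turns)


class LongTermMemory:
    """Stores facts per user so that they can be recalled in later sessions."""

    def __init__(self) -> None:
        self.facts: Dict[str, List[str]] = {}

    def remember(self, user_id: str, fact: str) -> None:
        self.facts.setdefault(user_id, []).append(fact)

    def recall(self, user_id: str) -> List[str]:
        return list(self.facts.get(user_id, []))


short = ShortTermMemory(max_turns=2)
short.add("first question", "first answer")
short.add("second question", "second answer")
short.add("third question", "third answer")  # pushes the first exchange out

long_term = LongTermMemory()
long_term.remember("user-1", "prefers short answers")
print(short.as_prompt())
print(long_term.recall("user-1"))
```

A real application would typically persist the long-term store in a database; the in-memory dictionary here only shows the interface.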

Tools and utilities

LangChain provides many tools, but the most used ones are Prompt Engineering, Data Loaders, and Evaluators. For Prompt Engineering, LangChain contains utilities for developing good prompts, which are very important for getting the best responses from language models.

If you want to load files such as CSV, PDF, or other formats, Data Loaders help you load and pre-process different types of data so that they can be used in model interactions.

Evaluation is an essential part of working with machine learning models and large language models. That’s why LangChain provides Evaluators – tools used for testing language models and chains so that generated results meet the required criteria, which might include:

Datasets criteria:

Manually curated examples: Start with high-quality, diverse inputs.
Historical logs: Use real user data and feedback.
Synthetic data: Generate examples based on initial data.

Types of evaluations:

Human: Manual scoring and feedback.
Heuristic: Rule-based functions, both reference-free and reference-based.
LLM-as-judge: LLMs score outputs based on encoded criteria.
Pairwise: Compare two outputs to pick the better one.

Application evaluations:

Unit tests: Quick, heuristic-based checks.
Regression testing: Measure performance changes over time.
Back-testing: Re-run production data on new versions.
Online evaluation: Evaluate in real-time, often for guardrails and classifications.
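
To make the heuristic and pairwise evaluation types more concrete, here is a minimal sketch in plain Python. It does not use LangChain's evaluator classes; the functions keyword_coverage, within_length, and pairwise_pick, as well as the sample answers, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List


def keyword_coverage(output: str, required_keywords: List[str]) -> float:
    """Reference-based heuristic: share of required keywords found in the output."""
    if not required_keywords:
        return 1.0
    text = output.lower()
    hits = sum(1 for keyword in required_keywords if keyword.lower() in text)
    return hits / len(required_keywords)


def within_length(output: str, max_words: int = 50) -> bool:
    """Reference-free heuristic: the answer must not exceed max_words words."""
    return len(output.split()) <= max_words


def pairwise_pick(a: str, b: str, score: Callable[[str], float]) -> str:
    """Pairwise evaluation: return the output with the higher score (ties go to a)."""
    return a if score(a) >= score(b) else b


keywords = ["paperclip", "PDF"]
answer_a = "Click the paperclip icon and select the PDF."
answer_b = "Just upload the file."
best_answer = pairwise_pick(answer_a, answer_b, lambda out: keyword_coverage(out, keywords))
print(best_answer)
```

Checks like these are cheap enough to run as unit tests on every change, while human or LLM-as-judge scoring can be reserved for cases the heuristics cannot decide.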

Agents

LangChain agents are essentially autonomous entities that leverage LLMs to interact with users, perform tasks, and make decisions based on natural language inputs.

Action-driven agents use language models to decide on optimal actions for predefined tasks. Interactive applications such as chatbots, on the other hand, make use of agents that also take into account user input and stored memory when responding to queries.

How do chatbots work with LLMs?

The LLMs underlying chatbots rely on Natural Language Understanding (NLU) and Natural Language Generation (NLG), which are made possible by pre-training models on vast amounts of textual data.

Natural Language Understanding (NLU)

Context awareness: LLMs can understand subtlety and allusions in a conversation, and they can keep track of the conversation from one turn to the next. This makes it possible for chatbots to generate logical and contextually appropriate responses to users.
Intent recognition: These models can understand the user’s intent from their queries, whether the language is very specific or quite general. They can discern what the user wants to achieve and determine the best way to help them reach that goal.
Sentiment analysis: Chatbots can determine the user’s emotion from the tone of the language used and adapt to the user’s emotional state, which increases user engagement.

Natural Language Generation (NLG)

Response generation: When LLMs are asked questions, the responses they provide are typically correct in both grammar and context. This is because the models are trained on vast amounts of natural language text, so their responses mimic human communication.
Creativity and flexibility: Apart from simple answers, LLM-based chatbots can tell a story, create a poem, or provide a detailed description of a specific technical issue, which makes them very flexible in the material they can provide.

Personalization and adaptability

Learning from interactions: Chatbots personalize the interaction because they can learn from users’ behavior and choices. This ongoing learning makes the chatbot more effective and precise in answering questions.
Adaptation to different domains: LLMs can be tuned to particular areas or specialties, allowing chatbots to act as subject matter experts in customer relations, technical support, or the healthcare domain.

LLMs are capable of understanding and generating text in multiple languages, making them suitable for applications in diverse linguistic contexts.

Building your own chatbot with LangChain in five steps

This project aims to build a chatbot that leverages GPT-3 to search for answers within documents. First, we scrape content from online articles, split them into small chunks, compute their embeddings, and store them in Deep Lake. Then, we use a user query to retrieve the most relevant chunks from Deep Lake, which are incorporated into a prompt for generating the final answer with the LLM.

It’s important to note that using LLMs carries a risk of generating hallucinations or false information. While this may be unacceptable for many customer support scenarios, the chatbot can still be valuable for assisting operators in drafting answers that they can verify before sending to users.

Next, we’ll explore how to manage conversations with GPT-3 and provide examples to demonstrate the effectiveness of this workflow.

Step 1: Project creation, prerequisites, and required library installation

First, create your PyCharm project for the chatbot. Open PyCharm and click “New Project”. Then give your project a name.

Once the project is set up, generate your `OPENAI_API_KEY` on the OpenAI API Platform website after logging in (or signing up, if you don’t have an account yet). To do that, go to the “API Keys” section in the left navigation menu and click the “+Create new secret key” button. Don’t forget to copy your key.

After that, get your `ACTIVELOOP_TOKEN` by signing up on the Activeloop website. Once logged in, click the “Create API Token” button and you’ll be taken to the token creation page. Copy this token as well.

Once you have both the token and the key, open your configuration settings in PyCharm by clicking the three-dot button next to the Run and Debug buttons and choosing “Edit”. You should see the following window:

Now locate the “Environment variables” field and click the icon on its right side. You’ll see the following window:

Now click the + button to start adding your environment variables, and be careful with their names. They should be the same as mentioned above: `OPENAI_API_KEY` and `ACTIVELOOP_TOKEN`. When ready, click OK in the first window and then “Apply” and “OK” in the second one.

This is a big advantage of PyCharm that I really appreciate: it makes the environment variables available to the running code automatically, without additional calls to load them, allowing us to think more about the creative part of the code.
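
Because a misspelled variable name is easy to miss, it can help to check the environment before running the rest of the code. The following is a minimal sketch using only the standard library; the helper missing_variables is a hypothetical name, and the sketch assumes that the two variables are expected to be present in the process environment that the run configuration provides.

```python
import os
from typing import Iterable, List, Mapping


def missing_variables(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not env.get(name)]


required = ["OPENAI_API_KEY", "ACTIVELOOP_TOKEN"]
missing = missing_variables(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing the environment as a parameter keeps the check easy to test with a plain dictionary instead of the real environment.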

Note: ActiveLoop is a technology company that focuses on developing data infrastructure and tools for machine learning and artificial intelligence. The company aims to streamline the process of managing, storing, and processing large-scale datasets, particularly for deep learning and other AI applications.

DeepLake is ActiveLoop’s flagship product. It provides efficient data storage, management, and access capabilities, optimized for large-scale datasets often used in AI.

Install the required libraries

We’ll use the `SeleniumURLLoader` class from LangChain, which relies on the `unstructured` and `selenium` Python libraries. Install these using pip. It is recommended to install the latest version, although the code has been specifically tested with version 0.7.7.

To do that use the following command in your PyCharm terminal:

pip install unstructured selenium

Now we need to install langchain, deeplake and openai. To do that just use this command in your terminal (same window you used for Selenium) and wait a bit until everything is successfully installed:

pip install langchain==0.0.208 deeplake openai==0.27.8 psutil tiktoken

To make sure all libraries are properly installed, just add the following lines needed for our chatbot app and click on the Run button:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI
from langchain.document_loaders import SeleniumURLLoader
from langchain import PromptTemplate

Another way to install your libraries is through the settings of PyCharm. Open them and go to the section Project -> Project Interpreter. Then locate the + button, search for your package and hit the button “Install Package”. Once ready, close it, and on the next window click “Apply” and then “OK”.

Step 2: Splitting content into chunks and computing their embeddings

As previously mentioned, our chatbot will “communicate” with content coming from online articles. That’s why I picked Digitaltrends.com as my source of data and selected several articles to start. All of them are organized into a Python list and assigned to a variable called “articles”.

articles = [
    'https://www.digitaltrends.com/computing/claude-sonnet-vs-gpt-4o-comparison/',
    'https://www.digitaltrends.com/computing/apple-intelligence-proves-that-macbooks-need-something-more/',
    'https://www.digitaltrends.com/computing/how-to-use-openai-chatgpt-text-generation-chatbot/',
    'https://www.digitaltrends.com/computing/character-ai-how-to-use/',
    'https://www.digitaltrends.com/computing/how-to-upload-pdf-to-chatgpt/',
]

We load the documents from the provided URLs and split them into chunks using the `CharacterTextSplitter` with a chunk size of 1000 and no overlap:

# Use the selenium to load the documents
loader = SeleniumURLLoader(urls=articles)
docs_not_splitted = loader.load()

# Split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_not_splitted)

If you run the code so far and everything works well, you should receive output like the following:

[Document(page_content="techcrunch\n\ntechcrunch\n\nWe, TechCrunch, are part of the Yahoo family of brandsThe sites and apps that we own and operate, including Yahoo and AOL, and our digital advertising service, Yahoo Advertising.Yahoo family of brands.\n\n When you use our sites and apps, we use \n\nCookiesCookies (including similar technologies such as web storage) allow the operators of websites and apps to store and read information from your device. Learn more in our cookie policy.cookies to:\n\nprovide our sites and apps to you\n\nauthenticate users, apply security measures, and prevent spam and abuse, and\n\nmeasure your use of our sites and apps\n\n If you click '", metadata={'source': ……………]

Next, we generate the embeddings using OpenAIEmbeddings and save them in a DeepLake vector store hosted in the cloud. Ideally, in a production environment, we could upload an entire website or course lesson to a DeepLake dataset, enabling searches across thousands or even millions of documents.

By leveraging a serverless Deep Lake dataset in the cloud, applications from various locations can seamlessly access a centralized dataset without the necessity of setting up a vector store on a dedicated machine.

Why do we need embeddings and documents in chunks?

When building chatbots with LangChain, embeddings and chunking documents are essential for several reasons that relate to the efficiency, accuracy, and performance of the chatbot.

Embeddings are vector representations of text (words, sentences, paragraphs, or documents) that capture semantic meaning. They encapsulate the context and meaning of words in a numerical form. This allows the chatbot to understand and generate responses that are contextually appropriate by capturing nuances, synonyms, and relationships between words.

Thanks to the embeddings, the chatbot can also quickly identify and retrieve the most relevant responses or information from a knowledge base, because they allow matching user queries with the most semantically relevant chunks of information, even if the wording differs.
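
The retrieval idea can be sketched without any external service. The example below is a toy: embed builds a bag-of-words count vector rather than a learned embedding, so unlike real embeddings it only matches shared words and cannot recognize synonyms. The function names embed, cosine, and top_k and the sample chunks are hypothetical; only the ranking-by-similarity step mirrors what a vector store does.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: a bag-of-words count vector (NOT a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True)
    return ranked[:k]


sample_chunks = [
    "click the paperclip icon to upload a pdf",
    "character ai lets you chat with custom personas",
    "apple intelligence arrives on recent macbooks",
]
best_chunks = top_k("how do i upload a pdf", sample_chunks, k=1)
print(best_chunks)
```

With learned embeddings, the same ranking step would also surface chunks that express the idea in different words, which is the property the paragraph above relies on.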

Chunking, on the other hand, involves dividing large documents into smaller, manageable pieces or chunks. Smaller chunks are faster to process and analyze compared to large, monolithic documents. This results in quicker response times from the chatbot.

Document chunking also helps with the relevancy of the output, because when a user asks a question, the answer is often found only in a specific part of a document. Chunking allows the system to pinpoint and retrieve just the relevant sections, so the chatbot can provide more precise and accurate answers.
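
The effect of the chunk size and overlap settings can be seen in a small sketch. This is a simplified fixed-width splitter written for illustration, not the implementation of `CharacterTextSplitter`, which splits on a separator before merging pieces; the function name split_into_chunks is hypothetical.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Split text into pieces of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
no_overlap = split_into_chunks(sample, chunk_size=4)
with_overlap = split_into_chunks(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

An overlap greater than zero repeats some text between neighboring chunks, which can help when an answer straddles a chunk boundary, at the cost of storing more data.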

Now let’s get back to our application and update the following code to include your Activeloop organization ID. Keep in mind that, by default, your organization ID is the same as your username.

# Create the embedding function. The snippets above never define `embeddings`,
# so the default OpenAIEmbeddings() is assumed here.
embeddings = OpenAIEmbeddings()

# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = "didogrigorov"
my_activeloop_dataset_name = "jetbrains_article_dataset"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# add documents to our Deep Lake dataset
db.add_documents(docs)

Another great feature of PyCharm that I love is the ability to add TODO notes directly in Python comments. Once you type TODO in capital letters, all such notes appear in a dedicated section of PyCharm where you can see them all:

# TODO: use your organization id here. (by default, org id is your username)

You can click on them and PyCharm shows you directly where they are in your code. I find this very convenient and use it all the time.

If you execute the code so far, you should see the following output if everything works normally:

To find the most similar chunks to a given query, we can utilize the similarity_search method provided by the Deep Lake vector store:

# Check the top relevant documents to a specific query
query = "how to check disk usage in linux?"
docs = db.similarity_search(query)
print(docs[0].page_content)

Step 3: Let’s build the prompt for GPT-3

We will design a prompt template that integrates role-prompting, pertinent Knowledge Base data, and the user’s inquiry. This template establishes the chatbot’s persona as an outstanding customer support agent. It accepts two input variables: chunks_formatted, containing the pre-formatted excerpts from articles, and query, representing the customer’s question. The goal is to produce a precise response solely based on the given chunks, avoiding any fabricated or incorrect information.
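
A template of this kind might look like the sketch below. The wording is an assumption, adapted from the memory-enabled template in Step 5 with the chat history removed, and it uses plain str.format so that it runs without LangChain; the helper build_prompt is a hypothetical name.

```python
# Hypothetical template text, adapted from the memory-enabled template in Step 5.
template = """You are an exceptional customer support chatbot that gently answers questions.

You know the following context information.

{chunks_formatted}

Answer the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {query}

Answer:"""


def build_prompt(chunks_formatted: str, query: str) -> str:
    """Fill the two placeholders using plain str.format."""
    return template.format(chunks_formatted=chunks_formatted, query=query)


example_prompt = build_prompt(
    chunks_formatted="Chunk one.\n\nChunk two.",
    query="How do I upload a PDF?",
)
print(example_prompt)
```

In the LangChain version, the same text would be wrapped as PromptTemplate(input_variables=["chunks_formatted", "query"], template=template), following the pattern used in Step 5, and filled through its format method.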

Step 4: Building the chatbot functionality

To generate a response, we begin by retrieving the top-k (e.g., top-3) chunks that are most similar to the user’s query. These chunks are then formatted into a prompt, which is sent to the GPT-3 model with a temperature setting of 0.

# user question
query = "How to check disk usage in linux?"

# retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

# generate answer
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
answer = llm(prompt_formatted)
print(answer)

If everything works fine, your output should be:

To upload a PDF to ChatGPT, first log into the website and click the paperclip icon next to the text input field. Then, select the PDF from your local hard drive, Google Drive, or Microsoft OneDrive. Once attached, type your query or question into the prompt field and click the upload button. Give the system time to analyze the PDF and provide you with a response.

Step 5: Build conversational history

# Create conversational memory
memory = ConversationBufferMemory(memory_key="chat_history", input_key="input")

# Define a prompt template that includes memory
template = """You are an exceptional customer support chatbot that gently answers questions.

{chat_history}

You know the following context information.

{chunks_formatted}

Answer the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {input}

Answer:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "chunks_formatted", "input"],
    template=template,
)

# Initialize the OpenAI model
llm = OpenAI(openai_api_key="YOUR API KEY", model="gpt-3.5-turbo-instruct", temperature=0)

# Create the LLMChain with memory
chain = LLMChain(
    llm=llm,
    prompt=prompt,
    memory=memory
)

# User query
query = "What was the 5th point about on the question how to remove spotify account?"

# Retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# Format the chunks for the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)

# Prepare the input for the chain
input_data = {
    "input": query,
    "chunks_formatted": chunks_formatted,
    "chat_history": memory.buffer
}

# Simulate a conversation
response = chain.predict(**input_data)
print(response)

Let’s walk through the code in a more conversational manner.

To start with, we set up a conversational memory using `ConversationBufferMemory`. This allows our chatbot to remember the ongoing chat history, using `input_key="input"` to manage the incoming user inputs.

Next, we design a prompt template. This template is like a script for the chatbot, including sections for chat history, the chunks of information we’ve gathered, and the current user question (input). This structure helps the chatbot know exactly what context it has and what question it needs to answer.

Then, we move on to initializing our language model chain, or `LLMChain`. Think of this as assembling the components: we take our prompt template, the language model, and the memory we set up earlier, and combine them into a single workflow.

When it’s time to handle a user query, we prepare the input. This involves creating a dictionary that includes the user’s question (`input`), the relevant information chunks (`chunks_formatted`), and the chat history so far (`chat_history`). This setup ensures that the chatbot has all the details it needs to craft a well-informed response.

Finally, we generate a response. We call the `chain.predict` method, passing in our prepared input data. The method processes this input through the workflow we’ve built, and out comes the chatbot’s answer, which we then display.

This approach allows our chatbot to maintain a smooth, informed conversation, remembering past interactions and providing relevant answers based on the context.
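To show what "remembering past interactions" amounts to, here is a small self-contained sketch of a conversation buffer. It does not use LangChain: `BufferMemory`, `answer_with_memory`, and `fake_llm` are hypothetical names, the "Human:"/"AI:" prefixes are an assumed transcript format, and the fake model only reports how many earlier turns it was shown so the effect of the memory is visible without an API key.

```python
class BufferMemory:
    """Hypothetical minimal conversation buffer: stores (user, bot) turns in order."""

    def __init__(self):
        self.turns = []

    def save(self, user_message, bot_message):
        self.turns.append((user_message, bot_message))

    @property
    def buffer(self):
        # Render the history as plain text so it can be placed into a prompt.
        return "\n".join(f"Human: {u}\nAI: {b}" for u, b in self.turns)


def answer_with_memory(memory, question, chunks_formatted, llm):
    """Build a prompt from history, context and question, then record the new turn."""
    prompt = (
        f"{memory.buffer}\n\n"
        f"Context:\n{chunks_formatted}\n\n"
        f"Question: {question}\nAnswer:"
    )
    reply = llm(prompt)
    memory.save(question, reply)
    return reply


def fake_llm(prompt):
    """Stand-in for a real model: reports how many earlier turns the prompt contains."""
    return f"seen {prompt.count('Human:')} earlier turn(s)"


memory = BufferMemory()
first = answer_with_memory(memory, "How do I remove my account?", "chunk A", fake_llm)
second = answer_with_memory(memory, "What was the 5th point?", "chunk B", fake_llm)
print(first)
print(second)
```

Because the history is just text prepended to the prompt, a follow-up question such as "What was the 5th point?" can only be answered if the earlier turn is still in the buffer.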

Another favorite PyCharm trick that helped me a lot while building this functionality is holding the Ctrl key and clicking on a method, which jumps straight to its declaration.

In conclusion

GPT-3 excels at creating conversational chatbots capable of answering specific questions based on contextual information provided in the prompt. However, ensuring the model generates answers solely based on this context can be challenging, as it often tends to hallucinate (i.e., generate new, potentially false information). The impact of such false information varies depending on the use case.

In summary, we developed a context-aware question-answering system using LangChain, following the provided code and strategies. The process included splitting documents into chunks, computing their embeddings, implementing a retriever to find similar chunks, crafting a prompt for GPT-3, and using the GPT-3 model for text generation. This approach showcases the potential of leveraging GPT-3 to create powerful and contextually accurate chatbots while also emphasizing the importance of being vigilant about the risk of generating false information.

About the author Dido Grigorov

Dido is a seasoned Deep Learning Engineer and Python programmer with an impressive 17 years of experience in the field. He is currently pursuing advanced studies at the prestigious Stanford University, where he is enrolled in a cutting-edge AI program, led by renowned experts such as Andrew Ng, Christopher Manning, Fei-Fei Li and Chelsea Finn, providing Dido with unparalleled insights and mentorship.

Dido’s passion for Artificial Intelligence is evident in his dedication to both work and experimentation. Over the years, he has developed a deep expertise in designing, implementing, and optimizing machine learning models. His proficiency in Python has enabled him to tackle complex problems and contribute to innovative AI solutions across various domains.

Categories: FLOSS Project Planets

The Drop Times: Why Is It 'Drupal CMS' and Not 'Drupal': An Explainer

Planet Drupal - Wed, 2024-08-21 02:45

Discover why Drupal's latest product will be called 'Drupal CMS' and not just 'Drupal.' Explore the strategic decision-making, community feedback, and future implications behind this significant naming shift that redefines the way we think about Drupal's evolution.

Categories: FLOSS Project Planets

Drupal Starshot blog: Out-of-the-box functionality survey results

Planet Drupal - Wed, 2024-08-21 01:52

We recently posted a survey seeking community feedback on what features and contrib modules to include in Drupal CMS out of the box, in order to deliver on the vision of getting from install to launch really fast. We were looking for features and modules that align with the Drupal Starshot strategy and consider the primary persona, which is ambitious marketers.

The survey got 60 submissions, with a wide variety of suggestions. Many of these were already on our radar, and closely align with our existing initiatives and work tracks. But it also raised a lot of new and interesting ideas for the leadership team and track leads to consider. We will also likely be posting new work tracks in the next few weeks based on the results, since there are some great suggestions that are not yet covered.

The following is a summary of the survey results. We are not treating them as a 'vote' for any one feature, but they are a great way to validate our plans and determine what other areas to focus on.

Features

There were 108 different feature suggestions, many of which overlapped. All of those suggested in more than one submission are already covered by an initiative or work track:

Better page building tools: more intuitive layout builder; drag & drop components; ability to easily add lists to pages; theming tools in the UI; live preview (20) [Experience builder]
SEO: Meta tags (specifically including content schema and social media sharing); SEO analysis tools (14) [SEO track]
Form builder (7) [Contact form track]
Perform content management actions in bulk (3) [Content publishing workflows track]
Image resizing and cropping tools (3) [Media management track]
Responsive images (3) [Media management track]
Login with email (3) [Base recipe track]
Anti-spam measures (2) [Contact form track]
Better search (2) [Advanced search recipe track]
Ability to add sitewide alerts (2) [Base recipe track]

The remaining feature suggestions were suggested once each, but point to specific areas we could focus on.

Content management & workflows

Workspaces
Content workflows
Content scheduling
Content cloning
Simple content access control
Deleted content recovery
WYSIWYG editor
Content import & export tools
Inline entity creation
Jobs content recipe
Event calendar

Security

Two-factor authentication
Configurable password policy
Security compliance tools

Multilingual

Asymmetric translations
Capability to display the source content next to the translated content in the node edit form

Media

SVG support
Bulk media upload
Easy linking directly to media files
AI alt tag generation

Marketing tools

A/B testing for content
QR code generation
Easy to configure social media links
Social sharing capability

General

Accessibility checker
AI enabled content writing
Admin menu search
Infinite scrolling
SMTP email support
Entity relationship modeling tool
Better cookie handling
Login with social network accounts

Developer tools

Integrated deployments
Email rerouting for non-production environments
New core theme with configurable CSS variables
Advanced aggregation modernization
Better exposure of metrics / telemetry

Drupal-specific suggestions

Automatic Updates
Project Browser
Simplified Views UI
Ability to define "site settings" without affecting configuration
Safe revision pruning
Better situational awareness of extensions
Easy configuration management system
Easier removal of modules and cleaning up of applied recipes
Entity hierarchy module in core
Referential integrity: https://www.drupal.org/project/drupal/issues/2723323
Poster images for video media: https://www.drupal.org/project/drupal/issues/2954834
Inline moderation notes for easier collaboration
Improved file upload experience/widget
Submission against including Twig Tweak module
Manual curation tools such as entityqueue

Modules proposed

As with the feature suggestions, some modules were suggested more than once, and are mostly covered by existing streams.

Whether a module will be included will depend on many things, but mainly, it should be required for some functionality that we are planning to deliver. Track leads will propose functionality that will be supported by contrib modules, and then the modules will be assessed for inclusion. We plan to publish further information about module selection and ongoing governance and maintenance as the project progresses.

Metatag (6) [SEO track]
Webform (5) [Contact form track]
Admin Toolbar (3) [Superseded by Navigation module]
Coffee (3) [Base recipe track]
Paragraphs (3) [Superseded by Experience builder]
Simple XML sitemap (3) [SEO track]
Scheduler / Scheduled Publish (3) [Base recipe track]
Security Kit (2)
Captcha (2) [Contact form track]
Editor Advanced link (2)
Focal Point (2) [Media management track]
Linkit (2) [Base recipe track]
Pathauto (2) [Base recipe track]
Google Tag / Google Tag Manager (2) [Analytics track]
Smart Date (2) [Event track]
Workspaces (2) [Content publishing workflows track]

Based on this, we might create new tracks for WYSIWYG and security, if we don't feel that we can sufficiently cover these as part of the base recipe.

The other modules suggested were:

Categories: FLOSS Project Planets

Russ Allbery: Review: These Burning Stars

Planet Debian - Tue, 2024-08-20 23:54

Review: These Burning Stars, by Bethany Jacobs

Series: Kindom Trilogy #1
Publisher: Orbit
Copyright: October 2023
ISBN: 0-316-46342-6
Format: Kindle
Pages: 430

These Burning Stars is a science fiction thriller with cyberpunk vibes. It is Bethany Jacobs's first novel and the first of an expected trilogy, and won the 2024 Philip K. Dick Award for the best SF paperback original published in the US.

Generation starships brought humanity to the three star systems of the Treble, where they've built a new and thriving culture of billions. The Treble is ruled by the Kindom, a tripartite government structure built around the worship of six gods and the aristocratic power of the First Families. The Clerisy handle religion, the Secretaries run the bureaucracy, and the Cloaksaan enforce the decisions of the other branches.

The Nightfoots are one of the First Families. They control sevite, the propellant required to move between the systems of the Treble now that the moon Jeve and the sole source of natural jevite has been destroyed. Esek Nightfoot is a cleric, theoretically following the rules of the Clerisy, but she has made a career of training cloaksaan. She is mercurial, powerful, ruthless, ambitious, politically well-connected, and greatly feared. She is also obsessed with a person named Six: an orphan she first encountered at a training school who was too young to have a gender or a name but who was already one of the best fighters in the school. In the sort of manipulative challenge typical of Esek, she dangled the offer of a place as a student and challenged the child to learn enough to do something impressive. The subsequent twenty years of elusive taunts and mysterious gifts from the impossible-to-locate Six have driven Esek wild.

Cleric Chono was beside Esek for much of that time. One of Six's classmates and another of Esek's rescues, Chono is the rare student who became a cleric rather than a cloaksaan. She is pious, cautious, and careful, the opposite of Esek's mercurial rage, but it's impossible to spend that much time around the woman and not be affected and manipulated by her. As this story opens, Chono is summoned by the First Cleric to join Esek on an assignment: recover a data coin that was stolen from a pirate raid on the Nightfoot compound. He refuses to tell them what data is on it, only saying that he believes it could be used to undermine public trust in the Nightfoot family.

Jun is a hacker with considerably fewer connections to power or government and no desire to meet any of these people. She and her partner Liis make a dubiously legal living from smaller, quieter jobs. Buying a collection of stolen data coins for an archivist consortium is riskier than she prefers, but she's been tracking down rumors of this coin for months. The deal is worth a lot of money, enough to make a huge difference for her family.

This is the second book I've read recently with strong cyberpunk vibes, although These Burning Stars mixes them with political thriller. This is a messy world with complicated political and religious systems, a lot of contentious history, and vast inequality. The story is told in two interleaved time sequences: the present-day fight over the data coin and the information that it contains, and a sequence of flashbacks telling the history of Esek's relationship with Six and Chono. Jun's story is the most cyberpunk and the one I found the most enjoyable to read, but Chono is a good viewpoint character for Esek's vicious energy and abusive charisma.

Six is not a viewpoint character. For most of the book, they're present mostly in shadows, glimpses, and consequences, but they're the strongest character of the book. Both Esek and Six are larger than life, creatures of legend stuffed into mundane politics but too full of strong emotions, both good and bad, to play by any of the rules. Esek has the power base and access to the levers of government, but Six's quiet competence and mercilessly targeted morality may make them the more dangerous of the pair.

I found the twisty political thriller part of this book engrossing and very difficult to put down, but it was also a bit too much drama for me in places. Jacobs has some surprises in store, one of which I did not expect at all, and they're set up beautifully and well-done within the story, but Esek and Six become an emotional star that the other characters orbit around and are in danger of getting pulled into. Chono is an accomplished and powerful character in her own right, but she's also an abuse victim, and while those parts are realistic, I didn't entirely enjoy reading them. There is quiet competence here alongside the drama, but I think I wanted the balance of emotion to tip a bit more towards the competence.

There is one thing that Jacobs does with the end of the book that greatly impressed me. Unfortunately I can't even hint at it for fear of spoilers, but the ending is unsettling in a way that I found surprising and thought-provoking. I think what I can say is that this book respects the intelligence and skill of secondary characters in a way that I think is rare in a story with such overwhelming protagonists. I'm still thinking about that, and it's going to pull me right into the sequel.

This is not going to be to everyone's taste. Esek is a viewpoint character and she can be very nasty. There's a lot of violence and abuse, including one rather graphic fight scene that I thought dragged on much longer than it needed to. But it's a satisfying, complex story with a true variety of characters and some real surprises. I'm glad I read it.

Followed by On Vicious Worlds, not yet published as I write this.

Content warnings: emotional and physical abuse, graphic violence, off-screen rape and sexual abuse of minors.

Rating: 7 out of 10

Categories: FLOSS Project Planets


