Feeds
Community input drives the new draft of the Open Source AI Definition
A new version of the Open Source AI Definition has been released with one new feature and a cleaner text, based on comments received from public discussions and recommendations. We’re continuing our march towards having a stable release by the end of October 2024, at All Things Open. Get involved by joining the discussion on the forum, finding OSI staff around the world and online at the weekly town halls.
New feature: clarified Open Source model and Open Source weights- Under “What is Open Source AI,” there is a new paragraph that (1) identifies both models and weights/parameters as encompassed by the word “system” and (2) makes it clear that all components of a larger system have to meet the standard. There is a new sentence in the paragraph after the “share” bullet making this point.
- Under the heading “Open Source models and Open Source weights,” there is a description of the components for both of those for machine learning systems. We also edited the paragraph below those additions to eliminate some redundancy.
The role of training data is one of the most hotly debated parts of the definition. After long deliberation and co-design sessions we have concluded that defining training data as a benefit, not a requirement, is the best way to go.
Training data is valuable to study AI systems: to understand the biases that have been learned, which can impact system behavior. But training data is not part of the preferred form for making modifications to an existing AI system. The insights and correlations in that data have already been learned.
Data can be hard to share. Laws that permit training on data often limit the resharing of that same data to protect copyright or other interests. Privacy rules also give a person the rightful ability to control their most sensitive information, such as decisions about their health. Similarly, much of the world’s Indigenous knowledge is protected through mechanisms that are not compatible with later-developed frameworks for rights exclusivity and sharing.
- Open training data (data that can be reshared) provides the best way to enable users to study the system, along with the preferred form of making modifications.
- Public training data (data that others can inspect as long as it remains available) also enables users to study the work, along with the preferred form.
- Unshareable non-public training data (data that cannot be shared for explainable reasons) gives the ability to study some of the systems biases and demands a detailed description of the data – what it is, how it was collected, its characteristics, and so on – so that users can understand the biases and categorization underlying the system.
OSI believes these extra requirements for data beyond the preferred form of making modifications to the AI system both advance openness in all the components of the preferred form of modifying the AI system and drive more Open Source AI in private-first areas such as healthcare.
Other changes- The Checklist is separated into its own document. This is to separate the discussion about how to identify Open Source AI from the establishment of general principles in the Definition. The content of the Checklist has also been fully aligned with the Model Openness Framework (MOF), allowing for an easy overlay.
- Under “Preferred form to make modifications,” the word “Model” changed to “Weights.” The word “Model” was referring only to parameters, and was inconsistent with how the word “model” is used in the rest of the document.
- There is an explicit reference to the intended recipients of the four freedoms: developers, deployers and end users of AI systems.
- Incorporated credit to the Free Software Definition.
- Added references to conditions of availability of components, referencing the Open Source Definition.
- Continue iterating through drafts after meeting diverse stakeholders at the worldwide roadshow, collect feedback and carefully look for new arguments in dissenting opinions.
- Decide how to best address the reviews of new licenses for datasets, documentation and the agreements governing model parameters.
- Keep improving the FAQ.
- Prepare for post-stable-release: Establish a process to review future versions of the Open Source AI Definition.
We will be taking draft v.0.0.9 on the road collecting input and endorsements, thanks to a grant by the Sloan Foundation. The lively conversation about the role of data in building and modifying AI systems will continue at multiple conferences from around the world, the weekly town halls and online throughout the Open Source community.
The first two stops are in Asia: Hong Kong for AI_dev August 21-23, then Beijing for Open Source Congress August 25-27. Other events are planned to take place in Africa, South America, Europe and North America. These are all steps toward the conclusion of the co-design process that will result in the release of the stable version of the Definition in October at All Things Open.
Creating an Open Source AI Definition is an arduous task over the past two years, but we know the importance of creating this standard so the freedoms to use, study, share and modify AI systems can be guaranteed. Those are the core tenets of Open Source, and it warrants the dedicated work it has required. You can read about the people who have played key roles in bringing the Definition to life in our Voices of Open Source AI Definition on the blog.
How to get involvedThe OSAID co-design process is open to everyone interested in collaborating. There are many ways to get involved:
- Join the forum: share your comment on the drafts.
- Leave comment on the latest draft: provide precise feedback on the text of the latest draft.
- Follow the weekly recaps: subscribe to our monthly newsletter and blog to be kept up-to-date.
- Join the town hall meetings: we’re increasing the frequency to weekly meetings where you can learn more, ask questions and share your thoughts.
- Join the workshops and scheduled conferences: meet the OSI and other participants at in-person events around the world.
EuroPython: EuroPython August 2024 Newsletter
Hello and welcome to the post-conference newsletter! We really hope you enjoyed EuroPython 2024, cause we sure did and are still recovering from all the fun and excitement :)
We have some updates to share with you, and also wanted to use this newsletter to nostalgically look back at all the good times we had just last month, surrounded by old friends and new in the beautiful city of Prague ❤️.
🏛️ EuroPython Society (EPS)This year we had a booth for the EuroPython Society at the conference. What is the EPS? The EPS is the running engine behind the EuroPython Conference. The EPS board is made up of up to 9 directors (including 1 chair and 1 vice chair). It runs the day-to-day business of the EuroPython Society, including running the EuroPython conference series, and supports the community through various initiatives such as our grants programme. The board collectively takes up the fiscal and legal responsibility of the Society.
For the next few weeks, the board is working with our accountant and auditor to get our financial reports in order. As soon as that is finalised, we will be excited to call for the next Annual General Assembly (GA); the actual GA will be held at least 14 days after our formal notice.
General Assembly is a great opportunity to hear about EuroPython Society&aposs developments and updates in the last year and a new board will also be elected at the end of the GA.
All EPS members are invited to attend the GA and have voting rights. Find out how to sign up to become an EPS member for free here: https://www.europython-society.org/applicationAt the moment, running the annual EuroPython conference is a major task for the EPS. As such, the board members are expected to invest significant time and effort towards overseeing the smooth execution of the conference, ranging from venue selection, contract negotiations, and budgeting, to volunteer management. Every board member has the duty to support one or more EuroPython teams to facilitate decision-making and knowledge transfer.
In addition, the Society prioritises building a close relationship with local communities. Board members should not only be passionate about the Python community but have a high-level vision and plan for how the EPS could best serve the community.
How can you become an EPS 2024 board member?Any EPS member can nominate themselves for the EPS 2024 board. Nominations will be published prior to the GA.
Though the formal deadline for self-nomination is at the GA, it is recommended that you send in yours as early as possible (yes, now is a good time!) to board@europython.eu.We look forward to your email :)
📝 Feedback & NumbersThanks to everyone who filled in the feedback form! In total, 157 attendees gave their feedback, which represents around 13% of the onsite attendees and around 11% of total attendees. One caveat when reading the results below: it’s difficult to say whether this sample was representative of all attendees as we didn’t collect demographic data.
Satisfaction with the conferenceOn average, attendees let us know that they were very satisfied with the conference, with a mean overall satisfaction rating of 4.3. Moreover, attendees were satisfied with most specific aspects of the conference, including the venue (mean = 4.6), food (mean = 4.0), and the social event (mean = 4.0). Prague was a particularly popular choice of location, getting a mean rating of 4.7.
We also had a look to see which of these aspects were most strongly related to overall satisfaction with the conference. Using a Spearman correlation, we found that satisfaction with the food (rs = 0.20) and the social event (rs = 0.17) had the highest relationship with overall satisfaction with the conference. However, any fellow stats nerds reading this might have noticed that these are not particularly strong relationships, likely meaning that other factors we didn’t explicitly measure are driving how much people liked the conference.
If you’re interested in seeing more of the results we got from the feedback form, we published a blog post where we deep dive into everything we found in much more detail. And we promise there will be lots of pretty graphs!
https://blog.europython.eu/europython-2024-post-conference-feedback/
🦒 Speaker’s Mentorship ProgrammeIt was another successful year for our Speaker Mentorship Programme! Here are some key highlights from this year:
- Each mentee had the opportunity to receive personalized feedback, support, and guidance on their talk or proposal from an experienced mentor. We successfully supported 29 mentees, most from underrepresented communities, by pairing them with 29 seasoned mentors!
- Six mentees were given the opportunity to attend a public speaking workshop to further enhance their skills.
- On June 3rd, we held a fantastic first-time speakers&apos workshop where attendees engaged with experienced speakers, receiving valuable advice and feedback for their presentations.
Last but not least, a huge THANK YOU to all our mentors who volunteered their time to guide mentees in submitting their proposals and delivering their talks
🐍 PyLadies dayEuroPython this year had an entire day dedicated to PyLadies events. We started with Moderni Soberana giving a workshop on how to establish boundaries and stop abusive behaviour in society. This was followed up by the PyLadies lunch, sponsored by Kraken Technologies, that had 120 allies joining us for a truly empowerment session.
PyThe afternoon had a #IAmRemarkable workshop hosted by Lola Onipko! We also had a Meet & Greet session where beginners and experienced PyLadies shared knowledge and insights of the tech industry.
Picture by Deborah Foroni (PyLadies SP)💬 Python Organisers DiscussionWe had +35 community members joining us to discuss how the EuroPython Society can better support Python Communities.
✍️ Community write-upsIt warms our hearts to see posts from the community about their experience and stories this year! Here are some of them, please feel free to share yours by tagging us on socials @europython or mailing us at news@europython.eu
Anwesha Das about EuroPython 2024:
A conference that believes community matters, human values and feelings matter, and not afraid to walk the talk. And how the conference stood up to my expectations in every bit.Keep reading here: https://anweshadas.in/looking-back-to-euro-python-2024/
Grete Tungla, PyCon Estonia’s Head Organiser shares her insights from EuroPython 2024: https://www.linkedin.com/pulse/europython-2024-insights-from-pycon-estonias-head-organiser-
Jakub Cervinka shares how he was to participate in the Operations team organising EuroPython 2024: https://www.linkedin.com/pulse/thank-you-europython-2024-jakub-červinka-eusme
❤️ Thank you Volunteers & SponsorsYear after year EuroPython shines because of the hard work of our amazing team of volunteers!
But beyond the logistics and the schedules, it&aposs your smiles, your enthusiasm, and your genuine willingness to go the extra mile that truly made EuroPython 2024 truly special. Your efforts have not only fostered a sense of belonging among first time attendees but also exemplified the power of community and collaboration that lies at the heart of this conference. (And if you check out our blog post about the post-conference feedback, you&aposll see that community was the thing people reported liking most about EuroPython this year!)
Once again, thank you for being the backbone of EuroPython, for your dedication, and for showing the world yet again why people who come for the Python language end up staying for the amazing community :)
We built a page on our website to thank everyone for their effort on making EuroPython 2024 what it was! Check it out: https://ep2024.europython.eu/thank-you
And a special thank you to all of the Sponsors for all of their support!
Yay sponsors!Special thanks to StickerApp for the awesome stickers, Evolabel for shipping, Pretalx for the partnership and Kraken Technologies for the PyLadies lunch!
🎥 Conference Photos & VideosThe official conference photos are up on Flickr! Do not forget to tag us when you share your favourite clicks on your socials 😉.
https://www.flickr.com/photos/europython/albums/
While our team edits the conference videos, we&aposve put together a EuroPython 2024 livestream playlist with all the daily links. We hope this helps you easily find and enjoy the talks you want to catch up on Youtube.
We also have a sweet video featuring the amazing humans of EuroPython sharing why they volunteer!
🤝 Code of ConductCode of Conduct Transparency Report is now published on our website: https://www.europython-society.org/europython-2024-code-of-conduct-transparency-report/
🐍 Note from The PSFThe Python Software Foundation is proud to support EuroPython Prague 2024 with a grant in support of our mission to promote, protect, and advance the Python programming language and to support and facilitate the growth of a diverse and international community of Python programmers. We send congratulations and thanks to the organizers for their work to create a wonderful experience for the Python community!
The PSF is the non-profit charitable organization behind the Python language. We empower the Python community in a variety of ways including paying developers to work directly on CPython, PyPI, and security, hosting projects like PyLadies and Pallets, organizing PyCon US, and awarding community grants like this one. We welcome you to be a part of the PSF by signing up for PSF membership or supporting our mission and initiatives with a one-time, monthly, or annual donation. If your company uses Python and wants to support our community, you can find more information and submit a sponsor application on our website. We’re happy to answer any questions at sponsors@python.org.🗓️ Upcoming Events in the Python CommunityEuroPython might over but fret not there are a bunch of more PyCons happening
- PyCon Estonia https://pycon.ee/ 🇪🇪
- PyCon PT https://2024.pycon.pt/ 🇵🇹
- PyCon ES https://2024.es.pycon.org/ 🇪🇸
- PyCon SE https://www.pycon.se/ 🇸🇪
- PyCon IE https://python.ie/pycon-2024 🇮🇪
- PyLadiesCon https://conference.pyladies.com/ 🌎
- Swiss Python Summit https://www.python-summit.ch/ 🇨🇭
- EuroScipy: https://euroscipy.org/2024/ 🇪🇺
- PyCon FR: https://www.pycon.fr/2024/ 🇫🇷
- PyCon PL: https://pl.pycon.org/2024/ 🇵🇱
- PyCon NL: https://nl.pycon.org/ 🇳🇱
Enjoy a 30% discount for PyCon Estonia 2024 on Late Snake tickets with the code "EPSXPYCONEST24" over here: https://gateme.com/event/98762/
PyJok.es$ pip install pyjokes
$ pyjoke
Hardware: The part of a computer that you can kick.
Matthew Garrett: What the fuck is an SBAT and why does everyone suddenly care
Long version: When UEFI Secure Boot was specified, everyone involved was, well, a touch naive. The basic security model of Secure Boot is that all the code that ends up running in a kernel-level privileged environment should be validated before execution - the firmware verifies the bootloader, the bootloader verifies the kernel, the kernel verifies any additional runtime loaded kernel code, and now we have a trusted environment to impose any other security policy we want. Obviously people might screw up, but the spec included a way to revoke any signed components that turned out not to be trustworthy: simply add the hash of the untrustworthy code to a variable, and then refuse to load anything with that hash even if it's signed with a trusted key.
Unfortunately, as it turns out, scale. Every Linux distribution that works in the Secure Boot ecosystem generates their own bootloader binaries, and each of them has a different hash. If there's a vulnerability identified in the source code for said bootloader, there's a large number of different binaries that need to be revoked. And, well, the storage available to store the variable containing all these hashes is limited. There's simply not enough space to add a new set of hashes every time it turns out that grub (a bootloader initially written for a simpler time when there was no boot security and which has several separate image parsers and also a font parser and look you know where this is going) has another mechanism for a hostile actor to cause it to execute arbitrary code, so another solution was needed.
And that solution is SBAT. The general concept behind SBAT is pretty straightforward. Every important component in the boot chain declares a security generation that's incorporated into the signed binary. When a vulnerability is identified and fixed, that generation is incremented. An update can then be pushed that defines a minimum generation - boot components will look at the next item in the chain, compare its name and generation number to the ones stored in a firmware variable, and decide whether or not to execute it based on that. Instead of having to revoke a large number of individual hashes, it becomes possible to push one update that simply says "Any version of grub with a security generation below this number is considered untrustworthy".
So why is this suddenly relevant? SBAT was developed collaboratively between the Linux community and Microsoft, and Microsoft chose to push a Windows update that told systems not to trust versions of grub with a security generation below a certain level. This was because those versions of grub had genuine security vulnerabilities that would allow an attacker to compromise the Windows secure boot chain, and we've seen real world examples of malware wanting to do that (Black Lotus did so using a vulnerability in the Windows bootloader, but a vulnerability in grub would be just as viable for this). Viewed purely from a security perspective, this was a legitimate thing to want to do.
(An aside: the "Something has gone seriously wrong" message that's associated with people having a bad time as a result of this update? That's a message from shim, not any Microsoft code. Shim pays attention to SBAT updates in order to avoid violating the security assumptions made by other bootloaders on the system, so even though it was Microsoft that pushed the SBAT update, it's the Linux bootloader that refuses to run old versions of grub as a result. This is absolutely working as intended)
The problem we've ended up in is that several Linux distributions had not shipped versions of grub with a newer security generation, and so those versions of grub are assumed to be insecure (it's worth noting that grub is signed by individual distributions, not Microsoft, so there's no externally introduced lag here). Microsoft's stated intention was that Windows Update would only apply the SBAT update to systems that were Windows-only, and any dual-boot setups would instead be left vulnerable to attack until the installed distro updated its grub and shipped an SBAT update itself. Unfortunately, as is now obvious, that didn't work as intended and at least some dual-boot setups applied the update and that distribution's Shim refused to boot that distribution's grub.
What's the summary? Microsoft (understandably) didn't want it to be possible to attack Windows by using a vulnerable version of grub that could be tricked into executing arbitrary code and then introduce a bootkit into the Windows kernel during boot. Microsoft did this by pushing a Windows Update that updated the SBAT variable to indicate that known-vulnerable versions of grub shouldn't be allowed to boot on those systems. The distribution-provided Shim first-stage bootloader read this variable, read the SBAT section from the installed copy of grub, realised these conflicted, and refused to boot grub with the "Something has gone seriously wrong" message. This update was not supposed to apply to dual-boot systems, but did anyway. Basically:
1) Microsoft applied an update to systems where that update shouldn't have been applied
2) Some Linux distros failed to update their grub code and SBAT security generation when exploitable security vulnerabilities were identified in grub
The outcome is that some people can't boot their systems. I think there's plenty of blame here. Microsoft should have done more testing to ensure that dual-boot setups could be identified accurately. But also distributions shipping signed bootloaders should make sure that they're updating those and updating the security generation to match, because otherwise they're shipping a vector that can be used to attack other operating systems and that's kind of a violation of the social contract around all of this.
It's unfortunate that the victims here are largely end users faced with a system that suddenly refuses to boot the OS they want to boot. That should never happen. I don't think asking arbitrary end users whether they want secure boot updates is likely to result in good outcomes, and while I vaguely tend towards UEFI Secure Boot not being something that benefits most end users it's also a thing you really don't want to discover you want after the fact so I have sympathy for it being default on, so I do sympathise with Microsoft's choices here, other than the failed attempt to avoid the update on dual boot systems.
Anyway. I was extremely involved in the implementation of this for Linux back in 2012 and wrote the first prototype of Shim (which is now a massively better bootloader maintained by a wider set of people and that I haven't touched in years), so if you want to blame an individual please do feel free to blame me. This is something that shouldn't have happened, and unless you're either Microsoft or a Linux distribution it's not your fault. I'm sorry.
comments
qtatech.com blog: Automatiser les Déploiements de Sites Drupal avec CI/CD
With the constant evolution of Drupal, particularly with recent versions like Drupal 10 and Drupal 11, automating deployments has become essential to leverage new features and maintain an agile and reliable development cycle. This article will guide you through the coding approaches and techniques to automate the deployment of Drupal sites using CI/CD.
Implementing an Audio Mixer, Part 1
When using Qt Multimedia to play audio files, it’s common to use QMediaPlayer, as it supports a larger variety of formats than QSound and QSoundEffect. Consider a Qt application with several audio sources; for example, different notification sounds that may play simultaneously. We want to avoid cutting notification sounds off when a new one is triggered, and we don’t want to construct a queue for notification sounds, as sounds will play at the incorrect time. We instead want these sounds to overlap and play simultaneously.
Ideally, an application with audio has one output stream to the system mixer. This way in the mixer control, different applications can be set to different volume levels. However, a QMediaPlayer instance can only play one audio source at a time, so each notification would have to construct a new QMediaPlayer. Each player in turn opens its own stream to the system.
The result is a huge number of streams to the system mixer being opened and closed all the time, as well as QMediaPlayers constantly being constructed and destructed.
To resolve this, the application needs a mixer of its own. It will open a single stream to the system and combine all the audio into the one stream.
Before we can implement this, we first need to understand how PCM audio works.
PCMAs defined by Wikipedia:
Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps.
Here you can see how points are sampled in uniform intervals and quantized to the closest number that can be represented.
Description from Wikipedia: Sampling and quantization of a signal (red) for 4-bit LPCM over a time domain at specific frequency.
Think of a PCM stream as a humongous array of bytes. More specifically, it’s an array of samples, which are either integer or float values and a certain number of bytes in size. The samples are these discrete amplitude values from a waveform, organized contiguously. Think of the each element as being a y-value of a point along the wave, with the index representing an offset from x=0 at a uniform time interval.
Here is a graph of discretely sampled points along a sinusoidal waveform similar to the one above:
Image Source: Wikimedia Commons
Description from Wikimedia Commons: Image of a discrete time sinusoid
Let’s say we have an audio waveform that is a simple sine wave, like the above examples. Each point taken at discrete intervals along the curve here is a sample, and together they approximate a continuous waveform. The distance between the samples along the x-axis is a time delta; this is the sample period. The sample rate is the inverse of this, the number of samples that are played in one second. The typical standard sample rate for audio on CDs is 44100 Hz – we can’t really hear that this data is discrete (plus, the resultant sound wave from air movement is in fact a continuous waveform).
We also have to consider the y-axis here, which represents the amplitude of the waveform at each sampled point. In the image above, amplitude A is normalized such that A\in[−1,1]. In digital audio, there are a few different ways to represent amplitude. We can’t represent all real numbers on a computer, so the representation of the range of values varies in precision.
For example, let’s say we have two different representations of the wave above: 8-bit signed integer and 16-bit signed integer. The normalized value 1 from the image above maps to (2^{8}\div{2})−1=127 with 8-bit representation and (2^{16}\div2)−1=32767 with 16-bit. Therefore, with 16-bit representation, we have 128 times as many possible values to represent the same range; it is more precise, but the required size to store each 16-bit sample is double that of 8-bit samples.
We call the chosen representation, and thus the size of each sample, the bitdepth. Some common bitdepths are 16-bit int, 24-bit int, and 32-bit float, but there are many others in existence.
Let’s consider a huge stream of 16-bit samples and a sample rate of 44100 Hz. We write samples to the audio device periodically with a fixed-size buffer; let’s say it is 4096 bytes. The device will play each sample in the buffer at the aforementioned rate. Since each sample is a contiguous 2-byte short, we can fit 2048 samples into the buffer at once. We need to write 44100 samples in one second, so the whole buffer will be written around 21.5 times per second.
What if we have two different waveforms though, and what if one starts halfway through the other one? How do we mix them so that this buffer contains the data from both sources?
Waveform SuperimpositionIn the study of waves, you can superimpose two waves by adding them together. Let’s say we have two different discrete wave approximations, each represented by 20 signed 8-bit integer values. To superimpose them, for each index, add the values at that index. Some of these sums will exceed the limits of 8-bit representation, so we clamp them at the end to avoid signed integer overflow. This is known as hard clipping and is the phenomenon responsible for digital overdrive distortion.
x Wave 1 (y_1) Wave 2 (y_2) Sum (y_1+y_2) Clamped Sum 0 +60 −100 −40 −40 1 −120 +80 −40 −40 2 +40 +70 +110 +110 3 −110 −100 −210 −128 4 +50 −110 −60 −60 5 −100 +60 −40 −40 6 +70 +50 +120 +120 7 −120 −120 −240 −128 8 +80 −100 −20 −20 9 −80 +40 −40 −40 10 +90 +80 +170 +127 11 −100 −90 −190 −128 12 +60 −120 −60 −60 13 −120 +70 −50 −50 14 +80 −120 −40 −40 15 −110 +80 −30 −30 16 +90 −100 −10 −10 17 −110 +90 −20 −20 18 +100 −110 −10 −10 19 −120 −120 −240 −128Now let’s implement this in C++. We’ll start small, and just combine two samples.
Note: we will use qint types here, but qint16 will be the same as int16_t and short on most systems, and similarly qint32 will correspond to int32_t and int.
qint16 combineSamples(qint32 samp1, qint32 samp2) { const auto sum = samp1 + samp2; if (std::numeric_limits<qint16>::max() < sum) return std::numeric_limits<qint16>::max(); if (std::numeric_limits<qint16>::min() > sum) return std::numeric_limits<qint16>::min(); return sum; }This is quite a simple implementation. We use a function combineSamples and pass in two 16-bit values, but they will be converted to 32-bit as arguments and summed. This sum is clamped to the limits of 16-bit integer representation using std::numeric_limits in the <limits> header of the standard library. We then return the sum, at which point it is re-converted to a 16-bit value.
Combining Samples for an Arbitrary Number of Audio StreamsNow consider an arbitrary number of audio streams n. For each sample position, we must sum the samples of all n streams.
Let’s assume we have some sort of audio stream type (we’ll implement it later), and a list called mStreams containing pointers to instances of this stream type. We need to implement a function that loops through mStreams and makes calls to our combineSamples function, accumulating a sum into a new buffer.
Assume each stream in mStreams has a member function read(char *, qint64). We can copy one sample into a char * by passing it to read, along with a qint64 representing the size of a sample (bitdepth). Remember that our bitdepth is 16-bit integer, so this size is just sizeof(qint16).
Using read on all the streams in mStreams and calling combineSamples to accumulate a sum might look something like this:
qint16 accumulatedSum = 0; for (auto *stream : mStreams) { // call stream->read(char *, qint64) // to read a sample from the stream into streamSample qint16 streamSample; stream->read(reinterpret_cast<char *>(&streamSample), sizeof(qint16))); // accumulate accumulatedSum = combineSamples(sample, accumulatedSum); }The first pass will add samples from the first stream to zero, effectively copying it to accumulatedSum. When we move to another stream, the samples from the second stream will be added to those copied values from the first stream. This continues, so the call to combineSamples for a third stream would combine the third stream’s sample with the sum of the first two. We continue to add directly into the buffer until we have combined all the streams.
Combining All Samples for a BufferNow let’s use this concept to add all the samples for a buffer. We’ll make a function that takes a buffer char *data and its size qint64 maxSize. We’ll write our accumulated samples into this buffer, reading all samples from the streams and adding them using the method above.
The function signature looks like this:
void readData(char *data, qint64 maxSize);Let’s achieve more efficiency by using a constexpr variable for the bitdepth:
constexpr qint16 bitDepth = sizeof(qint16);There’s no reason to call sizeof multiple times, especially considering sizeof(qint16) can be evaluated as a literal at compile-time.
With the size of each sample and the size of the buffer, we can get the total number of samples to write:
const qint16 numSamples = maxSize / bitDepth;For each stream in mStreams we need to read each sample up to numSamples. As the sample index increments, a pointer to the buffer data needs to also be incremented, so we can write our results at the correct location in the buffer.
That looks like this:
void readData(char *data, qint64 maxSize) { // start with 0 in the buffer memset(data, 0, maxSize); constexpr qint16 bitDepth = sizeof(qint16); const qint16 numSamples = maxSize / bitDepth; for (auto *stream : mStreams) { // this pointer will be incremented across the buffer auto *cursor = reinterpret_cast<qint16 *>(data); qint16 sample; for (int i = 0; i < numSamples; ++i, ++cursor) if (stream->read(reinterpret_cast<char *>(&sample), bitDepth)) *cursor = combineSamples(sample, *cursor); } }The idea here is that we can start playing new audio sources by adding new streams to mStreams. If we add a second stream halfway through a first stream playing, the next buffer for the first stream will be combined with the first buffer of this new stream. When we’re done playing a stream, we just drop it from the list.
Next Steps
In Part 2, we’ll use Qt Multimedia to fully implement our mixer, connect to our audio device, and test it on some audio files.
About KDAB
If you like this article and want to read similar material, consider subscribing via our RSS feed.
Subscribe to KDAB TV for similar informative short video content.
KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.
The post Implementing an Audio Mixer, Part 1 appeared first on KDAB.
Promet Source: Drupal vs SharePoint for State and Local Government
KDE ⚙️ Gear 24.08
Many of the new features in Dolphin are designed to make it easier to access and manage files and folders that require administrative privileges. Visual cues, wizards to help install needed software, and menu options to elevate your privileges make it easier than ever to use Dolphin as a superuser.
New usability features include:
- A new "Move to New Folder…" option that pops up when right-clicking a file, allowing you to create a folder and copy the file into it all in one go.
- Double-clicking the view background that triggers the "Select All" action by default.
Filelight is a complementary application to Dolphin, and can be installed directly from Dolphin by clicking the down arrow in the lower right corner of the main window.
Filelight helps visualize how much space your files and folders are taking up. Version 24.08 comes with a friendlier home page, and the Windows version (available from the Microsoft Store) has been redesigned to improve its overall appearance.
Konsole 24.08 also comes with a brand new usability enhancement: if you need to bookmark something important in a long output, double-click the scroll bar to set a position marker. You can then quickly scroll back and locate it later.
CreateKdenlive is KDE's professional video editor, and this new release is all about the curves.
You can now use the brand new keyframe curve editor to customize effects, and easing methods (Cubic in/out and Exponential in/out) for fades.
To make things easier, we've redesigned the effects stack widget and improved the Transform effect, which now lets you select clips directly from the monitor. It also comes with a new grid, and improved design and behavior for the handle.
TravelComing to Akademy 2024? Don't forget to install the updates for Itinerary and Kongress and make your journey easy.
Itinerary is KDE's travel assistant. It lets you plan and manage your trips, providing an overview of where you need to be and when. It keeps boarding passes, tickets, and health certificates all in one place, and the latest version adds more details, including seat information displayed directly in the timeline.
Once you arrive at your destination, it's time to fire up Kongress so you don't miss any of the sessions or activities. Kongress now makes things easier by providing indoor maps of the venue, so you not only know when and what's going on, but also where.
Both Itinerary and Kongress work on desktop and laptop computers and most mobile devices.
CommunicateNeoChat is KDE's client for the Matrix chat system — and KDE's official way of chatting. Version 24.08 increases your safety by allowing you to preemptively block invites from unknown users not in any rooms with you.
Tokodon not only helps you read and post on Mastodon, but also manage your own server. Speaking of which, the version being released today can notify you of sign-ups on your server for better user management.
For posting, you can easily attach images from the internet, quote other posts and pop out the text editor to comfortably compose your toot.
When reading, Tokodon 24.08 supports scrolling up whole screenfuls of posts using the PageUp and PageDown keys.
DevelopWhether you want to help KDE implement features and fix bugs, or develop the next killer app, KDE's advanced text editor Kate has you covered.
Kate 24.08 improves its document formatting plugin with better support for bash, d, fish, Nix config, opsi-script, QML and YAML files. In related news, the Language Server Protocol (LSP) feature adds support for the Gleam, PureScript, and Typst languages.
If you're working on a CMake-based project, the Project and Build plugin now allows you to open the build directory and get both files and targets.
SurfFalkon is KDE's full-featured web browser. The new release implements many bug fixes and optimizations that make surfing the web with Falkon smoother, easier and safer.
A new feature in 24.08 allows you to customize things that affect privacy and functionality on a site-by-site basis. Say you don't mind JavaScript on one site because the authors are trustworthy and it actually provides useful functionality, but you want to block it elsewhere for security reasons. You can now configure this in Falkon's Settings.
And all this too…- Okular — KDE's eco-certified document reader — improves compatibility for fillable forms in PDF documents, gets a makeover for Windows, and adds a more usable zoom feature.
- PlasmaTube — a player for watching online videos from popular sites on your desktop — adds an option to block sponsored sections in videos.
- Elisa — our elegant music player — gets a "Play this next" feature, and allows resizing of the sidebar and playlist panes.
Although we fully support distributions that ship our software, KDE Gear 24.08 apps will also be available on these Linux app stores shortly:
Flathub SnapcraftIf you'd like to help us get more KDE applications into the app stores, support more app stores and get the apps better integrated into our development process, come say hi in our All About the Apps chat room.
Kanopi Studios: All About Drupal 11
The next major Drupal version was just released — laying the foundation for its future. Drupal 11 was recently released on Drupal’s timeline. Unlike previous major versions, where releases needed to accommodate underlying technologies’ end of life like Symfony, Drupal 11 was released because it was the right time to solidify new features and free us […]
The post All About Drupal 11 appeared first on Kanopi Studios.
Darren Oh: How Drupal Forge began
parallel @ Savannah: GNU Parallel 20240822 ('Southport') released
GNU Parallel 20240822 ('Southport') has been released. It is available for download at: lbry://@GnuParallel:4
Quote of the month:
honestly the coolest software i've ever seen gotta be gnu parallel or
ffmpeg, nothing like them
-- @scootykins scoot
New in this release:
- --match Match input source with regexp to set replacement fields.
- {:%fmt} Use printf formatting of replacement strings.
- Bug fixes and man page updates.
News about GNU Parallel:
- Powerful GNU parallel, more than a loop https://www.linkedin.com/pulse/powerful-gnu-parallel-more-than-loop-zhenguo-zhang-18dxc
- How To Increase File Transfer Speed Using Parallel Rsync? https://contentbase.com/blog/increase-file-transfer-speed-parallel-rsync/
- Converting WebP Images to PNG Using parallel and dwebp https://bytefreaks.net/2024/07/27
- Turbocharging the Box CLI with GNU Parallel https://medium.com/box-developer-blog/turbocharging-the-box-cli-with-gnu-parallel-ee44c48811c0
GNU Parallel - For people who live life in the parallel lane.
If you like GNU Parallel record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.
GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.
If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.
GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.
For example you can run this to convert all jpeg files into png and gif files and have a progress bar:
parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif
Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:
find . -name '*.jpg' |
parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200
You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/
You can install GNU Parallel in just 10 seconds with:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.
When using programs that use GNU Parallel to process data for publication please cite:
O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.
If you like GNU Parallel:
- Give a demo at your local user group/team/colleagues
- Post the intro videos on Reddit/Diaspora*/forums/blogs/ Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
- Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
- Request or write a review for your favourite blog or magazine
- Request or build a package for your favourite distribution (if it is not already there)
- Invite me for your next conference
If you use programs that use GNU Parallel for research:
- Please cite GNU Parallel in you publications (use --citation)
If GNU Parallel saves you money:
- (Have your company) donate to FSF https://my.fsf.org/donate/
GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.
The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.
When using GNU SQL for a publication please cite:
O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.
GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.
DrupalEasy: DrupalEasy Podcast S17E2 - Janez Urevc - Gander
We talk with Janez Urevc from Tag1 Consulting about Gander, an open-source automated performance testing framework.
URLs mentioned- Gander (includes many helpful links!)
- 40-minute video presentation introducing Gander (including a demo)
- Drupal.org docs page for performance testing with Gander
- https://gander.tag1.io - dashboard that Tag1 hosts for the Drupal community
- Two-hour Gander workshop at DrupalCon Barcelona on September 24, 2024
Professional module development - 15 weeks, 90 hours, live, online course.
Drupal Career Online - 12 weeks, 77 hours, live online, beginner-focused course.
We're using the machine-driven Amazon Transcribe service to provide an audio transcript of this episode.
SubscribeSubscribe to our podcast on iTunes, iHeart, Amazon, YouTube, or Spotify.
If you'd like to leave us a voicemail, call 321-396-2340. Please keep in mind that we might play your voicemail during one of our future podcasts. Feel free to call in with suggestions, rants, questions, or corrections. If you'd rather just send us an email, please use our contact page.
CreditsPodcast edited by Amelia Anello.
Jonathan Dowland: Fediverse and feeds
It's clear that Twitter has been circling the drain for years, but things have been especially bad in recent times. I haven't quit (I have some sympathy with the viewpoint don't cede territory to fascists) but I try to read it much less, and I certainly post much less.
Especially at the moment, I really appreciate distractions.
Last time I wrote about Mastodon (by which I meant the Fediverse1), I was looking for a new instance to try. I settled on Debian's social instance2. I'm now trying to put any energy I might spend engaging on Twitter, into engaging in the Fediverse instead. (You can follow me via the handle @jon@dow.land, I think, which should repoint to my actual handle, @jmtd@pleroma.debian.social.)
There are other potential successors to Twitter: two big ones are Bluesky and Facebook-owned Threads. They are effectively cookie-cutter copies of the Twitter model, and so, we will repeat the same mistakes there. Sadly I see the majority of communities and sub-cultures I follow are migrating to one or the other of these.
The Fediverse (or the Mastodon-ish bits of it) should avoid the fate of Twitter. JWZ puts it better and more succinctly than I can.
The Fedi experience is, sadly, pretty clunky. So I want to try and write a bit from time to time with tips and tricks that might improve people's experiences.
First up, something I discovered only today about Mastodon instances. As JWZ noted, If you are worried about picking the "right" Mastodon instance, don't. Just spin the wheel.. You can spend too much time trying to guess a good answer to this. Better to just get started.
At the same time, individual instances are supposed to cater to specific niches. So it could be useful to sample the public posts from an entire instance. For example, to find people to follow, or decide to hop over to that instance yourself. You can't (I think) follow an entire instance from within yours, but, they usually have a public page which shows you the latest traffic.
For example, the infosec-themed instance infosec.exchange has one here: https://infosec.exchange/public/local
These pages don't provide RSS or Atom feeds3, sadly. I hope that's on the software's roadmap, and hasn't been spurned for ideological reasons. For now at least, OpenRSS provide RSS/Atom feeds for many Mastodon instances. For example, an RSS/Atom feed of the above: https://openrss.org/infosec.exchange/public/local
One can add these feeds to your Feed reader and over time get a flavour for the kind of discourse that takes place on given instances.
I think the OpenRSS have to manually add Mastodon instances to their service. I tried three instances and only one (infosec.exchange) worked. I'm not sure but I think trying an instance that doesn't work automatically puts it on OpenRSS's backlog.
- the Fediverse-versus-Mastodon nomenclature problem is just the the tip of the iceberg, in terms of adoption problems. Mastodon provides a twitter-like service that participates in the Fediverse. But it isn't correct to call the twitter-like service "Mastodon" because other softwares also participate in/provide that service. And it's not correct to call it "Fediverse" because that describes a bigger thing, with e.g. youtube clones also taking part. I'm not sure what the right term should be for "the twitter-like thing". Also, everything I wrote here is probably subtly wrong.↩
- Debian's instance actually runs Pleroma, an alternative to Mastodon. Why should it matter? I think it's healthy for there to be more than one implementation in an open ecosystem. However the experience can be janky, as the features don't perfectly align, some Mastodon features/APIs are not documented/standardised/etc.↩
- I have to remind myself that the concept of RSS/Atom feeds and Feedreaders might need explaining to a modern audience too. Perhaps in another blog post.↩
Twin Cities Drupal Camp: Introducing Lightning Talks
We've added a fun new event to our conference this year – Lightning Talks!
Some of you may be familiar with the concept, but we'll explain it here, as well as how we're planning to do it. (Note that this an in-person event for registered attendees only.)
What are Lightning TalksA lightning talk is a very short presentation lasting only a few minutes. Each speaker gets a maximum of 5 minutes to present on a topic of your choice. You will have access to the projector for slides.
Because these are very short and fast presentations (thus the "lightning" part), it's meant to be brief, snappy, and fun. We'll rotate quickly between speakers to keep things moving and entertaining.
These are not the same as the 45 minute sessions held during the rest of Camp. Speakers will need to focus on a single message or just a few quick key points.
Speakers can talk about serious and technical topics, or they can do something lighthearted and silly too. This is always a fun event!
When Will They Be Held?We're going to hold the lightning talks between 3-4pm on the first day of Camp (Thu, Sep 12), in the West Wing (the big room). Right before happy hour!
How Can I Participate?We've made a form to gather signups ahead of time. We'll only have time for 10-12 presenters, so we may not be able to include every submission. Sign up today!
Do I Have to Participate?Talk of group presentations may be triggering for some people, but don't worry! No one is required to participate. Having you in the audience is all we ask.
Additional Resources- What are Lightning Talks by the University of North Carolina
Python Anywhere: Belated announcement of latest updates
Here is a slightly delayed (and short) run-down of the new stuff that we deployed recently.
The main change for this update is that we have updated the underlying OS running PythonAnywhere to Ubuntu 22.04. This is an LTS release so it will be supported for some time to come. This will not affect user environments, but it is setting us up for a new user environment that should be coming soon.
We have also:
- Started the process of updating our file servers to be more robust
- Improved our alerting so that we are alerted to many new forms of failure on PythonAnywhere
- Made some improvements to the ASGI beta systems and their documentation
- Fixed a number of security issues
- Fixed various bugs
Real Python: Primer on Jinja Templating
Templates are an essential ingredient in full-stack web development. With Jinja, you can build rich templates that power the front end of your Python web applications.
But you don’t need to use a web framework to experience the capabilities of Jinja. When you want to create text files with programmatic content, Jinja can help you out.
In this tutorial, you’ll learn how to:
- Install the Jinja template engine
- Create your first Jinja template
- Render a Jinja template in Flask
- Use for loops and conditional statements with Jinja
- Nest Jinja templates
- Modify variables in Jinja with filters
- Use macros to add functionality to your front end
You’ll start by using Jinja on its own to cover the basics of Jinja templating. Later you’ll build a basic Flask web project with two pages and a navigation bar to leverage the full potential of Jinja.
Throughout the tutorial, you’ll build an example app that showcases some of Jinja’s wide range of features. To see what it’ll do, skip ahead to the final section.
You can also find the full source code of the web project by clicking on the link below:
Source Code: Click here to download the source code that you’ll use to explore Jinja’s capabilities.
Take the Quiz: Test your knowledge with our interactive “Primer on Jinja Templating” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz
Primer on Jinja TemplatingIn this quiz, you'll test your understanding of Jinja templating. Jinja is a powerful tool for building rich templates in Python web applications, and it can also be used to create text files with programmatic content.
This tutorial is for you if you want to learn more about the Jinja template language or if you’re getting started with Flask.
Get Started With JinjaJinja is not only a city in the Eastern Region of Uganda and a Japanese temple, but also a template engine. You commonly use template engines for web templates that receive dynamic content from the back end and render it as a static page in the front end.
But you can use Jinja without a web framework running in the background. That’s exactly what you’ll do in this section. Specifically, you’ll install Jinja and build your first templates.
Install JinjaBefore exploring any new package, it’s a good idea to create and activate a virtual environment. That way, you’re installing any project dependencies in your project’s virtual environment instead of system-wide.
Select your operating system below and use your platform-specific command to set up a virtual environment:
Windows PowerShell PS> python -m venv venv PS> .\venv\Scripts\activate (venv) PS> Copied! Shell $ python -m venv venv $ source venv/bin/activate (venv) $ Copied!With the above commands, you create and activate a virtual environment named venv by using Python’s built-in venv module. The parentheses (()) surrounding venv in front of the prompt indicate that you’ve successfully activated the virtual environment.
After you’ve created and activated your virtual environment, it’s time to install Jinja with pip:
Shell (venv) $ python -m pip install Jinja2 Copied!Don’t forget the 2 at the end of the package name. Otherwise, you’ll install an old version that isn’t compatible with Python 3.
It’s worth noting that although the current major version is actually greater than 2, the package that you’ll install is nevertheless called Jinja2. You can verify that you’ve installed a modern version of Jinja by running pip list:
Shell (venv) $ python -m pip list Package Version ---------- ------- Jinja2 3.x ... Copied!To make things even more confusing, after installing Jinja with an uppercase J, you have to import it with a lowercase j in Python. Try it out by opening the interactive Python interpreter and running the following commands:
Read the full article at https://realpython.com/primer-on-jinja-templating/ »[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Drupal Association blog: Why I'm a Ripplemaker.. by Nikki Flores
I first learned Drupal in 2008 and it was the backbone of one of my consulting service's very first sites. We used the Quiz module and built online certifications based on scoring. This feature was a huge hit for that client and carried them forward, such that they were able to get acquired and eventually retire.
Implementing sites on Drupal carried my partner and I through a decade of consulting as we built out websites for national and international public agencies, nonprofits, membership organizations, and e-commerce. Drupal fed my family through a variety of projects, paid for my healthcare, tuition, and our housing.
In my current role at Lullabot as an employee-owner, Drupal has continued to evolve. We use Drupal for our enterprise projects, for state and local governments as well as education and publishing clients: my current client, a state government, is converting their agencies into a unified platform.
From the community side, I organized the first DrupalCamp in Hawaii and am nearing the end of my elected term as a community board member for the Drupal Association, and continue to speak on panels, coordinate, organize, and present at DrupalCon and other events.
There is nothing that I do that is special: anyone else in the Drupal community can, and is welcome, to contribute, connect, and engage to the level that they have energy to do so, I recognize how fortunate I am to work at a company that invests in Drupal and also supports "internal time" so we continue to learn, grow, and develop our skills.
Because I have gained so much from Drupal, I'm very happy to encourage you to join me as a Ripple Maker—making a monthly contribution, and in my case, making an allocation to the Drupal Association as a beneficiary from my estate, helps me know that the core values of our community: collaboration, questioning and commenting, making items incrementally better, and continuing to encourage the next generation, will last.
Nikki Flores
Senior Technical Project Manager
Lullabot
Real Python: Quiz: Primer on Jinja Templating
In this quiz, you’ll test your understanding of Jinja templating. Jinja is a powerful tool for building rich templates in Python web applications, and it can also be used to create text files with programmatic content.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Droptica: 10 SEO Features a Modern CMS Should Have. Using Drupal as an Example
In this blog post, I'll introduce ten SEO features that every modern CMS should have and show you how easy it is to implement them in Drupal. So, if you have an existing website, you'll easily see what you're missing. And if you're just planning to build a new one, you'll get a ready-made list of features to copy to your web specifications and requirements. I invite you to read the article or watch an episode of the “Nowoczesny Drupal” series (the video is in Polish).
Drupal.org blog: GitLab CI templates will make Drupal 11 the default version to run
Whenever a new major version of Drupal is released, we update Drupal's GitLab CI testing templates to automatically update the versions being tested. Here's an outline of our plan:
Where we are nowDrupal 11 was released on August 6th. You can learn more about it on the Drupal 11 landing page.
This means that we are in the middle of a transition period where many sites and modules will want to be in Drupal 11, whereas some others might still want to stay in Drupal 10.
From a GitLab CI point of view, testing for both Drupal 10 and 11 simultaneously has been available for months, providing module maintainers with a great tool to test their code before Drupal 11 was launched.
This was available by setting one variable in the .gitlab-ci.yml like this:
variables: OPT_IN_TEST_NEXT_MAJOR: 1Many maintainers have leveraged this already and we can see many modules already claiming full Drupal 11 support within days of the release. To be more specific, as of August 20th, 2870 projects have no compatibility errors anymore, and 1720 have made Drupal 11 compatible releases.
Where we want to beWe are preparing to update the default testing configuration for the GitLab CI templates, but we want to make sure to continue to support maintainers who still need to test against Drupal 10 and 11. We've outlined the changes we'll be making and the timeline below.
As of today:
- Current version (default) is Drupal 10
- Next major version is Drupal 11
- Previous major version is Drupal 9
When we do the shift, this will change to:
- Current version (default) is Drupal 11
- Next major version will be Drupal 12 (when development starts) - see note below.
- Previous major version is Drupal 10
For modules that were testing Drupal 10 and Drupal 11 simultaneously, the change will be as easy as this:
variables: # OPT_IN_TEST_NEXT_MAJOR: 1 OPT_IN_TEST_PREVIOUS_MAJOR: 1Instead of opting in to test the next major, all you need to do is opt into the previous major.
Note: Drupal 12 development branch does not exist yet. Enabling this version might not do anything until this branch is created.
StepsWe are actively working on making the above switch in this issue: Update templates so 11.0 is the default/current branch.
We are going to be taking the following steps in the coming days / weeks.
Step 1: Make all modules start testing against Drupal 11We will set the default value for OPT_IN_TEST_NEXT_MAJOR to 1 temporarily, and release version 1.5.6 of the templates. This will automatically become the default for all Contrib.
Modules that have not yet tested their code against Drupal 11 will now see "Next Major" test jobs in their pipelines, in addition to the "current" Drupal 10 variant. These new jobs have allow_failure: true, so the overall result of the pipelines should not change. This should show a good sense of where the module is at in relation to Drupal 11. Maintainers can still override the variable to be 0 if they don't want this behavior.
The expected date for this change is: August 26th, 2024 (next Monday)
Step 2: Roll out the shift and make it available for ContribWhen the issue Update templates so 11.0 is the default/current branch and all its dependencies all sorted, we will deploy the changes and create a new release 1.6.0. This will be available to Contrib projects using "gitlab ref" main or 1.x-latest
The expected date for this change is: September 5th, 2024 (2 weeks from now)
Step 3: Make the shift default for all ContribThen we will make this new release be the default for all contrib projects automatically.
However, we have provided several alternatives for modules that don't want to do the shift at this point. Any of the following can be used:
- You can pin the version of the templates for your module to 1.5.6. This is the latest version released before the switch. Learn more about pinning the templates version in this page. Note that this means you will not get any updates to the templates for new features or bug fixes, until you un-pin the release.
- You can set OPT_IN_TEST_PREVIOUS_MAJOR to 1 and OPT_IN_TEST_CURRENT to 0 to continue testing Drupal 10 and not Drupal 11.
- You can configure your own variants as described on this page.
- You can tweak the key variables used when creating variants so they have the versions that you desire. Check the above link for that information.
For those wanting to do the shift, you will not need to do anything at all.
The expected date for this change is: September 12th, 2024 (3 weeks from now)
After the shift is madeOnwards and upwards, that means that Drupal 11 is the default version to be tested for all new issues, merge requests, and pipelines for all contrib projects, allowing us to keep the Drupal ecosystem up to date and relevant.
There are some issues that are not blockers for this change, but are related, so we encourage you to see the issue list before reporting anything new, but otherwise create a new issue if you discover a problem and don't find it in the queue.
PyCharm: How to Build Chatbots With LangChain
This is a guest post from Dido Grigorov, a deep learning engineer and Python programmer with 17 years of experience in the field.
Chatbots have evolved far beyond simple question-and-answer tools. With the power of large language models (LLMs), they can understand the context of conversations and generate human-like responses, making them invaluable for customer support applications and other types of virtual assistance.
LangChain, an open-source framework, streamlines the process of building these conversational chatbots by providing tools for seamless model integration, context management, and prompt engineering.
In this blog post, we’ll explore how LangChain works and how chatbots interact with LLMs. We’ll also guide you step by step through building a context-aware chatbot that delivers accurate, relevant responses using LangChain and GPT-3.
What are the chatbots in the realm of LLMs?Chatbots in the field of LLMs are cutting-edge software that simulate human-like conversations with users through text or voice interfaces. These chatbots exploit the advanced capabilities of LLMs, which are neural networks trained on huge amounts of text data which allows them to produce human-like responses to a wide range of input prompts.
One among all other matters is that LLM-based chatbots can take a conversation’s context into account when generating a response. This means they can keep coherence across several exchanges and can process complex queries to produce outputs that are in line with the users’ intentions. Additionally, these chatbots assess the emotional tone of a user’s input and adjust their responses to match the user’s sentiments.
Chatbots are highly adaptable and personalized. They learn from how users interact with them thus improving on their responses by adjusting them according to individual preferences and needs.
What is LangChain?LangChain is a framework that’s open-source developed for creating apps that use large language models (LLMs). It comes with tools and abstractions to better personalize the information produced from these models while maintaining accuracy and relevance.
One common term you can see when you read about LLMs is “prompt chains”. A prompt chain refers to a sequence of prompts or instructions used in the context of artificial intelligence and machine learning, with the purpose to guide the AI model through a multi-step process to generate more accurate, detailed, or refined outputs. This method can be employed for various tasks, such as writing, problem-solving, or generating code.
Developers can create new prompt chains using LangChain, which is one of the strongest sides of the framework. They can even modify existing prompt templates without needing to train the model again when using new datasets.
How does LangChain work?LangChain is a framework designed to simplify the development of applications that utilize language models. It offers a suite of tools that help developers efficiently build and manage applications that involve natural language processing (NLP) and Large Language Models. By defining the steps needed to achieve the desired outcome (this might be a chatbot, task automation, virtual assistant, customer support, and even more), developers can adapt language models flexibly to specific business contexts using LangChain.
Here’s a high-level overview of how LangChain works.
Model integrationLangChain supports various Language models including those from OpenAI, Hugging Face, Cohere, Anyscale, Azure Models, Databricks, Ollama, Llama, GPT4All, Spacy, Pinecone, AWS Bedrock, MistralAI, among others. Developers can easily switch between different models or use multiple models in one application. They can build custom-developed model integration solutions, which allow developers to take advantage of specific capabilities tailored to their specific applications.
ChainsThe core concept of LangChain is chains, which bring together different AI components for context-aware responses. A chain represents a set of automated actions between a user prompt and the final model output. There are two types of chains provided by LangChain:
- Sequential chains: These chains enable the output of a model or function to be used as an input for another one. This is particularly helpful in making multi-step processes that depend on each other.
- Parallel chains: It allows for simultaneous running of multiple tasks, with their outputs merged at the end. This makes it perfect for doing tasks that can be divided into subtasks that are completely independent.
LangChain facilitates the storage and retrieval of information across various interactions. This is essential where there is need for persistence of context such as with chat-bots or interactive agents. There are also two types of memory provided:
- Short-term memory – Helps keep track of recent sessions.
- Long-term memory – Allows retention of information from previous sessions enhancing system recall capability on past chats and user preferences.
LangChain provides many tools, but the most used ones are Prompt Engineering, Data Loaders and Evaluators. When it comes to Prompt Engineering, LangChain contains utilities to develop good prompts, which are very important in getting the best responses from language models.
If you want to load up files like csv, pdf or other format, Data Loaders are here to help you to load and pre-process different types of data hence making them usable in model interactions.
Evaluation is an essential part of working with machine learning models and large language models. That’s why LangChain provides Evaluators – tools used for testing language models and chains so that generated results meet the required criteria, which might include:
Datasets criteria:
- Manually curated examples: Start with high-quality, diverse inputs.
- Historical logs: Use real user data and feedback.
- Synthetic data: Generate examples based on initial data.
Types of evaluations:
- Human: Manual scoring and feedback.
- Heuristic: Rule-based functions, both reference-free and reference-based.
- LLM-as-judge: LLMs score outputs based on encoded criteria.
- Pairwise: Compare two outputs to pick the better one.
Application evaluations:
- Unit tests: Quick, heuristic-based checks.
- Regression testing: Measure performance changes over time.
- Back-testing: Re-run production data on new versions.
- Online evaluation: Evaluate in real-time, often for guardrails and classifications.
Agents
LangChain agents are essentially autonomous entities that leverage LLMs to interact with users, perform tasks, and make decisions based on natural language inputs.
Action-driven agents use language models to decide on optimal actions for predefined tasks. On the other side interactive agents or interactive applications such as chatbots make use of these agents, which also take into account user input and stored memory when responding to queries.
How do chatbots work with LLMs?LLMs underlying chatbots use Natural Language Understanding (NLU) and Natural Language Generation (NLG), which are made possible through pre-training of models on vast textual data.
Natural Language Understanding (NLU)- Context awareness: LLMs can understand the subtlety and allusions in a conversation, and they can keep track of the conversation from one turn to the next. This makes it possible for the chatbots to generate logical and contextually appropriate responses to the clients.
- Intent recognition: These models should be capable of understanding the user’s intent from their queries, whether the language is very specific or quite general. They can discern what the user wants to achieve and determine the best way to help them reach that goal.
- Sentiment analysis: Chatbots can determine the emotion of the user through the tone of language used and adapt to the user’s emotional state, which increases the engagement of the user.
- Response generation: When LLMs are asked questions, the responses they provide are correct both in terms of grammar and the context. This is because the responses that are produced by these models mimic human communication, due to the training of the models on vast amounts of natural language textual data.
- Creativity and flexibility: Apart from simple answers, LLM-based chatbots can tell a story, create a poem, or provide a detailed description of a specific technical issue and, therefore, can be considered to be very flexible in terms of the provided material.
- Learning from interactions: Chatbots make the interaction personalized because they have the ability to learn from the users’ behavior, as well as from their choices. It can be said that it is constantly learning, thereby making the chatbot more effective and precise in answering questions.
- Adaptation to different domains: The LLMs can be tuned to particular areas or specialties that allow the chatbots to perform as subject matter experts in customer relations, technical support, or the healthcare domain.
LLMs are capable of understanding and generating text in multiple languages, making them suitable for applications in diverse linguistic contexts.
Building your own chatbot with LangChain in five stepsThis project aims to build a chatbot that leverages GPT-3 to search for answers within documents. First, we scrape content from online articles, split them into small chunks, compute their embeddings, and store them in Deep Lake. Then, we use a user query to retrieve the most relevant chunks from Deep Lake, which are incorporated into a prompt for generating the final answer with the LLM.
It’s important to note that using LLMs carries a risk of generating hallucinations or false information. While this may be unacceptable for many customer support scenarios, the chatbot can still be valuable for assisting operators in drafting answers that they can verify before sending to users.
Next, we’ll explore how to manage conversations with GPT-3 and provide examples to demonstrate the effectiveness of this workflow
Step 1: Project creation, prerequisites, and required library installationFirst create your PyCharm project for the chatbot. Open up Pycharm and click on “new project”. Then give a name of your project.
Once ready with the project set up, generate your `OPENAI_API_KEY` on the OpenAI API Platform Website, once you are logged in (or sign up on the OpenAI website for that purpose). To do that go to the “API Keys” section on the left navigation menu and then click on the button “+Create new secret key”. Don’t forget to copy your key.
After that get your `ACTIVELOOP_TOKEN` by signing up on the Activeloop website. Once logged in, just click on the button “Create API Token” and you’ll be navigated to the token creation page. Copy this token as well.
Once you have both the token and the key, open your configuration settings in PyCharm, by clicking on the 3 dots button next to the run and debug buttons, and choose “Edit”. You should see the following window:
Now locate the field “Environment variables” and find the icon on the right side of the field. Then click there – you’ll see the following window:
And now by clicking the + button start adding your environmental variables and be careful with their names. They should be the same as mentioned above: `OPENAI_API_KEY` and `ACTIVELOOP_TOKEN`. When ready just click OK on the first window and then “Apply” and “OK” on the second one.
That’s a very big advantage of PyCharm and I very much love it, because it handles the environment variables for us automatically without the requirement for additional calls to them, allowing us to think more about the creative part of the code.
Note: ActiveLoop is a technology company that focuses on developing data infrastructure and tools for machine learning and artificial intelligence. The company aims to streamline the process of managing, storing, and processing large-scale datasets, particularly for deep learning and other AI applications.
DeepLake is an ActiveLoop’s flagship product. It provides efficient data storage, management, and access capabilities, optimized for large-scale datasets often used in AI.
Install the required librariesWe’ll use the `SeleniumURLLoader` class from LangChain, which relies on the `unstructured` and `selenium` Python libraries. Install these using pip. It is recommended to install the latest version, although the code has been specifically tested with version 0.7.7.
To do that use the following command in your PyCharm terminal:
pip install unstructured seleniumNow we need to install langchain, deeplake and openai. To do that just use this command in your terminal (same window you used for Selenium) and wait a bit until everything is successfully installed:
pip install langchain==0.0.208 deeplake openai==0.27.8 psutil tiktokenTo make sure all libraries are properly installed, just add the following lines needed for our chatbot app and click on the Run button:
from langchain.embeddings.openai import OpenAIEmbeddings from langchain.vectorstores import DeepLake from langchain.text_splitter import CharacterTextSplitter from langchain import OpenAI from langchain.document_loaders import SeleniumURLLoader from langchain import PromptTemplateAnother way to install your libraries is through the settings of PyCharm. Open them and go to the section Project -> Project Interpreter. Then locate the + button, search for your package and hit the button “Install Package”. Once ready, close it, and on the next window click “Apply” and then “OK”.
Step 2: Splitting content into chunks and computing their embeddingsAs previously mentioned, our chatbot will “communicate” with content coming out of online articles, that’s why I picked Digitaltrends.com as my source of data and selected 8 articles to start. All of them are organized into a Python list and assigned to a variable called “articles”.
articles = ['https://www.digitaltrends.com/computing/claude-sonnet-vs-gpt-4o-comparison/', 'https://www.digitaltrends.com/computing/apple-intelligence-proves-that-macbooks-need-something-more/', 'https://www.digitaltrends.com/computing/how-to-use-openai-chatgpt-text-generation-chatbot/', 'https://www.digitaltrends.com/computing/character-ai-how-to-use/', 'https://www.digitaltrends.com/computing/how-to-upload-pdf-to-chatgpt/']We load the documents from the provided URLs and split them into chunks using the `CharacterTextSplitter` with a chunk size of 1000 and no overlap:
# Use the selenium to load the documents loader = SeleniumURLLoader(urls=articles) docs_not_splitted = loader.load() # Split the documents into smaller chunks text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) docs = text_splitter.split_documents(docs_not_splitted)If you run the code till now you should receive the following output, if everything works well:
[Document(page_content="techcrunch\n\ntechcrunch\n\nWe, TechCrunch, are part of the Yahoo family of brandsThe sites and apps that we own and operate, including Yahoo and AOL, and our digital advertising service, Yahoo Advertising.Yahoo family of brands.\n\n When you use our sites and apps, we use \n\nCookiesCookies (including similar technologies such as web storage) allow the operators of websites and apps to store and read information from your device. Learn more in our cookie policy.cookies to:\n\nprovide our sites and apps to you\n\nauthenticate users, apply security measures, and prevent spam and abuse, and\n\nmeasure your use of our sites and apps\n\n If you click '", metadata={'source': ……………]
Next, we generate the embeddings using OpenAIEmbeddings and save them in a DeepLake vector store hosted in the cloud. Ideally, in a production environment, we could upload an entire website or course lesson to a DeepLake dataset, enabling searches across thousands or even millions of documents.
By leveraging a serverless Deep Lake dataset in the cloud, applications from various locations can seamlessly access a centralized dataset without the necessity of setting up a vector store on a dedicated machine.
Why do we need embeddings and documents in chunks?When building chatbots with Langchain, embeddings and chunking documents are essential for several reasons that relate to the efficiency, accuracy, and performance of the chatbot.
Embeddings are vector representations of text (words, sentences, paragraphs, or documents) that capture semantic meaning. They encapsulate the context and meaning of words in a numerical form. This allows the chatbot to understand and generate responses that are contextually appropriate by capturing nuances, synonyms, and relationships between words.
Thanks to the embeddings, the chatbot can also quickly identify and retrieve the most relevant responses or information from a knowledge base, because they allow matching user queries with the most semantically relevant chunks of information, even if the wording differs.
Chunking, on the other side, involves dividing large documents into smaller, manageable pieces or chunks. Smaller chunks are faster to process and analyze compared to large, monolithic documents. This results in quicker response times from the chatbot.
Document chunking helps also with the relevancy of the output, because when a user asks a question, it is often only in a specific part of a document. Chunking allows the system to pinpoint and retrieve just the relevant sections and the chatbot can provide more precise and accurate answers.
Now let’s get back to our application and let’s update the following code by including your Activeloop organization ID. Keep in mind that, by default, your organization ID is the same as your username.
# TODO: use your organization id here. (by default, org id is your username) my_activeloop_org_id = "didogrigorov" my_activeloop_dataset_name = "jetbrains_article_dataset" dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}" db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings) # add documents to our Deep Lake dataset db.add_documents(docs)Another great feature of PyCharm I love is the option TODO notes to be added directly in Python comments. Once you type TODO with capital letters, all notes go to a section of PyCharm where you can see them all:
# TODO: use your organization id here. (by default, org id is your username)You can click on them and PyCharm directly shows you where they are in your code. I find it very convenient for developers and use it all the time:
If you execute the code till now you should see the following output, if everything works normal:
To find the most similar chunks to a given query, we can utilize the similarity_search method provided by the Deep Lake vector store:
# Check the top relevant documents to a specific query query = "how to check disk usage in linux?" docs = db.similarity_search(query) print(docs[0].page_content) Step 3: Let’s build the prompt for GPT-3We will design a prompt template that integrates role-prompting, pertinent Knowledge Base data, and the user’s inquiry. This template establishes the chatbot’s persona as an outstanding customer support agent. It accepts two input variables: chunks_formatted, containing the pre-formatted excerpts from articles, and query, representing the customer’s question. The goal is to produce a precise response solely based on the given chunks, avoiding any fabricated or incorrect information.
Step 4: Building the chatbot functionalityTo generate a response, we begin by retrieving the top-k (e.g., top-3) chunks that are most similar to the user’s query. These chunks are then formatted into a prompt, which is sent to the GPT-3 model with a temperature setting of 0.
# user question query = "How to check disk usage in linux?" # retrieve relevant chunks docs = db.similarity_search(query) retrieved_chunks = [doc.page_content for doc in docs] # format the prompt chunks_formatted = "\n\n".join(retrieved_chunks) prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query) # generate answer llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0) answer = llm(prompt_formatted) print(answer)If everything works fine, your output should be:
To upload a PDF to ChatGPT, first log into the website and click the paperclip icon next to the text input field. Then, select the PDF from your local hard drive, Google Drive, or Microsoft OneDrive. Once attached, type your query or question into the prompt field and click the upload button. Give the system time to analyze the PDF and provide you with a response.
Step 5: Build conversational history # Create conversational memory memory = ConversationBufferMemory(memory_key="chat_history", input_key="input") # Define a prompt template that includes memory template = """You are an exceptional customer support chatbot that gently answers questions. {chat_history} You know the following context information. {chunks_formatted} Answer the following question from a customer. Use only information from the previous context information. Do not invent stuff. Question: {input} Answer:""" prompt = PromptTemplate( input_variables=["chat_history", "chunks_formatted", "input"], template=template, ) # Initialize the OpenAI model llm = OpenAI(openai_api_key="YOUR API KEY", model="gpt-3.5-turbo-instruct", temperature=0) # Create the LLMChain with memory chain = LLMChain( llm=llm, prompt=prompt, memory=memory ) # User query query = "What was the 5th point about on the question how to remove spotify account?" # Retrieve relevant chunks docs = db.similarity_search(query) retrieved_chunks = [doc.page_content for doc in docs] # Format the chunks for the prompt chunks_formatted = "\n\n".join(retrieved_chunks) # Prepare the input for the chain input_data = { "input": query, "chunks_formatted": chunks_formatted, "chat_history": memory.buffer } # Simulate a conversation response = chain.predict(**input_data) print(response)Let’s walk through the code in a more conversational manner.
To start with, we set up a conversational memory using `ConversationBufferMemory`. This allows our chatbot to remember the ongoing chat history, using `input_key=”input”` to manage the incoming user inputs.
Next, we design a prompt template. This template is like a script for the chatbot, including sections for chat history, the chunks of information we’ve gathered, and the current user question (input). This structure helps the chatbot know exactly what context it has and what question it needs to answer.
Then, we move on to initializing our language model chain, or `LLMChain`. Think of this as assembling the components: we take our prompt template, the language model, and the memory we set up earlier, and combine them into a single workflow.
When it’s time to handle a user query, we prepare the input. This involves creating a dictionary that includes the user’s question (`input`) and the relevant information chunks (`chunks_formatted`). This setup ensures that the chatbot has all the details it needs to craft a well-informed response.
Finally, we generate a response. We call the `chain.predict` method, passing in our prepared input data. The method processes this input through the workflow we’ve built, and out comes the chatbot’s answer, which we then display.
This approach allows our chatbot to maintain a smooth, informed conversation, remembering past interactions and providing relevant answers based on the context.
Another favorite trick with PyCharm that helped me a lot to build this functionality was the opportunity to put my cursor over a method, to hit the key “CTRL” and click on it.
In conclusionGPT-3 excels at creating conversational chatbots capable of answering specific questions based on contextual information provided in the prompt. However, ensuring the model generates answers solely based on this context can be challenging, as it often tends to hallucinate (i.e., generate new, potentially false information). The impact of such false information varies depending on the use case.
In summary, we developed a context-aware question-answering system using LangChain, following the provided code and strategies. The process included splitting documents into chunks, computing their embeddings, implementing a retriever to find similar chunks, crafting a prompt for GPT-3, and using the GPT-3 model for text generation. This approach showcases the potential of leveraging GPT-3 to create powerful and contextually accurate chatbots while also emphasizing the importance of being vigilant about the risk of generating false information.
About the author Dido GrigorovDido is a seasoned Deep Learning Engineer and Python programmer with an impressive 17 years of experience in the field. He is currently pursuing advanced studies at the prestigious Stanford University, where he is enrolled in a cutting-edge AI program, led by renowned experts such as Andrew Ng, Christopher Manning, Fei-Fei Li and Chelsea Finn, providing Dido with unparalleled insights and mentorship.
Dido’s passion for Artificial Intelligence is evident in his dedication to both work and experimentation. Over the years, he has developed a deep expertise in designing, implementing, and optimizing machine learning models. His proficiency in Python has enabled him to tackle complex problems and contribute to innovative AI solutions across various domains.