Feeds

Qt Creator 14 released

Planet KDE - Thu, 2024-07-25 07:45

We are happy to announce the release of Qt Creator 14!

Categories: FLOSS Project Planets

Hailey Schoelkopf: Voices of the Open Source AI Definition

Open Source Initiative - Thu, 2024-07-25 07:45

The Open Source Initiative (OSI) is running a blog series to introduce some of the people who have been actively involved in the Open Source AI Definition (OSAID) co-design process. The co-design methodology allows for the integration of diverging perspectives into one just, cohesive and feasible standard. Support and contribution from a significant and broad group of stakeholders is imperative to the Open Source process and is proven to bring diverse issues to light, deliver swift outputs and garner community buy-in.

This series features the voices of the volunteers who have helped shape and are shaping the Definition.

Meet Hailey Schoelkopf

What’s your background related to Open Source and AI?

One of the main reasons I was able to get more deeply involved in AI research was through open research communities such as the BigScience Workshop and EleutherAI, where discussions and collaboration were available to outsiders. These opportunities to share knowledge and learn from others more experienced than me were crucial to learning about the field and growing as a practitioner and researcher.

I co-led the training of the Pythia language models (https://arxiv.org/abs/2304.01373), some of the first fully-documented and reproducible large-scale language models, with as many related artifacts as possible released Open Source. We were happy and lucky to see these models fill a clear need, especially in the research community, where Pythia has since contributed to a large number of studies attempting to build our understanding of LLMs, including interpreting their internals, understanding the process by which these models improve over training, and disentangling some of the effects of the dataset contents on these models’ downstream behavior.

What motivated you to join this co-design process to define Open Source AI?

There has been a significant amount of confusion induced by the fact that not all ‘open-weights’ AI models are released under OSI-compliant licenses, and some impose restrictions on their usage or adaptation, so I was excited that OSI was working on reducing this confusion by producing a clear definition that could be used by the Open Source community. I joined the process more directly by helping discuss how the Open Source AI Definition could be mapped onto the Pythia language models and the accompanying artifacts we released.

Can you describe your experience participating in this process? What did you most enjoy about it and what were some of the challenges you faced?

Deciding what counts as sufficient transparency and modifiability to be Open Source was an interesting problem. Although public model weights are very beneficial to the Open Source community, releasing weights without sufficient detail about the model and its development process, the detail needed to make modifications or to understand the reasons behind its design and resulting characteristics, can hinder understanding and prevent the full benefits of a completely Open Source model from being realized.

Why do you think AI should be Open Source?

There are clear advantages to having models that are Open Source. Access to such fully-documented models can help a much, much broader group of people, trained researchers and many others alike, who can use, study, and examine these models for their own purposes. While not every model should be made Open Source under all conditions, wider scrutiny and study of these models can help increase our understanding of AI systems’ behavior, raise societal preparedness and awareness of AI capabilities, and improve these models’ safety by allowing more people to understand them and explore their flaws.

With the Pythia language models, we’ve seen many researchers explore questions around the safety and biases of these models, including a breadth of questions we’d not have been able to study ourselves, or many that we could not even anticipate. These different perspectives are a crucial component in making AI systems safer and more broadly beneficial.

What do you think is the role of data in Open Source AI?

Data is a crucial component of AI systems. Transparency around (and, potentially, open release of) training datasets can enable a wide range of extended benefits to researchers, practitioners, and society at large. I think that for a model to be truly Open Source, and to derive the greatest benefits from its openness, information on training data must be shared transparently. This information also importantly allows various members of the Open Source community to avoid replicating each other’s work independently. Transparent sharing about motivations and findings with respect to dataset creation choices can improve the community’s collective understanding of system and dataset design for the future and minimize overlapping, wasted effort.

Has your personal definition of Open Source AI changed along the way? What new perspectives or ideas did you encounter while participating in the co-design process?

An interesting perspective that I’ve grown to appreciate is that the Open Source AI Definition includes public, Open Source-licensed training and inference code. Actually making one’s Open Source AI model effectively usable by the community and practitioners is a crucial step in promoting transparency, though one not discussed often enough.

What do you think the primary benefit will be once there is a clear definition of Open Source AI?

Having a clear definition of Open Source AI can make it clearer where existing, currently “open” systems fall, and potentially encourage future open-weights models to be released with more transparency. Many current open-weights models are shared under bespoke licenses with terms not compliant with Open Source principles. This creates legal uncertainty and also makes it less likely that a new open-weights model release will benefit practitioners at large or contribute to a better understanding of how to design better systems. I would hope that a clearer Open Source AI Definition will make it easier to draw these lines and encourage those currently releasing open-weights models to do so in a way more closely fitting the Open Source AI standard.

What do you think are the next steps for the community involved in Open Source AI?

An exciting future direction for the Open Source AI research community is to explore methods for greater control over AI model behavior; attempting to explore approaches to collective modification and collaborative development of AI systems that can adapt and be “patched” over time. A stronger understanding of how to properly evaluate these systems for capabilities, robustness, and safety will also be crucial. I hope to see the community direct greater attention to evaluation in the future as well.

How to get involved

The OSAID co-design process is open to everyone interested in collaborating. There are many ways to get involved:

  • Join the working groups: be part of a team to evaluate various models against the OSAID.
  • Join the forum: support and comment on the drafts, and record your approval or concerns in new and existing threads.
  • Comment on the latest draft: provide feedback on the latest draft document directly.
  • Follow the weekly recaps: subscribe to our newsletter and blog to be kept up-to-date.
  • Join the town hall meetings: participate in the online public town hall meetings to learn more and ask questions.
  • Join the workshops and scheduled conferences: meet the OSI and other participants at in-person events around the world.
Categories: FLOSS Research

Gary Benson: Python atomic counter

GNU Planet! - Thu, 2024-07-25 07:09

Do you need a thread-safe atomic counter in Python? Use itertools.count():

>>> from itertools import count
>>> counter = count()
>>> next(counter)
0
>>> next(counter)
1
>>> next(counter)
2

I found this in the decorator package, labelled “Atomic get-and-increment provided by the GIL”. So simple! So cool!
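To see the thread-safety in action, here is a quick sketch (mine, not from the original post) that increments one shared counter from several threads; because each next() call is a single atomic step under CPython's GIL, no value is ever handed out twice:

import threading
from itertools import count

counter = count()
seen = []

def worker():
    for _ in range(10_000):
        seen.append(next(counter))  # atomic get-and-increment

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# all 40,000 values are distinct: no increment was lost to a race
assert len(seen) == 40_000
assert len(set(seen)) == 40_000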

Categories: FLOSS Project Planets

The Drop Times: Drupal Cafe Lutsk #25 Recap: Key Insights and Community Support

Planet Drupal - Thu, 2024-07-25 05:39
Drupal Cafe Lutsk #25, held on June 27, 2024, featured sessions on Drupal's Domain module, digital opportunities for cities, and IT project optimization. Supported by Anyforsoft, YozmaTech, and DevBranch, the event highlighted valuable industry insights and community collaboration.
Categories: FLOSS Project Planets

Formatting Selected Text in QML

Planet KDE - Thu, 2024-07-25 04:00
Motivation

Let’s say we’re working on a QML project that involves a TextEdit.

There’s some text in it:

here is some text

We want to select part of this text and hit ctrl+B to make it bold:

here is some text

In Qt Widgets, this is trivial, but not so much in QML – we can get font.bold of the entire TextEdit, but not of just the text in the selection. We have to implement formattable selections manually.

To do this, there are two approaches we’ll look at:

  1. The first is to hack it together by getting the formatted text from the selection and editing it. Rather than setting properties of the selected text, this solution actually inserts or removes formatting symbols in the underlying rich text source.
  2. The other way to do this is to create a QML object that is implemented in C++ and exposed to TextEdit as a property. This way we can make use of QTextDocument and QTextCursor to actually set text properties within the selection area. This more closely follows the patterns expected in Qt.

In Qt 6.7, the TextEdit QML element does have a cursorSelection property that works in this way, and by dissecting its implementation, we can write a pseudo-backport for other Qt versions.
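For reference, on Qt 6.7 or newer the built-in property can be used directly, roughly like this (a sketch of the idea, untested, assuming a TextEdit with id txtEdit):

TextEdit {
    id: txtEdit
    textFormat: TextEdit.RichText
}

Shortcut {
    sequence: StandardKey.Bold
    // toggle bold on the current selection via the built-in cursorSelection
    onActivated: txtEdit.cursorSelection.font = Qt.font({ bold: !txtEdit.cursorSelection.font.bold })
}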

Before we do this, let’s take a look at the hacky QML/JS solution.

Hacky Approach

We start by focusing on just making ctrl+B bold shortcuts work:

TextEdit {
    id: txtEdit
    anchors.fill: parent
    selectByMouse: true
    textFormat: TextEdit.RichText
}

Shortcut {
    sequence: StandardKey.Bold
    onActivated: {
        if (txtEdit.selectedText.length > 0) {
            const start = txtEdit.selectionStart
            const end = txtEdit.selectionEnd
            let sel = txtEdit.getFormattedText(start, end)
                .split("<!--StartFragment-->")[1]
                .split("<!--EndFragment-->")[0]
            txtEdit.remove(start, end)
            if (sel.includes("font-weight:600;"))
                sel = sel.replace("font-weight:600;", "")
            else
                sel = "<b>" + sel + "</b>"
            txtEdit.insert(txtEdit.cursorPosition, sel)
            txtEdit.select(start, end)
        }
    }
}

Notice that we actually remove and replace the selected text, and reselect the insertion manually.

We can set up similar shortcuts for italics and underline trivially, but what if we want to set font properties of only the text in the selected area?

To keep things simple, let’s see what happens if we want to set just the font family and size:

FontDialog {
    id: fontDlg
}

Shortcut {
    id: fontShortcut
    property string sel: ""
    property int start: 0
    property int end: 0
    sequence: StandardKey.Find
    onActivated: {
        if (txtEdit.selectedText.length > 0) {
            start = txtEdit.selectionStart
            end = txtEdit.selectionEnd
            sel = txtEdit.getFormattedText(start, end)
                .split("<!--StartFragment-->")[1]
                .split("<!--EndFragment-->")[0]
            fontDlg.open()
        }
    }
}

Connections {
    target: fontDlg
    function onAccepted() {
        txtEdit.remove(fontShortcut.start, fontShortcut.end)
        if (fontShortcut.sel.includes("font-family:")) {
            let fontToReplace = fontShortcut.sel.split("font-family:'")[1].split("';")[0]
            fontShortcut.sel = fontShortcut.sel.replace(fontToReplace, fontDlg.font.family)
        } else {
            fontShortcut.sel = "<span style=\"font-family: '" + fontDlg.font.family
                + "'; font-size:" + (fontDlg.font.pixelSize ? fontDlg.font.pixelSize : fontDlg.font.pointSize)
                + "\">" + fontShortcut.sel + "</span>"
        }
        txtEdit.insert(txtEdit.cursorPosition, fontShortcut.sel)
        txtEdit.select(fontShortcut.start, fontShortcut.end)
    }
}

If we start messing with other font style properties like italic, bold, spacing, etc., we will end up with almost unreadably nasty string manipulation here.

This solution is overall hacky, as we replace HTML-formatted text from a snipped out section. It would be more Qt-idiomatic to retrieve QFont info from a selection and set the properties without editing raw rich text. Furthermore, it’s better to do as much logic as possible in C++ rather than with JavaScript in QML.

Implementation of cursorSelection in Qt 6.7 QML

Let’s take a look at the cursorSelection property of QtQuick TextEdit in Qt 6.7.

By looking at its property declaration in qquicktextedit_p.h, the type of cursorSelection is QQuickTextSelection.

This type is very basic. It has four read/write properties.

Here is the header qquicktextselection_p.h:

class Q_QUICK_EXPORT QQuickTextSelection : public QObject
{
    Q_OBJECT
    Q_PROPERTY(QString text READ text WRITE setText NOTIFY textChanged FINAL)
    Q_PROPERTY(QFont font READ font WRITE setFont NOTIFY fontChanged FINAL)
    Q_PROPERTY(QColor color READ color WRITE setColor NOTIFY colorChanged FINAL)
    Q_PROPERTY(Qt::Alignment alignment READ alignment WRITE setAlignment NOTIFY alignmentChanged FINAL)
    QML_ANONYMOUS
    QML_ADDED_IN_VERSION(6, 7)

public:
    explicit QQuickTextSelection(QObject *parent = nullptr);

    QString text() const;
    void setText(const QString &text);

    QFont font() const;
    void setFont(const QFont &font);

    QColor color() const;
    void setColor(QColor color);

    Qt::Alignment alignment() const;
    void setAlignment(Qt::Alignment align);

Q_SIGNALS:
    void textChanged();
    void fontChanged();
    void colorChanged();
    void alignmentChanged();

private:
    QTextCursor cursor() const;
    void updateFromCharFormat(const QTextCharFormat &fmt);
    void updateFromBlockFormat();

private:
    QTextCursor m_cursor;
    QTextCharFormat m_charFormat;
    QTextBlockFormat m_blockFormat;
    QQuickTextDocument *m_doc = nullptr;
    QQuickTextControl *m_control = nullptr;
};

Notice we’ve got these private data members:

QTextCursor m_cursor;
QTextCharFormat m_charFormat;
QTextBlockFormat m_blockFormat;
QQuickTextDocument *m_doc = nullptr;
QQuickTextControl *m_control = nullptr;

The m_doc and m_control are retrieved from the TextEdit which parents the selection object. The object is always constructed by a QQuickTextEdit, so in the constructor, the parent is cast to one using qmlobject_cast. Then we set these two fields.

QQuickTextSelection::QQuickTextSelection(QObject *parent)
    : QObject(parent)
{
    // When QQuickTextEdit creates its cursorSelection, it passes itself as the parent
    if (auto *textEdit = qmlobject_cast<QQuickTextEdit *>(parent)) {
        m_doc = textEdit->textDocument();
        m_control = QQuickTextEditPrivate::get(textEdit)->control;
        // ...
    }
    // ...
}

Now what are m_charFormat and m_blockFormat?

Text documents are composed of a list of text blocks, which can be paragraphs, lists, tables, images, etc. Thus, a block format represents an individual block’s alignment formatting. Char format contains formatting information at the character level, like font family, weight, style, size, color, and so forth.

To initialize these, we need to get the cursor from the text control.

QTextCursor QQuickTextSelection::cursor() const
{
    if (m_control)
        return m_control->textCursor();
    return m_cursor;
}

The cursor will give us a char format and a block format, which we use to get the font / color / alignment at the cursor’s location.

QFont QQuickTextSelection::font() const
{
    return cursor().charFormat().font();
}

// ...

QColor QQuickTextSelection::color() const
{
    return cursor().charFormat().foreground().color();
}

// ...

Qt::Alignment QQuickTextSelection::alignment() const
{
    return cursor().blockFormat().alignment();
}

currentCharFormatChanged is emitted by QQuickTextControl when the cursor moves or the document’s contents change. If this format is indeed different from the fields of the selection object, we must update them and emit the selection’s signals, just as we would in setters. Since we keep track of block alignment too, we have to do the same when the cursor moves and block format is different.

QQuickTextSelection::QQuickTextSelection(QObject *parent)
    : QObject(parent)
{
    // When QQuickTextEdit creates its cursorSelection, it passes itself as the parent
    if (auto *textEdit = qmlobject_cast<QQuickTextEdit *>(parent)) {
        m_doc = textEdit->textDocument();
        m_control = QQuickTextEditPrivate::get(textEdit)->control;
        connect(m_control, &QQuickTextControl::currentCharFormatChanged,
                this, &QQuickTextSelection::updateFromCharFormat);
        connect(m_control, &QQuickTextControl::cursorPositionChanged,
                this, &QQuickTextSelection::updateFromBlockFormat);
    }
}

// ...

inline void QQuickTextSelection::updateFromCharFormat(const QTextCharFormat &fmt)
{
    if (fmt.font() != m_charFormat.font())
        emit fontChanged();
    if (fmt.foreground().color() != m_charFormat.foreground().color())
        emit colorChanged();
    m_charFormat = fmt;
}

inline void QQuickTextSelection::updateFromBlockFormat()
{
    QTextBlockFormat fmt = cursor().blockFormat();
    if (fmt.alignment() != m_blockFormat.alignment())
        emit alignmentChanged();
    m_blockFormat = fmt;
}

Here are the setters for the properties, which use the cursor to access and mutate the character or block properties at its position.

void QQuickTextSelection::setText(const QString &text)
{
    auto cur = cursor();
    if (cur.selectedText() == text)
        return;
    cur.insertText(text);
    emit textChanged();
}

// ...

void QQuickTextSelection::setFont(const QFont &font)
{
    auto cur = cursor();
    if (cur.selection().isEmpty())
        cur.select(QTextCursor::WordUnderCursor);
    if (font == cur.charFormat().font())
        return;
    QTextCharFormat fmt;
    fmt.setFont(font);
    cur.mergeCharFormat(fmt);
    emit fontChanged();
}

// ...

void QQuickTextSelection::setColor(QColor color)
{
    auto cur = cursor();
    if (cur.selection().isEmpty())
        cur.select(QTextCursor::WordUnderCursor);
    if (color == cur.charFormat().foreground().color())
        return;
    QTextCharFormat fmt;
    fmt.setForeground(color);
    cur.mergeCharFormat(fmt);
    emit colorChanged();
}

// ...

void QQuickTextSelection::setAlignment(Qt::Alignment align)
{
    if (align == alignment())
        return;
    QTextBlockFormat format;
    format.setAlignment(align);
    cursor().mergeBlockFormat(format);
    emit alignmentChanged();
}

Now, we want to do something like this in our code. The issue is that this implementation resides in the Qt source code itself, and cursorSelection is a property of QQuickTextEdit. If we want to do something like this without changing Qt source code, we have to use attached properties.

Implementing an Attached Property

Using CursorSelection as an attached property for a TextEdit in QML might look something like this:

Item {
    // ...

    Shortcut {
        // ctrl+B to toggle bold / not bold for selection
        sequence: StandardKey.Bold
        onActivated: {
            txtEdit.CursorSelection.font = Qt.font({ bold: txtEdit.CursorSelection.font.bold !== true })
        }
    }

    TextEdit {
        id: txtEdit
        // ...
        CursorSelection.font {
            bold: false
            italic: false
            underline: false
        }
    }
}

To create our own attached property, we have to create two classes: CursorSelectionAttached and CursorSelection.

CursorSelectionAttached will contain the implementation of the selection, while CursorSelection serves as the attaching type, using the qmlAttachedProperties() method to expose the signals and properties of an instance of CursorSelectionAttached to the parent to which it is attached.

CursorSelection also needs the QML_ATTACHED() macro in its header declaration, and we must specify that it has an attached property with the macro QML_DECLARE_TYPEINFO() outside the class scope.

Thus, CursorSelection will just look like this:

// CursorSelection.h
class CursorSelection : public QObject
{
    Q_OBJECT
    QML_ATTACHED(CursorSelectionAttached)
    QML_ELEMENT

public:
    static CursorSelectionAttached *qmlAttachedProperties(QObject *object);
};

QML_DECLARE_TYPEINFO(CursorSelection, QML_HAS_ATTACHED_PROPERTIES)

Where the entire implementation is just this function definition:

// CursorSelection.cpp
CursorSelectionAttached *CursorSelection::qmlAttachedProperties(QObject *object)
{
    if (auto *textEdit = qobject_cast<QQuickTextEdit *>(object))
        return new CursorSelectionAttached(textEdit);
    return nullptr;
}

Notice that we perform the qobject_cast here and forward the result as the parent of the attached object. This way we only construct an attached object if we can cast the parent object to a TextEdit.

Now, let’s see how CursorSelectionAttached should be implemented. We begin with the constructor:

// we know that parent will be a QQuickTextEdit *
CursorSelectionAttached::CursorSelectionAttached(QQuickTextEdit *parent) noexcept
    : QObject(parent)
    , mEdit(parent) // this is the TextEdit we are attached to
{
    // make sure the QTextDocument exists
    const auto *const quickDoc = mEdit->textDocument(); // QQuickTextDocument *
    auto *doc = quickDoc->textDocument();               // QTextDocument *
    Q_ASSERT(doc != nullptr);

    // retrieve QTextCursor from the QTextDocument
    mCursor = QTextCursor(doc);

    // When deselecting, the cursor position and anchor are
    // set to the TextEdit's cursor position
    connect(mEdit, &QQuickTextEdit::selectedTextChanged,
            this, &CursorSelectionAttached::moveAnchorIfDeselected);
    connect(mEdit, &QQuickTextEdit::cursorPositionChanged,
            this, &CursorSelectionAttached::updatePosition);

    // if we set a format with no selection, we keep it in an optional
    // then when new text is added, it will have this formatting
    // for example, with no selection we press ctrl+B and then start
    // typing. we expect the text to be bold.
    connect(mEdit->textDocument()->textDocument(), &QTextDocument::contentsChange,
            this, &CursorSelectionAttached::applyFormatToNewTextIfNeeded);
}

Note that we connect to these three slots:

  • moveAnchorIfDeselected
  • updatePosition
  • applyFormatToNewTextIfNeeded

Let’s investigate the purpose of these.

moveAnchorIfDeselected is invoked when the TextEdit’s selected text changes. A QTextCursor has an anchor, which controls the selection area. While text is being selected, the anchor stays fixed where the selection started, and the cursor position moves independently of it. The selection area lies between the two positions. When a cursor moves without selecting anything, the anchor sits at, and moves along with, the cursor position.

Thus, when a cursor’s position is moved, we need to know if the anchor should be moved with it.

Since we invoke moveAnchorIfDeselected when the selected text changes, we know that if the selection is now empty, this means there was a selection that has been deselected. Thus, the cursor and anchor should be equal to one another.

void CursorSelectionAttached::moveAnchorIfDeselected()
{
    if (mEdit->selectedText().isEmpty())
        mCursor.setPosition(mEdit->cursorPosition(), QTextCursor::MoveAnchor);
}

updatePosition is invoked when the TextEdit’s cursor position changes. Depending on the TextEdit’s selection start and end positions, there are a few ways the cursor could be updated.

If there is no selected area in the TextEdit, the cursor and anchor should move together. If a selection’s start and end position both change, we must move the cursor twice: once to the start position, with the anchor moving, and once to the end position, with the anchor fixed in place. If the selection area is being resized, for example by dragging or using Shift+ArrowKeys, the cursor should move with the anchor fixed in place.

void CursorSelectionAttached::updatePosition()
{
    // if there's no selection, just move the cursor & anchor
    if (mEdit->selectionEnd() == mEdit->selectionStart()) {
        mCursor.setPosition(mEdit->cursorPosition(), QTextCursor::MoveAnchor);
    }
    // if both the start and end need to be updated:
    // move cursor and anchor to selection start, and
    // move cursor to selection end while keeping anchor at start
    //
    // we have to make sure the anchor is moved correctly so the
    // whole selection matches up -- otherwise cursor selection
    // start or end might be in the middle of the actual
    // selection, wherever the anchor is
    else if (mEdit->selectionStart() != mCursor.selectionStart()
             && mEdit->selectionEnd() != mCursor.selectionEnd()) {
        mCursor.setPosition(mEdit->selectionStart(), QTextCursor::MoveAnchor);
        mCursor.setPosition(mEdit->selectionEnd(), QTextCursor::KeepAnchor);
    }
    // these two cases are for selection dragging, only start or
    // end will move, so anchor stays in place
    else if (mEdit->selectionStart() != mCursor.selectionStart()) {
        mCursor.setPosition(mEdit->selectionStart(), QTextCursor::KeepAnchor);
    } else if (mEdit->selectionEnd() != mCursor.selectionEnd()) {
        mCursor.setPosition(mEdit->selectionEnd(), QTextCursor::KeepAnchor);
    }
}

applyFormatToNewTextIfNeeded is invoked when the contents of the text document change. This is because font properties might be set without an active selection. In this case, the expected behavior is for characters added afterwards to have these properties.

For example, if the font family is changed with no selection and we start typing, we expect our text to be in the new font. To do this, we need an optional in which we can save a format to apply to new text when needed; otherwise it holds nullopt. We will call it mOptFormat. It can be set in the property setters, which you will see later. For now, we just make sure to use it when the text document content changes and the optional holds a value.

void CursorSelectionAttached::applyFormatToNewTextIfNeeded(int from, int charsRemoved, int charsAdded)
{
    if (charsAdded && mOptFormat) {
        mCursor.setPosition(mCursor.position() - 1, QTextCursor::KeepAnchor);
        mCursor.mergeCharFormat(mOptFormat.value());
        mOptFormat.reset();
    }
}

Now, let’s take a look at the properties to expose to QML, and how they can be retrieved and set using the cursor. Like the QQuickTextSelection implementation, we will have properties text and font. We can implement the others as well, but for the sake of brevity, we will just focus on these two.

Q_PROPERTY(QString text READ text WRITE setText NOTIFY textChanged FINAL)
Q_PROPERTY(QFont font READ font WRITE setFont NOTIFY fontChanged FINAL)

We’ll need to declare and define these getters and setters, and declare the signals:

Getters:

[[nodiscard]] QString text() const;
[[nodiscard]] QFont font() const;

Setters:

void setText(const QString &text);
void setFont(const QFont &font);

Signals:

void textChanged();
void fontChanged();

The getter and setter implementations will look very similar to the previous implementations shown for QQuickTextSelection, with some minor differences.

Getter implementations:

QString CursorSelectionAttached::text() const
{
    return mCursor.selectedText();
}

QFont CursorSelectionAttached::font() const
{
    // simply get the font at the cursor position using charFormat
    auto ret = mCursor.charFormat().font();

    // if the cursor is at the start of a selection, we need to take the font
    // at the position right in front of it. otherwise, the font will refer to the
    // character at the position right before the selection begins
    if (mCursor.hasSelection() && mCursor.position() == mCursor.selectionStart()) {
        auto cur = mCursor;
        cur.setPosition(cur.position() + 1);
        ret = cur.charFormat().font();
    }
    return ret;
}

Setter implementations:

void CursorSelectionAttached::setText(const QString &text)
{
    if (mCursor.selectedText() == text)
        return;
    mCursor.insertText(text);
    emit textChanged();
}

void CursorSelectionAttached::setFont(const QFont &font)
{
    if (font == mCursor.charFormat().font())
        return;
    QTextCharFormat fmt = mCursor.charFormat();
    fmt.setFont(font, QTextCharFormat::FontPropertiesSpecifiedOnly);

    // when no selection, formatting must be set on the next insertion
    if (mCursor.selection().isEmpty())
        mOptFormat = fmt;
    else
        mCursor.mergeCharFormat(fmt);
    emit fontChanged();
}

The only thing that needs to be done now is to override the destructor, which can just be set to default:

~CursorSelectionAttached() override = default;

Now we have all the implementation we need to use the attached property. If we put the two classes in one header file, it will look like this:

#pragma once

#include <QObject>
#include <QTextCursor>
#include <QtQml>
#include <optional>

class QQuickTextEdit;

class CursorSelectionAttached : public QObject
{
    Q_OBJECT
    Q_PROPERTY(QString text READ text WRITE setText NOTIFY textChanged FINAL)
    Q_PROPERTY(QFont font READ font WRITE setFont NOTIFY fontChanged FINAL)
    QML_ANONYMOUS

public:
    explicit CursorSelectionAttached(QQuickTextEdit *parent) noexcept;
    ~CursorSelectionAttached() override = default;

    [[nodiscard]] QString text() const;
    [[nodiscard]] QFont font() const;

    void setText(const QString &text);
    void setFont(const QFont &font);

signals:
    void textChanged();
    void fontChanged();

private slots:
    void moveAnchorIfDeselected();
    void updatePosition();
    void applyFormatToNewTextIfNeeded(int from, int charsRemoved, int charsAdded);

private:
    QTextCursor mCursor;
    QQuickTextEdit *mEdit;
    std::optional<QTextCharFormat> mOptFormat;
};

class CursorSelection : public QObject
{
    Q_OBJECT
    QML_ATTACHED(CursorSelectionAttached)
    QML_ELEMENT

public:
    static CursorSelectionAttached *qmlAttachedProperties(QObject *object);
};

QML_DECLARE_TYPEINFO(CursorSelection, QML_HAS_ATTACHED_PROPERTIES)

With this header, an implementation file containing the definitions, and a call to qmlRegisterUncreatableType<CursorSelection> in your main.cpp, the attached property can be used in QML.
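The registration call might look roughly like this (a sketch; the module URI "MyApp" and the 1.0 version are placeholders for whatever your project uses):

// main.cpp (sketch): register CursorSelection so QML can resolve the attached property.
// "MyApp" and 1.0 are hypothetical; substitute your own module URI and version.
qmlRegisterUncreatableType<CursorSelection>(
    "MyApp", 1, 0, "CursorSelection",
    QStringLiteral("CursorSelection is only usable as an attached property"));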

Final Remarks

Though this is not a perfect backport, this code allows us to set font properties for selected text in QML in a nearly identical way to the implementation in Qt 6.7. This is especially useful for implementing any kind of rich text editing in a QML application, since this functionality is severely lacking in Qt versions prior to 6.7. Hopefully this is a helpful guide to backporting features, implementing attached properties, and doing more sane text editing in QML apps.

About KDAB

If you like this article and want to read similar material, consider subscribing via our RSS feed.

Subscribe to KDAB TV for similar informative short video content.

KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.

The post Formatting Selected Text in QML appeared first on KDAB.

Categories: FLOSS Project Planets

EuroPython Society: EuroPython 2024 Code of Conduct Transparency Report

Planet Python - Thu, 2024-07-25 02:00

The 2024 version of the EuroPython conference took place both online and in person in July 2024. This was the second conference under our new Code of Conduct (CoC), and we had Code of Conduct working group members continuously available both online and in person.

Reports

We had 4 Code of Conduct working group members continuously available both online and in person. Over the course of the conference the Code of Conduct team was made aware of the following issues:

  • A disabled person had requested reserved seating for talks, but when he arrived the first day, there was none. He reported this to a CoC member, who filed a report with Ops. It turned out that while the request had been gathered on the web form, there was no mechanism to get that information to the people involved. Once they were informed, the issue was quickly resolved, and the reporter expressed satisfaction with the way it was handled.
  • One person was uncomfortable with having their last name shown on Discord. They were informed that they could change that as soon as the registration bot ran, limiting the exposure to a minute or so, or that they could come to the registration desk for assistance. The report came via email and there was no response to the email suggesting those options.
  • An attendee reported that one talk’s slides included a meme that seemed to reflect a racist trope. The CoC team reviewed that talk’s slides, and agreed that the meme might be interpreted that way. A member of the CoC team contacted the presenter who immediately agreed to remove that meme before uploading the slides, and the video team was alerted to edit that meme out of the talk video before final publication.
  • There were multiple reports that the toilet signage was confusing and causing people to be uncomfortable with choosing a toilet. Once this was reported, the signage was adjusted to make the gender designation visible and no further reports were received. It should be noted that none of the complaints objected to the text of the signs, just to the fact that the covering of gender markers led to people entering a toilet they didn’t want to.
  • The CoC team was also presented with a potential lightning talk topic that had caused complaints at another conference due to references to current wars that some viewers found disturbing. Since lightning talks are too short for content warnings to be effective, and since they are not reviewed in any detail by the programme committee, the CoC team counselled the prospective presenter against using the references that had been problematic at a prior conference. Given that advice, the presenter elected not to submit that topic.
Categories: FLOSS Project Planets

OSI at the United Nations OSPOs for Good

Open Source Initiative - Wed, 2024-07-24 18:57

Earlier this month the Open Source Initiative participated in the “OSPOs for Good” event promoted by the United Nations in NYC. Stefano Maffulli, the Executive Director of the OSI, participated in a panel moderated by Mehdi Snene about Open Source AI alongside distinguished speakers Ashley Kramer, Craig Ramlal, Sasha Luccioni, and Sergio Gago. Please find below a transcript of Stefano’s presentation.

Mehdi Snene  

What is Open Source in AI? What does it mean? What are the foundational pieces? How far along is the data? There is mention of weights, and data skills. How can we truly understand what Open Source in AI is? Today, joining us, we’ll have someone who can help us understand what Open Source in AI means and where we are heading. Stefano, can you offer your insights?

Stefano Maffulli  

Thanks. We have some thoughts on this. We’ve been pondering these questions since they first emerged when GPT started to appear. We asked ourselves: How do we transfer the principles of permissionless innovation and the immense value created by the Open Source ecosystem into the AI space?

After a little over two years of research and global conversations with multiple stakeholders, we identified three key elements. Firstly, permissionless innovation needs to be ported to AI, but this is complex and must be broken down into smaller components.

We realized that, as developers, users, and deployers of AI systems, we need to understand how these systems are built. This involves studying all components carefully, being able to run them for any purpose without asking for permission (a basic tenet of Open Source), and modifying them to change outputs based on the same inputs. These basic principles include being able to share these modifications with others.

To achieve this, you need data, the code used for training and cleaning the data (e.g., removing duplicates), the parameters, the weights, and a way to run inference on those weights. It’s fairly straightforward. However, the challenge lies in the legal framework.

Now, the complicated piece is how Open Source software has had a very wonderful run, based on the fact that the legal framework that governs Open Source is fairly simple and globally accepted. It’s built on copyright, a system that has worked wonderfully in both ways. It gives exclusive rights to the content creators, but also the same mechanism can be used to grant rights to anyone who receives the creation.

With data, we don’t have that mechanism. That is a very simple and dramatic realization. When we talk about data, we should pay attention to what kind of data we’re discussing. There is data as content created, and there is data as facts; like fires, speed limits, or traces of a road. Those are facts, and they have different ways of being treated. There is also private data, personal information, and various other kinds of data, each with different rules and regulations around the world.

Governments’ major role in the future will be to facilitate permissionless innovation in data by harmonizing these rules. This will level the playing field, where currently larger corporations have significantly more power than Open Source developers or those wishing to create large language models. Governments should help create datasets, remove barriers, and facilitate access for academia, smaller developers, and the global south.

Mehdi Snene  

We already have open data and Open Source. Now, we need to create open AI and open models. Are we bringing these two domains together and keeping them separate, or are we creating something new from scratch when we talk about open AI?

Stefano Maffulli 

This is a very interesting and powerful question. I believe that open data as a movement has been around for quite a while. However, it’s only recently that data scientists have truly realized the value they hold in their hands. Data is fungible and can be used to build new things that are completely different from their original domains.

We need to talk more about this and establish platforms for better interaction. One striking example is a popular dataset of images used for training many image generation AI tools, which contained child sexual abuse images for many years. A research paper highlighted this huge problem, but no one filed a bug report, and there was no easy way for the maintainers of this dataset to notice and remove those images.

There are things that the software world understands very well, and things that data scientists understand very well. We are starting to see the need for more space for interactions and learning from each other.

The conversation is extremely complicated. Alex and I have had long discussions about this. I don’t want to focus entirely on this, but I do want to say that Open Source has never been about pleasing companies or specific stakeholders. We need to think of it as an ecosystem where the balances of power are maintained.

While Open Source software and Open Source AI are still evolving, the necessary ingredients—data, code, and other components—are there. However, the data piece still needs to be debated and finalized. Pushing for radical openness with data has clear drawbacks and issues. It’s going to be a balance of intentions, aiming for the best outcome for the general public and the whole ecosystem.

Mehdi Snene  

Thank you so much. My next question is about the future. What are your thoughts on the next big technology?

Stefano Maffulli 

From the perspective of open innovation, it’s about what’s going to give society control over technology. The focus of Open Source has always been to enable developers and end-users to have sovereignty over the technology they use. Whether it’s quantum computers, AI, or future technologies, maintaining that control is crucial.

Governments need to play a role in enabling innovation and ensuring that no single power becomes too dominant. The balance between the private sector, public sector, nonprofit sector, and the often-overlooked fourth sector—which includes developers and creators who work for the public good rather than for profit—must be maintained. This balance is essential for fostering an ecosystem where all stakeholders have equal interests and influence.


If you would like to listen to the panel discussion in its entirety, you can do so here (the Open Source AI panel starts at 1:00:00 approximately).

Categories: FLOSS Research

Acquia Developer Portal Blog: Drupal 11 Preparation Checklist

Planet Drupal - Wed, 2024-07-24 13:15

Drupal 11 is here early! But don’t panic: the new Drupal 10 support model means you are not under pressure to upgrade. Drupal 10 will continue to be supported until mid-to-late 2026. But as we know, it’s best to be prepared and to understand the upgrade process when that time comes for your organization.

Similar to the upgrade from Drupal 9 to Drupal 10, the latest version of Drupal 10 (Drupal 10.3.1) defines all the code deprecated for Drupal 11. And as with previous modern Drupal major version upgrades, there is a recommended set of areas to focus on in order to get your applications upgraded as cleanly as possible.

Categories: FLOSS Project Planets

Acquia Developer Portal Blog: The Power of Drupal 11

Planet Drupal - Wed, 2024-07-24 13:15

Welcome, Drupal 11! At a previous DrupalCon Portland, Drupal’s creator Dries Buytaert introduced the goal of making Drupal the preferred tool for ambitious site builders on the open web. Recently, Dries shared an updated plan for Drupal 11, which has 3 major focus areas:

Categories: FLOSS Project Planets

PyPy: Abstract interpretation in the Toy Optimizer

Planet Python - Wed, 2024-07-24 10:48

This is a cross-post from Max Bernstein from his excellent blog where he writes about programming languages, compilers, optimizations, virtual machines. He's looking for a (dynamic language runtime or compiler related) job too.

CF Bolz-Tereick wrote some excellent posts in which they introduce a small IR and optimizer and extend it with allocation removal. We also did a live stream together in which we did some more heap optimizations.

In this blog post, I'm going to write a small abstract interpreter for the Toy IR and then show how we can use it to do some simple optimizations. It assumes that you are familiar with the little IR, which I have reproduced unchanged in a GitHub Gist.

Abstract interpretation is a general framework for efficiently computing properties that must be true for all possible executions of a program. It's a widely used approach both in compiler optimizations as well as offline static analysis for finding bugs. I'm writing this post to pave the way for CF's next post on proving abstract interpreters correct for range analysis and known bits analysis inside PyPy.

Before we begin, I want to note a couple of things:

  • The Toy IR is in SSA form, which means that every variable is defined exactly once. This means that abstract properties of each variable are easy to track.
  • The Toy IR represents a linear trace without control flow, meaning we won't talk about meet/join or fixpoints. They only make sense if the IR has a notion of conditional branches or back edges (loops).

Alright, let's get started.

Welcome to abstract interpretation

Abstract interpretation means a couple different things to different people. There's rigorous mathematical formalism thanks to Patrick and Radhia Cousot, our favorite power couple, and there's also sketchy hand-wavy stuff like what will follow in this post. In the end, all people are trying to do is reason about program behavior without running it.

In particular, abstract interpretation is an over-approximation of the behavior of a program. Correctly implemented abstract interpreters never lie, but they might be a little bit pessimistic. This is because instead of using real values and running the program---which would produce a concrete result and some real-world behavior---we "run" the program with a parallel universe of abstract values. This abstract run gives us information about all possible runs of the program.1

Abstract values always represent sets of concrete values. Instead of literally storing a set (in the world of integers, for example, it could get pretty big...there are a lot of integers), we group them into a finite number of named subsets.2

Let's learn a little about abstract interpretation with an example program and example abstract domain. Here's the example program:

v0 = 1
v1 = 2
v2 = add(v0, v1)

And our abstract domain is "is the number positive" (where "positive" means nonnegative, but I wanted to keep the words distinct):

        top
       /   \
positive   negative
       \   /
       bottom

The special top value means "I don't know" and the special bottom value means "empty set" or "unreachable". The positive and negative values represent the sets of all positive and negative numbers, respectively.

We initialize all the variables v0, v1, and v2 to bottom and then walk our IR, updating our knowledge as we go.

# here
v0:bottom = 1
v1:bottom = 2
v2:bottom = add(v0, v1)

In order to do that, we have to have transfer functions for each operation. For constants, the transfer function is easy: determine if the constant is positive or negative. For other operations, we have to define a function that takes the abstract values of the operands and returns the abstract value of the result.

In order to be correct, transfer functions for operations have to be compatible with the behavior of their corresponding concrete implementations. You can think of them as having an implicit universal quantifier forall in front of them.

Let's step through the constants at least:

v0:positive = 1
v1:positive = 2
# here
v2:bottom = add(v0, v1)

Now we need to figure out the transfer function for add. It's kind of tricky right now because we haven't specified our abstract domain very well. I keep saying "numbers", but what kinds of numbers? Integers? Real numbers? Floating point? Some kind of fixed-width bit vector (int8, uint32, ...) like an actual machine "integer"?

For this post, I am going to use the mathematical definition of integer, which means that the values are not bounded in size and therefore do not overflow. Actual hardware memory constraints aside, this is kind of like a Python int.

So let's look at what happens when we add two abstract numbers:

add      | top    | positive | negative | bottom
---------+--------+----------+----------+-------
top      | top    | top      | top      | bottom
positive | top    | positive | top      | bottom
negative | top    | top      | negative | bottom
bottom   | bottom | bottom   | bottom   | bottom

As an example, let's try to add two numbers a and b, where a is positive and b is negative. We don't know anything about their values other than their signs. They could be 5 and -3, where the result is 2, or they could be 1 and -100, where the result is -99. This is why we can't say anything about the result of this operation and have to return top.

The short of this table is that we only really know the result of an addition if both operands are positive or both operands are negative. Thankfully, in this example, both operands are known positive. So we can learn something about v2:

v0:positive = 1
v1:positive = 2
v2:positive = add(v0, v1)
# here

This may not seem useful in isolation, but analyzing more complex programs even with this simple domain may be able to remove checks such as if (v2 < 0) { ... }.

Let's take a look at another example using an sample absval (absolute value) IR operation:

v0 = getarg(0)
v1 = getarg(1)
v2 = absval(v0)
v3 = absval(v1)
v4 = add(v2, v3)
v5 = absval(v4)

Even though we have no constant/concrete values, we can still learn something about the states of values throughout the program. Since we know that absval always returns a positive number, we learn that v2, v3, and v4 are all positive. This means that we can optimize out the absval operation on v5:

v0:top = getarg(0)
v1:top = getarg(1)
v2:positive = absval(v0)
v3:positive = absval(v1)
v4:positive = add(v2, v3)
v5:positive = v4
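Putting the pieces above into code, here is a minimal runnable sketch of this sign domain and its transfer functions. It is my own illustration (the post only implements the parity domain below), and it uses plain integers rather than Toy IR Constant objects:

class Sign:
    """Sketch of the positive/negative lattice elements (mirrors Parity below)."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

TOP = Sign("top")
POSITIVE = Sign("positive")
NEGATIVE = Sign("negative")
BOTTOM = Sign("bottom")

def sign_const(value):
    # abstraction function: "positive" means nonnegative, as above
    return POSITIVE if value >= 0 else NEGATIVE

def sign_add(a, b):
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    if a is b and a is not TOP:
        return a  # positive+positive stays positive; negative+negative stays negative
    return TOP    # mixed or unknown signs tell us nothing

def sign_absval(a):
    # absval always produces a positive (nonnegative) result
    return BOTTOM if a is BOTTOM else POSITIVE

assert sign_add(POSITIVE, POSITIVE) is POSITIVE
assert sign_add(POSITIVE, NEGATIVE) is TOP
assert sign_absval(TOP) is POSITIVE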

Other interesting lattices include:

  • Constants (where the middle row is pretty wide)
  • Range analysis (bounds on min and max of a number)
  • Known bits (using a bitvector representation of a number, which bits are always 0 or 1)

For the rest of this blog post, we are going to do a very limited version of "known bits", called parity. This analysis only tracks the least significant bit of a number, which indicates if it is even or odd.

Parity

The lattice is pretty similar to the positive/negative lattice:

      top
     /   \
   even   odd
     \   /
    bottom

Let's define a data structure to represent this in Python code:

class Parity:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

And instantiate the members of the lattice:

TOP = Parity("top")
EVEN = Parity("even")
ODD = Parity("odd")
BOTTOM = Parity("bottom")

Now let's write a forward flow analysis of a basic block using this lattice. We'll do that by assuming that a method on Parity is defined for each IR operation. For example, Parity.add, Parity.lshift, etc.

def analyze(block: Block) -> None:
    parity = {v: BOTTOM for v in block}

    def parity_of(value):
        if isinstance(value, Constant):
            return Parity.const(value)
        return parity[value]

    for op in block:
        transfer = getattr(Parity, op.name)
        args = [parity_of(arg.find()) for arg in op.args]
        parity[op] = transfer(*args)

For every operation, we compute the abstract value---the parity---of the arguments and then call the corresponding method on Parity to get the abstract result.

We need to special case Constants due to a quirk of how the Toy IR is constructed: the constants don't appear in the instruction stream and instead are free-floating.

Let's start by looking at the abstraction function for concrete values---constants:

class Parity:
    # ...
    @staticmethod
    def const(value):
        if value.value % 2 == 0:
            return EVEN
        else:
            return ODD

Seems reasonable enough. Let's pause on operations for a moment and consider an example program:

v0 = getarg(0)
v1 = getarg(1)
v2 = lshift(v0, 1)
v3 = lshift(v1, 1)
v4 = add(v2, v3)
v5 = dummy(v4)

This function (which is admittedly a little contrived) takes two inputs, shifts each left by one bit, adds the results, and then passes the sum into a dummy function, which you can think of as "return" or "escape".

To do some abstract interpretation on this program, we'll need to implement the transfer functions for lshift and add (dummy will just always return TOP). We'll start with add. Remember that adding two even numbers returns an even number, adding two odd numbers returns an even number, and mixing even and odd returns an odd number.

class Parity:
    # ...
    def add(self, other):
        if self is BOTTOM or other is BOTTOM:
            return BOTTOM
        if self is TOP or other is TOP:
            return TOP
        if self is EVEN and other is EVEN:
            return EVEN
        if self is ODD and other is ODD:
            return EVEN
        return ODD

We also need to fill in the other cases where the operands are top or bottom. In this case, they are both "contagious"; if either operand is bottom, the result is as well. If neither is bottom but either operand is top, the result is as well.

Now let's look at lshift. Shifting any number left by a non-zero number of bits will always result in an even number, but we need to be careful about the zero case! Shifting by zero doesn't change the number at all. Unfortunately, since our lattice has no notion of zero, we have to over-approximate here:

class Parity:
    # ...
    def lshift(self, other):
        # self << other
        if other is ODD:
            return EVEN
        return TOP

This means that we will miss some opportunities to optimize, but it's a tradeoff that's just part of the game. (We could also add more elements to our lattice, but that's a topic for another day.)

Now, if we run our abstract interpretation, we'll collect some interesting properties about the program. If we temporarily hack on the internals of bb_to_str, we can print out parity information alongside the IR operations:

v0:top = getarg(0)
v1:top = getarg(1)
v2:even = lshift(v0, 1)
v3:even = lshift(v1, 1)
v4:even = add(v2, v3)
v5:top = dummy(v4)

This is pretty awesome, because we can see that v4, the result of the addition, is always even. Maybe we can do something with that information.

Optimization

One way that a program might check if a number is odd is by checking the least significant bit. This is a common pattern in C code, where you might see code like y = x & 1. Let's introduce a bitand IR operation that acts like the & operator in C/Python. Here is an example of use of it in our program:

v0 = getarg(0)
v1 = getarg(1)
v2 = lshift(v0, 1)
v3 = lshift(v1, 1)
v4 = add(v2, v3)
v5 = bitand(v4, 1)  # new!
v6 = dummy(v5)

We'll hold off on implementing the transfer function for it---that's left as an exercise for the reader---and instead do something different.

Instead, we'll see if we can optimize operations of the form bitand(X, 1). If we statically know the parity as a result of abstract interpretation, we can replace the bitand with a constant 0 or 1.

We'll first modify the analyze function (and rename it) to return a new Block containing optimized instructions:

def simplify(block: Block) -> Block:
    parity = {v: BOTTOM for v in block}

    def parity_of(value):
        if isinstance(value, Constant):
            return Parity.const(value)
        return parity[value]

    result = Block()
    for op in block:
        # TODO: Optimize op
        # Emit
        result.append(op)
        # Analyze
        transfer = getattr(Parity, op.name)
        args = [parity_of(arg.find()) for arg in op.args]
        parity[op] = transfer(*args)
    return result

We're approaching this the way that PyPy does things under the hood, which is all in roughly a single pass. It tries to optimize an instruction away, and if it can't, it copies it into the new block.

Now let's add in the bitand optimization. It's mostly some gross-looking pattern matching that checks if the right hand side of a bitwise and operation is 1 (TODO: the left hand side, too). CF had some neat ideas on how to make this more ergonomic, which I might save for later.3

Then, if we know the parity, optimize the bitand into a constant.

def simplify(block: Block) -> Block:
    parity = {v: BOTTOM for v in block}

    def parity_of(value):
        if isinstance(value, Constant):
            return Parity.const(value)
        return parity[value]

    result = Block()
    for op in block:
        # Try to simplify
        if isinstance(op, Operation) and op.name == "bitand":
            arg = op.arg(0)
            mask = op.arg(1)
            if isinstance(mask, Constant) and mask.value == 1:
                if parity_of(arg) is EVEN:
                    op.make_equal_to(Constant(0))
                    continue
                elif parity_of(arg) is ODD:
                    op.make_equal_to(Constant(1))
                    continue
        # Emit
        result.append(op)
        # Analyze
        transfer = getattr(Parity, op.name)
        args = [parity_of(arg.find()) for arg in op.args]
        parity[op] = transfer(*args)
    return result

Remember: because we use union-find to rewrite instructions in the optimizer (make_equal_to), later uses of the same instruction get the new optimized version "for free" (find).
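As a reminder of how that forwarding works, here is a condensed sketch of the mechanism; the real Toy IR classes in the linked gist are richer (they split Constant and Operation), but the method names match the post:

class Value:
    """Condensed sketch of the Toy IR's union-find forwarding."""
    forwarded = None

    def find(self):
        # chase forwarding pointers to the current representative
        v = self
        while v.forwarded is not None:
            v = v.forwarded
        return v

    def make_equal_to(self, other):
        # from now on, find() on this value yields `other`,
        # so later users of the instruction see the rewrite for free
        self.find().forwarded = other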

Let's see how it works on our IR:

v0 = getarg(0)
v1 = getarg(1)
v2 = lshift(v0, 1)
v3 = lshift(v1, 1)
v4 = add(v2, v3)
v6 = dummy(0)

Hey, neat! bitand disappeared and the argument to dummy is now the constant 0 because we know the lowest bit.

Wrapping up

Hopefully you have gained a little bit of an intuitive understanding of abstract interpretation. Last year, being able to write some code made me more comfortable with the math. Now being more comfortable with the math is helping me write the code. It's a nice upward spiral.

The two abstract domains we used in this post are simple and not very useful in practice, but it's possible to get very far using slightly more complicated abstract domains. Common domains include: constant propagation, type inference, range analysis, effect inference, liveness, etc. For example, here is a sample lattice for constant propagation:

"-inf"; bottom -> "-2"; bottom -> "-1"; bottom -> 0; bottom -> 1; bottom -> 2; bottom -> "+inf"; "-inf" -> negative; "-2" -> negative; "-1" -> negative; 0 -> top; 1 -> nonnegative; 2 -> nonnegative; "+inf" -> nonnegative; negative -> nonzero; nonnegative -> nonzero; nonzero->top; {rank=same; "-inf"; "-2"; "-1"; 0; 1; 2; "+inf"} {rank=same; nonnegative; negative;} } -->

It has multiple levels to indicate more and less precision. For example, you might learn that a variable is either 1 or 2 and be able to encode that as nonnegative instead of just going straight to top.

Check out some real-world abstract interpretation in open source projects:

If you have some readable examples, please share them so I can add them.

Acknowledgements

Thank you to CF Bolz-Tereick for the toy optimizer and helping edit this post!

  1. In the words of abstract interpretation researchers Vincent Laviron and Francesco Logozzo in their paper Refining Abstract Interpretation-based Static Analyses with Hints (APLAS 2009):

    The three main elements of an abstract interpretation are: (i) the abstract elements ("which properties am I interested in?"); (ii) the abstract transfer functions ("which is the abstract semantics of basic statements?"); and (iii) the abstract operations ("how do I combine the abstract elements?").

    We don't have any of these "abstract operations" in this post because there's no control flow but you can read about them elsewhere! 

  2. These abstract values are arranged in a lattice, which is a mathematical structure with some properties but the most important ones are that it has a top, a bottom, a partial order, a meet operation, and values can only move in one direction on the lattice.

    Using abstract values from a lattice promises two things:

    • The analysis will terminate
    • The analysis will be correct for any run of the program, not just one sample run
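    For example, here is a minimal sketch of what a join (least upper bound) on this post's four-element parity lattice could look like; the sentinel objects stand in for however the post actually represents BOTTOM, EVEN, ODD, and TOP:

    BOTTOM, EVEN, ODD, TOP = object(), object(), object(), object()

    def parity_join(a, b):
        # Least upper bound on the diamond BOTTOM <= {EVEN, ODD} <= TOP.
        if a is BOTTOM:
            return b
        if b is BOTTOM:
            return a
        if a is b:
            return a
        return TOP  # joining EVEN with ODD loses the parity information

    assert parity_join(EVEN, EVEN) is EVEN
    assert parity_join(EVEN, ODD) is TOP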

  3. Something about __match_args__ and @property... 

Categories: FLOSS Project Planets

FSF Events: Free Software Directory meeting on IRC: Friday, July 26, starting at 12:00 EDT (16:00 UTC)

GNU Planet! - Wed, 2024-07-24 10:45
Join the FSF and friends on Friday, July 26 from 12:00 to 15:00 EDT (16:00 to 19:00 UTC) to help improve the Free Software Directory.
Categories: FLOSS Project Planets

Tag1 Consulting: Drupal Workspaces: A Game-changer for Site Wide Content Staging

Planet Drupal - Wed, 2024-07-24 10:02

Join us as Andrei Mateescu demonstrates the Workspaces module's powerful capabilities for enterprise-level Drupal sites. Discover how the module allows preview and management of extensive content changes and integrates with core functionalities like translations and Layout Builder. Although currently labeled experimental, Workspaces is already in use in production environments and will become a stable part of Drupal Core.

Categories: FLOSS Project Planets

Real Python: Hugging Face Transformers: Leverage Open-Source AI in Python

Planet Python - Wed, 2024-07-24 10:00

Transformers is a powerful Python library created by Hugging Face that allows you to download, manipulate, and run thousands of pretrained, open-source AI models. These models cover multiple tasks across modalities like natural language processing, computer vision, audio, and multimodal learning. Using pretrained open-source models can reduce costs, save the time needed to train models from scratch, and give you more control over the models you deploy.

In this tutorial, you’ll learn how to:

  • Navigate the Hugging Face ecosystem
  • Download, run, and manipulate models with Transformers
  • Speed up model inference with GPUs

Throughout this tutorial, you’ll gain a conceptual understanding of Hugging Face’s AI offerings and learn how to work with the Transformers library through hands-on examples. When you finish, you’ll have the knowledge and tools you need to start using models for your own use cases. Before starting, you’ll benefit from having an intermediate understanding of Python and popular deep learning libraries like PyTorch and TensorFlow.

Get Your Code: Click here to download the free sample code that shows you how to use Hugging Face Transformers to leverage open-source AI in Python.

Take the Quiz: Test your knowledge with our interactive “Hugging Face Transformers” quiz. You’ll receive a score upon completion to help you track your learning progress.

The Hugging Face Ecosystem

Before using Transformers, you’ll want to have a solid understanding of the Hugging Face ecosystem. In this first section, you’ll briefly explore everything that Hugging Face offers with a particular emphasis on model cards.

Exploring Hugging Face

Hugging Face is a hub for state-of-the-art AI models. It’s primarily known for its wide range of open-source transformer-based models that excel in natural language processing (NLP), computer vision, and audio tasks. The platform offers several resources and services that cater to developers, researchers, businesses, and anyone interested in exploring AI models for their own use cases.

There’s a lot you can do with Hugging Face, but the primary offerings can be broken down into a few categories:

  • Models: Hugging Face hosts a vast repository of pretrained AI models that are readily accessible and highly customizable. This repository is called the Model Hub, and it hosts models covering a wide range of tasks, including text classification, text generation, translation, summarization, speech recognition, image classification, and more. The platform is community-driven and allows users to contribute their own models, which facilitates a diverse and ever-growing selection.

  • Datasets: Hugging Face has a library of thousands of datasets that you can use to train, benchmark, and enhance your models. These range from small-scale benchmarks to massive, real-world datasets that encompass a variety of domains, such as text, image, and audio data. Like the Model Hub, 🤗 Datasets supports community contributions and provides the tools you need to search, download, and use data in your machine learning projects.

  • Spaces: Spaces allows you to deploy and share machine learning applications directly on the Hugging Face website. This service supports a variety of frameworks and interfaces, including Streamlit, Gradio, and Jupyter notebooks. It is particularly useful for showcasing model capabilities, hosting interactive demos, or for educational purposes, as it allows you to interact with models in real time.

  • Paid offerings: Hugging Face also offers several paid services for enterprises and advanced users. These include the Pro Account, the Enterprise Hub, and Inference Endpoints. These solutions offer private model hosting, advanced collaboration tools, and dedicated support to help organizations scale their AI operations effectively.

These resources empower you to accelerate your AI projects and encourage collaboration and innovation within the community. Whether you’re a novice looking to experiment with pretrained models, or an enterprise seeking robust AI solutions, Hugging Face offers tools and platforms that cater to a wide range of needs.

This tutorial focuses on Transformers, a Python library that lets you run just about any model in the Model Hub. Before using Transformers, you’ll need to understand what model cards are, and that’s what you’ll do next.

Understanding Model Cards

Model cards are the core components of the Model Hub, and you’ll need to understand how to search and read them to use models in Transformers. Model cards are nothing more than files that accompany each model to provide useful information. You can search for the model card you’re looking for on the Models page:

Hugging Face Models page

On the left side of the Models page, you can search for model cards based on the task you’re interested in. For example, if you’re interested in zero-shot text classification, you can click the Zero-Shot Classification button under the Natural Language Processing section:

Hugging Face Models page filtered for zero-shot text classification models

In this search, you can see 266 different zero-shot text classification models. Zero-shot classification is a paradigm where language models assign labels to text without explicit training or seeing any examples. In the upper-right corner, you can sort the search results based on model likes, downloads, creation dates, updated dates, and popularity trends.
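As a preview of where the tutorial is headed, here’s a minimal sketch of running one of these zero-shot models locally with the pipeline API; the example text and labels are arbitrary choices, and the first call downloads the model:

from transformers import pipeline

# Load a zero-shot classification pipeline with an explicit model.
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
)

result = classifier(
    "I love using open-source libraries for machine learning!",
    candidate_labels=["technology", "sports", "cooking"],
)
print(result["labels"][0])  # the most likely label, e.g. "technology"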

Each model card button tells you the model’s task, when it was last updated, and how many downloads and likes it has. When you click a model card button, say the one for the facebook/bart-large-mnli model, the model card will open and display all of the model’s information:

A Hugging Face model card

Even though a model card can display just about anything, Hugging Face has outlined the information that a good model card should provide. This includes detailed information about the model, its uses and limitations, the training parameters and experiment details, the dataset used to train the model, and the model’s evaluation performance.

A high-quality model card also includes metadata such as the model’s license, references to the training data, and links to research papers that describe the model in detail. In some model cards, you’ll also get to tinker with a deployed instance of the model via the Inference API. You can see an example of this in the facebook/bart-large-mnli model card:

Tinker with Hugging Face models using the Inference API

Read the full article at https://realpython.com/huggingface-transformers/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

The Drop Times: 5 Basic Rules to Keep your Website Dependencies Secure

Planet Drupal - Wed, 2024-07-24 08:31
In web security, maintaining a secure Drupal site involves more than just core updates—it's about managing a complex web of dependencies. In this article, Grzegorz Pietrzak explores the critical steps every Drupal site maintainer should take to safeguard their sites against potential vulnerabilities. From keeping Composer and dependencies up-to-date to leveraging automated tools, discover practical strategies to fortify your Drupal site against modern threats. Stay ahead of security risks with these essential tips and insights.
Categories: FLOSS Project Planets

Kentaro Hayashi: apt-upgrade-canary - PoC apt JSON hook use case

Planet Debian - Wed, 2024-07-24 08:03

apt-upgrade-canary is a helper program that warns you when upgrading packages via apt.

If any of the packages to be upgraded are affected by a critical or serious bug, it shows warnings in the terminal.

You can then stay on the current version of a package by canceling the upgrade.

The program is meant to help you avoid installing problematic packages during manual upgrades.

Usually, you should delegate package upgrades to unattended-upgrades and apt-listbugs.

apt-listbugs kindly warns you if there are known bugs.

In most cases, that is safe enough.

But there is one exception: apt-listbugs cannot always track the very latest critical or serious bugs in a timely manner.

apt-upgrade-canary instead checks the UDD mirror database (taking care to cache queries so as not to issue redundant ones too frequently).

Getting started
  1. Install required packages:

     $ sudo apt install -y make ruby-pg ruby-json ruby-term-ansicolor

  2. Clone https://salsa.debian.org/kenhys/apt-upgrade-canary.

  3. Just sudo make install.

Use case in action

If no serious bugs are reported against the target packages, apt-upgrade-canary reports no errors.

apt-upgrade-canary: no problem

If something weird happens, it reports the matching bugs.

apt-upgrade-canary: serious case

So you can stay on the current version.


Categories: FLOSS Project Planets

Real Python: Quiz: Logging in Python

Planet Python - Wed, 2024-07-24 08:00

In this quiz, you’ll test your understanding of Python’s logging module.

Logging is a very useful tool in a programmer’s toolbox. It can help you develop a better understanding of the flow of a program and discover scenarios that you might not have thought of while developing.

Logs provide developers with an extra set of eyes that are constantly looking at the flow an application is going through.

They can store information, like which user or IP accessed the application. If an error occurs, then they can provide more insights than a stack trace by telling you what the state of the program was before it arrived at the line of code where the error occurred.
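As a quick warm-up, here’s a minimal sketch of the logging module in action; the format string and messages are arbitrary choices for illustration:

import logging

# Configure the root logger once, near program startup.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)

logger.info("User %s accessed the application from %s", "alice", "203.0.113.7")

try:
    1 / 0
except ZeroDivisionError:
    # logger.exception logs at ERROR level and appends the traceback.
    logger.exception("Division failed while computing a report")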


Categories: FLOSS Project Planets

Real Python: Quiz: Python Protocols: Leveraging Structural Subtyping

Planet Python - Wed, 2024-07-24 08:00

Test your understanding of how to create and use Python protocols while providing type hints for your functions, variables, classes, and methods.

Take this quiz after reading our Python Protocols: Leveraging Structural Subtyping tutorial.
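If you need a refresher before starting, here’s a minimal sketch of structural subtyping with typing.Protocol; the class and function names are invented for illustration:

from typing import Protocol

class SupportsQuack(Protocol):
    def quack(self) -> str: ...

class Duck:
    # No inheritance from SupportsQuack needed: having a matching
    # quack method is enough for a static type checker.
    def quack(self) -> str:
        return "Quack!"

def make_noise(animal: SupportsQuack) -> None:
    print(animal.quack())

make_noise(Duck())  # type checks structurally, prints "Quack!"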


Categories: FLOSS Project Planets

GSoC '24 Progress: Week 7 and 8

Planet KDE - Wed, 2024-07-24 08:00
Multiple Subtitle Track

I continued to refine the feature proposed in my previous blog post. We can now add new layers directly on the timeline by simply dragging an existing subtitle out of the bottom border of the subtitle track. Adding, moving, and deleting subtitles work as before, now with layer support.

I also added an indicator to the header of the subtitle track. It looks like this:

Besides setting a style for a specific subtitle event, I also plan to add support for setting different default styles for different subtitle layers. This will allow us to easily apply a consistent style to groups of subtitles within each layer.

Improved Subtitle Manager

Layer management is now integrated into the subtitle manager, giving it a fresh new look.

The duplicate and delete operations now work for layers as well.

Automatic Conversion of .srt Subtitle

To better test and develop the style feature, I switched the subtitle storage format to .ass. With the help of my mentor, we can now automatically convert the .srt files from old projects to .ass files while keeping the original .srt files.

There are still some minor issues with style conversion, such as incorrect font sizes. However, I believe it’s time to shift my focus to the styling widget and address these bugs later. The next two weeks will be dedicated to style management, which is the most important part of the project, so stay tuned!

Categories: FLOSS Project Planets

Real Python: Quiz: Hugging Face Transformers

Planet Python - Wed, 2024-07-24 08:00

In this quiz, you’ll test your understanding of Hugging Face Transformers. This library is a popular choice for working with transformer models in natural language processing tasks, computer vision, and other machine learning applications.


Categories: FLOSS Project Planets
