Feeds

Real Python: The Real Python Podcast – Episode #213: Constraint Programming & Exploring Python's Built-in Functions

Planet Python - Fri, 2024-07-19 08:00

What are discrete optimization problems? How do you solve them with constraint programming in Python? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.


Categories: FLOSS Project Planets

Real Python: Quiz: Python Strings and Character Data

Planet Python - Fri, 2024-07-19 08:00

This quiz will evaluate your understanding of Python’s string data type and test your knowledge about manipulating textual data with string objects. You’ll cover the basics of creating strings using literals and the str() function, applying string methods, using operators and built-in functions with strings, indexing and slicing strings, and more!

Take this quiz after reading our Strings and Character Data in Python tutorial.


Categories: FLOSS Project Planets

Debug Academy: Evaluating Acquia storage limits - "Emergency upsize" notification

Planet Drupal - Fri, 2024-07-19 06:54
Evaluating Acquia storage limits - "Emergency upsize" notification

In addition to best-in-class Drupal training courses (pun intended), we at Debug Academy provide Drupal development services and Drupal 7 migration services. Our clients may self-host, host with Acquia, Pantheon, or even host with us.

Recently, a client who hosts their Drupal 9 website on Acquia reached out to ask us to investigate an alert they had received from Acquia. Acquia sent an email which said "This email is to inform you that we emergency upsized your storage from 200GB to 300GB FS in Case #[redacted]. The cost to keep this upsize in place [is..]"

ashrafabed Fri, 07/19/2024
Categories: FLOSS Project Planets

PyBites: How to convert a Python script into a web app, a product others can use

Planet Python - Fri, 2024-07-19 06:02

So, you’re a Python developer or you just use Python to make your life easier by writing utility scripts to automate repetitive tasks and boost your productivity.
Such Python scripts are usually local to your machine, run from the command line, and require some Python skills to use, such as setting up a virtual environment and knowing the basics of the CLI.
You may have wondered what it takes to convert a script like this into a web app and make it available to other users, such as your family members and co-workers.

In this article, I’ll walk through the general process I followed to “convert” a Python utility script into a full-blown web app by building a backend API using Django and Django Rest Framework and a single-page-app frontend using React.js.

But first, here is a bit of background…

My family often needed me to merge (combine) PDF files for them. When they apply for online services they’re often required to upload all documents as a single PDF file.

As they only use Windows, they couldn’t find a reliable tool for merging PDFs. The apps they found worked only some of the time, and they also didn’t want to upload sensitive docs to random PDF-merging websites.
As the family tech guy, I’d get the call for help: they would send me the files to merge and I’d send them back a single PDF file, merged and ready to upload.

I used pdftk for a while. It’s available only on Linux (as far as I know) and the command looks like this:

pdftk 1.pdf 2.pdf 3.pdf cat output out.pdf

I then upgraded to a tiny Python script using PyPDF. Something like this:

from pathlib import Path

from pypdf import PdfWriter

path = Path.home() / "Documents" / "uni-application"
merger = PdfWriter()

# Merge every PDF in the folder (sorted by filename, hence the renaming) into one file.
for file in sorted(path.glob('*.pdf')):
    merger.append(file)

merger.write("merged.pdf")
merger.close()

I had to set the path manually every time and rename the PDFs so that they are merged in the desired order.

At some point, I felt I was merging PDFs for the family so often that I needed a way to expose this script to them somehow. I thought I should just build a web interface for it, host it on my local home server, and let them merge as many files as they want without needing my help.

There are many ways to package and distribute a Python script, including making a stand-alone executable. However, unless you build an intuitive GUI for it, average users may not be able to benefit from it.
Of all the GUI options out there, I chose to make it a web app, mainly because web apps have a familiar look and expected behaviour, users interact with them almost every day, and they run (hopefully) everywhere.

So, as a little side project over a few months, I built a basic REST-ish API for the backend that allows you to create a merge request, add PDF files to it, merge the files, and get back a single PDF file to download and use.
I also built a simple frontend for it using React.js.

For deployment, I used Docker to run the app on my home server and expose it on the LAN with a custom “.home” local domain. Now, my family use it almost every day to merge PDFs on their own, and everyone is happy.

I’ll go over the general process I followed to build this small web app that serves only one purpose: merging PDF files.
You can swap PDF merging for any functionality: converting audio/video files, resizing images, extracting text from PDFs or images, etc. The process is pretty much the same.

This is not a step-by-step guide. I’ll just explain the overall process. Tutorials on any technology mentioned here can easily be found online. Also, I won’t clutter the article with code examples, as you can refer to the source code in the GitHub repo.

Backend API and Auth

I chose to go with Django and Django Rest Framework for building the backend as I’m more familiar with these two than other Python options like Flask and FastAPI. In addition, being a batteries-included framework, Django comes with a lot of nice features I needed for this specific app, the most important of which are authentication and the ORM.

Why the API? Why not just full-stack Django?

I could have just built a typical Django app that works as the backend and frontend at the same time. But I knew for sure I was going to build a REST API alongside it at some point down the road, so why not just start with an API?
Moreover, the API-based backend is frontend-agnostic. It doesn’t assume a specific client-side technology. I can swap React for Vue for example. I can also build more clients to consume the API like a mobile or desktop app.

Building the API:

I started by thinking about what the original Python script does and what the user usually wants to get out of it. In this case the user wants a way to:

  • upload at least 2 PDF files,
  • merge the PDF files on the server,
  • and download the single merged PDF file.
Models

To satisfy these requirements, I started a new Django project with a “merger” app, installed Django Rest Framework, and added models to save everything in the DB. When a user creates a merge request, it’s saved to the DB so we can track various things about it, like how many files it has and whether it’s been merged or not. Later, we can utilize this data and build upon it if we decide, for example, to put each user on a plan/package with a limited number of merges or files per merge.

I used the name “order” for the merge request, effectively treating it as a transaction. The idea is to make things easier later if the app turns into a paid service.
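
To give a concrete idea of the data model, the models could look roughly like the sketch below. This is a minimal illustration with hypothetical field names, not the exact code from the repo:

from uuid import uuid4

from django.conf import settings
from django.db import models

class Order(models.Model):
    """A merge request: a batch of PDFs to be merged for one user."""
    id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    is_merged = models.BooleanField(default=False)
    download_url = models.CharField(max_length=255, blank=True)
    created_at = models.DateTimeField(auto_now_add=True)

class PdfFile(models.Model):
    """A single uploaded PDF belonging to an order."""
    id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
    order = models.ForeignKey(Order, related_name="files", on_delete=models.CASCADE)
    file = models.FileField(upload_to="uploads/")
    is_merged = models.BooleanField(default=False)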

Serializers

Then I created the DRF serializers, which are responsible for converting data between Python objects and JSON (the API expects and returns JSON, not Python data types like dicts).

They say: don’t trust user uploads. So, in the serializers, I also validate each uploaded file to make sure it is indeed a PDF, by verifying that the content type is ‘pdf’ and checking the MIME type by reading the first 2084 bits of the file.
This is not a bullet-proof validation system. It may keep away casual malicious users, but it probably won’t protect against determined attackers. Nevertheless, it prevents common user errors like uploading the wrong file format.

The validation also checks the file size to make sure it doesn’t exceed the limit set in settings.py; otherwise users could, intentionally or naively, upload huge files and tie up the server while they upload.
If the file fails to validate, the API will return a message containing the reason for rejection.
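
As a rough sketch, that field-level validation in a DRF serializer could look something like this (the model, field names, and the MAX_UPLOAD_SIZE setting are hypothetical, and the content-type check is simplified):

from django.conf import settings
from rest_framework import serializers

class PdfFileSerializer(serializers.ModelSerializer):
    class Meta:
        model = PdfFile  # hypothetical model name
        fields = ["id", "order", "file"]

    def validate_file(self, value):
        # Reject anything that doesn't declare itself as a PDF.
        if value.content_type != "application/pdf":
            raise serializers.ValidationError("Only PDF files are accepted.")
        # Reject files larger than the limit configured in settings.py.
        if value.size > settings.MAX_UPLOAD_SIZE:  # hypothetical setting
            raise serializers.ValidationError("File exceeds the maximum allowed size.")
        return value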

Although you can leave some of this validation to the frontend, you should always expect that some advanced users may get around frontend validation by sending requests directly to the API or even building their own frontend. After all, the API is open and can be consumed from any client. This of course doesn’t mean they can bypass auth or permissions, but with validation in the backend, even a custom-built client can’t get around it.

Views & Endpoints

I then switched to building the views, which are essentially the endpoints the frontend will call to interact with the backend.
Based on the requirements of this specific app, here are the API endpoints with a short description of what each does:

GET: /orders/

When a GET request is sent to this endpoint, it returns a list of all orders that belong to the currently authenticated user.

POST: /orders/

A POST request with a payload containing the merge name, like {"name": "my merge"}, creates a new order (merge request).

GET: /orders/{uuid}/

Returns the details of an order identified by its id.

DELETE /orders/{uuid}/

Deletes an order.

POST /orders/{uuid}/add_files

A POST request with a payload containing a file adds it to an order, after validating the file type and the maximum-files limit. It also checks whether the order has already been merged, in which case no more files can be added.

GET /orders/{uuid}/files/

Lists the files of an order.

GET /orders/{uuid}/merge/

Merges the order identified by its id.
This is where the core feature of the app lives. This view/endpoint does some initial checks to verify that the order has at least 2 files to merge and has not been previously merged, then hands the work over to a utility function that performs the actual merging of all PDF files associated with the order:

import os
import uuid

from django.conf import settings
from pypdf import PdfWriter
from pypdf.errors import PdfReadError, PyPdfError

from .exceptions import MergeException

def merge_pdf_files(pdf_files):
    merger = PdfWriter()
    # Append every uploaded PDF to the writer.
    for f in pdf_files:
        try:
            merger.append(f.file)
        except FileNotFoundError:
            raise MergeException
    # Write the merged document under MEDIA_ROOT with a unique name.
    merged_path = os.path.join(settings.MEDIA_ROOT, f"merged_pdfs/merged_pdf_{uuid.uuid4()}.pdf")
    try:
        merger.write(merged_path)
    except (FileNotFoundError, PdfReadError, PyPdfError):
        raise MergeException
    merger.close()
    if os.path.isfile(merged_path):
        return merged_path
    return None

Nothing fancy here: the function takes a list of PDF files, merges them using PyPDF, and returns the path of the merged PDF.

On a successful merge, the view takes the path and sets it as the value of the order instance’s download_url property for the next endpoint to use. It also marks the order and all of its files as merged, which can later be used to clean up merged orders and their associated files to save server space.

GET /orders/{uuid}/download/

Downloads the merged PDF of an order after verifying that it has been merged and is ready for download. The API allows a maximum number of downloads per order and a maximum time on the server. This prevents users from keeping merged files on the server forever and sharing the links, basically turning the app into a free file-sharing service.

DELETE /files/{uuid}/

Deletes a file identified by its id.

I then wired up the URLs, connecting each endpoint to its corresponding view.
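
To give an idea of the shape of these views, here is a heavily trimmed sketch using a DRF ViewSet and router; the serializer, model, and helper names are illustrative rather than the exact code from the repo:

from rest_framework import status, viewsets
from rest_framework.decorators import action
from rest_framework.response import Response
from rest_framework.routers import DefaultRouter

class OrderViewSet(viewsets.ModelViewSet):
    serializer_class = OrderSerializer  # hypothetical serializer

    def get_queryset(self):
        # Only ever expose the current user's own orders.
        return Order.objects.filter(owner=self.request.user)

    @action(detail=True, methods=["get"])
    def merge(self, request, pk=None):
        order = self.get_object()
        if order.is_merged or order.files.count() < 2:
            return Response({"detail": "Order cannot be merged."},
                            status=status.HTTP_400_BAD_REQUEST)
        merged_path = merge_pdf_files(order.files.all())  # utility function shown above
        if merged_path is None:
            return Response({"detail": "Merge failed."},
                            status=status.HTTP_500_INTERNAL_SERVER_ERROR)
        order.download_url = merged_path
        order.is_merged = True
        order.save()
        return Response({"download_url": order.download_url})

router = DefaultRouter()
router.register(r"orders", OrderViewSet, basename="order")
urlpatterns = router.urls

The other order endpoints (add_files, files, download) could follow the same pattern as extra @action methods on the ViewSet.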

Auth and Permissions:

To ensure privacy (not necessarily security), and to allow the app to be used as a paid service later, I decided to require an account for every user. Thanks to Django’s built-in features, this can be done fairly easily with the auth module. However, I didn’t want to use the traditional session-based authentication flow; I prefer token-based auth as it’s more suitable for API-based apps.

I chose to use JWT (JSON Web Tokens). Basically, it’s a way to let the user exchange their credentials for a pair of tokens: an access token and a refresh token.
The access token is attached (prefixed with “Bearer ”) to the header of every request that requires authentication; otherwise the request will fail. It has a short lifespan (usually hours or even minutes), and when it expires, the user can send the refresh token to get a new access token and continue using the app.

There are a number of packages that can add a JWT auth flow to Django. Most of them build on simple-jwt, which you can also use directly, but then you have to write more code yourself to implement a minimal register/login/logout flow.
I went with Djoser, which is a REST implementation of Django’s auth system.
It allowed me to use a custom user model with extra fields in the user table and, most importantly, to use email/password for registration and login instead of Django’s default username/password, although I had to tweak the models from the example project to work as intended.
Djoser also gave me the following endpoints for free:

  • GET /auth/users/ : lists all users if you’re an admin. Only returns your user info if you aren’t admin.
  • POST /auth/users/ : a POST request with a payload containing: name, email and password will register a new user.
  • POST /auth/jwt/create: a POST request with a payload containing: email and password will return access and refresh tokens to use for authenticating subsequent requests.
  • POST /auth/refresh : a POST request with a payload containing: the refresh token will return a new access token.
    as well as some other useful endpoints for changing and resetting passwords.
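
For reference, most of this wiring boils down to a few settings. Here is a minimal sketch; the DRF and simple-jwt setting keys are standard, while the lifetimes and the accounts.User model are assumptions for illustration (the Djoser-specific configuration is omitted):

# settings.py (excerpt)
from datetime import timedelta

AUTH_USER_MODEL = "accounts.User"  # hypothetical custom user model using email as the login field

REST_FRAMEWORK = {
    "DEFAULT_AUTHENTICATION_CLASSES": (
        "rest_framework_simplejwt.authentication.JWTAuthentication",
    ),
}

SIMPLE_JWT = {
    # Short-lived access token; the refresh token is used to obtain a new one.
    "ACCESS_TOKEN_LIFETIME": timedelta(minutes=30),
    "REFRESH_TOKEN_LIFETIME": timedelta(days=7),
    "AUTH_HEADER_TYPES": ("Bearer",),
}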

On top of Django’s auth, DRF uses permissions to decide what API resources a user can access. In short, in the API, I check if the user is logged in first. Then I have only two permissions:

  • check if the user is the owner of the order before allowing them to access it, upload files to it, merge it or download it.
  • check if the user is the owner of the order that a file is associated with before allowing them to view the file details or delete it.
    Failure to meet required permissions causes the API to raise a permission error and return a descriptive message.
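
The first of these checks, for example, can be expressed as a small DRF permission class. A minimal sketch (the owner attribute name is an assumption):

from rest_framework import permissions

class IsOrderOwner(permissions.BasePermission):
    """Only the owner of an order may view or modify it."""

    def has_object_permission(self, request, view, obj):
        return obj.owner == request.user

Such a class would then be listed in a view's permission_classes alongside IsAuthenticated.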

During development of the backend, I used the browser extension “ModHeader” to include the access token in all requests I made through the browser (for testing via DRF’s built-in web UI).

Frontend

I chose React.js for building the frontend as a single-page app (SPA) because it’s relatively easy to use for small apps and has a huge community. I won’t go into much detail about how the frontend is built (this is a Python blog after all), but I will touch on the main points.

Auth in the frontend

First, here is a brief description of the auth workflow in the frontend:

  • when a user first visits the app, they’re asked to login or signup to continue.
  • the user can sign up by filling out their name, email, and password. This form data is sent via an API “POST” request to the backend endpoint /auth/users to register the user.
  • for existing users, email and password are sent in a “POST” request to /auth/jwt/create, which will return a pair of tokens: refresh and access. These tokens are saved in browser cookies. The access token is sent in the header of all subsequent requests to authenticate the user. When it expires, the frontend requests a new token on behalf of the user by sending the refresh token to /auth/refresh. If both expire, the user is redirected to log in again and obtain a new set of tokens (a minimal sketch of this exchange follows this list).
  • when a user navigates to /logout, all tokens are cleared from browser cookies and the user is redirected to /login route.
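
Outside the browser, the same token exchange can be exercised with any HTTP client. For illustration, here is a rough sketch using Python’s requests library against the endpoints listed earlier (the host name and credentials are made up):

import requests

BASE = "http://pdfmerge.home"  # hypothetical local domain

# Exchange credentials for a pair of tokens.
tokens = requests.post(f"{BASE}/auth/jwt/create",
                       json={"email": "user@example.com", "password": "secret"}).json()

# Attach the access token to authenticated requests.
headers = {"Authorization": f"Bearer {tokens['access']}"}
orders = requests.get(f"{BASE}/orders/", headers=headers).json()

# When the access token expires, trade the refresh token for a new one.
new_access = requests.post(f"{BASE}/auth/refresh",
                           json={"refresh": tokens["refresh"]}).json()["access"]
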
Routes:

I built the necessary React components and routes for the app to roughly match the backend endpoints I discussed earlier. These are the routes the app has for authenticated users:

  • / => home screen to choose to create a new merge or list previous merges.
  • /orders => list all merges for the currently logged in user with links to edit or delete any order.
  • /create => create new merge.
  • /order/{id} => details of a merge by id.
  • /logout => logs the user out.
Merging workflow:

The workflow for merging a PDF file from the frontend is as follows:

  • when the user creates a new merge, they’re redirected to the merge detail route showing the merge they’ve just created with three buttons: add PDFs, merge, download.
  • the merge detail route allows the user to upload PDF files to the merge. If there are fewer than 2 files, only “Add PDF files” is enabled. When the user adds 2 files, the “merge” button is activated. When the files reach the maximum number allowed in a single merge (currently 5), the “Add PDFs” button is disabled and the upload form is hidden.
  • When the user is done adding PDFs, they can click “merge” which will merge the files on the server and activate only the download button.
  • clicking the download button opens a new browser tab with the merged PDF.
UI & CSS:

For UI, the app uses daisyUI, a free component library that makes it easier to use TailwindCSS. The latter is super popular in the frontend world as a utility-first CSS framework.

Deployment:

I’ve not deployed the app to a real production server yet as the home server environment is very forgiving and you can skip some steps that you wouldn’t skip in a production deployment.
For now, I just have a basic Dockerfile and a docker-compose file to spin up the backend API (a regular Django project) and have it ready to accept calls from the frontend.

Likewise, a set of docker files is used to spin up the frontend. After building it using “npm run build”, the docker file copies the deployable app from the “dist” folder to the Nginx document root folder inside the docker container and just runs it as any other website hosted on a web server.
This setup is probably enough for development and hosting locally. When it comes time to publish the app on the web, “real” deployment considerations must be taken into account.

It’s worth noting that I have a separate repo for backend and frontend to keep both ends decoupled from each other. The backend API can be consumed from any frontend be it a web, mobile or desktop app.

Further improvements:

The app in its current state works and does the intended job. It’s far from perfect and can use some improvements. I’ll include these in the README of the repos.

Source code

Conclusion

In this article I walked through the general process of how I built a web app for a Python script to make it available for use by average end users.
Python scripts are great starting points for apps. They’re also a source of inspiration for app ideas. If you have a script that performs a common daily-life task, consider building an app for it. The process will teach you a ton about the lifecycle of app development. It forces you to think of and account for aspects you don’t usually consider when writing a stand-alone script.
As you build the app though, always remember to:

  • keep it simple. Don’t overcomplicate things.
  • ship fast. Aim at building an MVP (Minimum Viable Product) with the necessary functionality. Don’t wait until you’ve built every feature you think the app should have. Instead, ship it and then iterate on it and add features slowly as they’re needed.
  • not feel intimidated by other mature projects out there. They’ve most likely been built over a long period of time and iterated on tens or even thousands of times before reaching the mature state they’re in today.

I hope you found this article helpful and I look forward to seeing you in a future one.

Categories: FLOSS Project Planets

GNU Taler news: Video interview with Mikolai Gütschow on payments for the Internet of Things

GNU Planet! - Fri, 2024-07-19 04:52
On the occasion of the Point Zero Forum's Innovation Tour, Evgeny Grin has interviewed Mikolai Gütschow, who designed and implemented solutions for payments in the Internet of Things (IoT).
Categories: FLOSS Project Planets

Matt Layman: Activation Email Job - Building SaaS #196

Planet Python - Thu, 2024-07-18 20:00
In this episode, we chatted about managing dependencies and the cost of maintenance. Then we got into some feature work and began building a job that will send users an email as a reminder to activate their account shortly before it expires.
Categories: FLOSS Project Planets

Quansight Labs Blog: The convoluted story behind `np.top_k`

Planet Python - Thu, 2024-07-18 20:00
In this blog post, I describe my experience as a first-time contributor to NumPy and talk about the story behind `np.top_k`.
Categories: FLOSS Project Planets

GNUnet News: The European Union must keep funding free software

GNU Planet! - Thu, 2024-07-18 18:00
The European Union must keep funding free software

The GNUnet project was granted NGI funding via NLnet. Other FOSS-related projects also benefit from NGI funding. This funding is now at risk for future projects.

The following is an open letter initially published in French by the Petites Singularités association. To co-sign it, please publish it on your website in your preferred language, then add yourself to this table.

Open Letter to the European Commission.

Since 2020, Next Generation Internet (NGI) programmes, part of the European Commission’s Horizon programme, have funded free software in Europe using a cascade funding mechanism (see for example NLnet’s calls). This year, according to the Horizon Europe working draft detailing funding programmes for 2025, we notice that Next Generation Internet is no longer mentioned as part of Cluster 4.

NGI programmes have shown their strength and importance in supporting the European software infrastructure, as a generic funding instrument to fund digital commons and ensure their long-term sustainability. We find this transformation incomprehensible, all the more so since NGI has proven efficient and economical in supporting free software as a whole, from the smallest to the most established initiatives. This ecosystem diversity underpins the strength of European technological innovation, and maintaining the NGI initiative to provide structural support to software projects at the heart of worldwide innovation is key to enforcing the sovereignty of a European infrastructure. Contrary to common perception, technical innovations often originate from European rather than North American programming communities, and are mostly initiated by small-scale organizations.

Previous Cluster 4 allocated 27 million euros to:

  • “Human centric Internet aligned with values and principles commonly shared in Europe” ;
  • “A flourishing internet, based on common building blocks created within NGI, that enables better control of our digital life” ;
  • “A structured ecosystem of talented contributors driving the creation of new internet commons and the evolution of existing internet commons”.

In the name of these challenges, more than 500 projects received NGI funding in the first 5 years, backed by 18 organisations managing these European funding consortia.

NGI contributes to a vast ecosystem, as most of its budget is allocated to fund third parties by the means of open calls, to structure commons that cover the whole Internet scope - from hardware to application, operating systems, digital identities or data traffic supervision. This third-party funding is not renewed in the current program, leaving many projects short on resources for research and innovation in Europe.

Moreover, NGI allows exchanges and collaborations across all the euro zone countries as well as “widening countries”1, currently both a success and a work in progress, much like the Erasmus programme before it. NGI also contributes to opening up and supporting longer-term relationships than strict project funding does. It encourages implementing funded projects as pilots, backing collaboration, identification and reuse of common elements across projects, interoperability in identification systems and beyond, and setting up development models that mix diverse scales and types of European funding schemes.

While the USA, China or Russia deploy huge public and private resources to develop software and infrastructure that massively capture private consumer data, the EU can’t afford this renunciation. Free and open source software, as supported by NGI since 2020, is by design the opposite of potential vectors for foreign interference. It lets us keep our data local and favors a community-wide economy and know-how, while allowing an international collaboration. This is all the more essential in the current geopolitical context: the challenge of technological sovereignty is central, and free software allows addressing it while acting for peace and sovereignty in the digital world as a whole.

  1. As defined by Horizon Europe, widening Member States are Bulgaria, Croatia, Cyprus, Czechia, Estonia, Greece, Hungary, Latvia, Lithuania, Malta, Poland, Portugal, Romania, Slovakia, and Slovenia. Widening associated countries (under condition of an association agreement) include Albania, Armenia, Bosnia, the Faroe Islands, Georgia, Kosovo, Moldova, Montenegro, Morocco, North Macedonia, Serbia, Tunisia, Türkiye, and Ukraine. Widening overseas regions are Guadeloupe, French Guiana, Martinique, Reunion Island, Mayotte, Saint-Martin, the Azores, Madeira, and the Canary Islands. ↩︎

Categories: FLOSS Project Planets

Cailean Osborne: voices of the Open Source AI Definition

Open Source Initiative - Thu, 2024-07-18 13:09

The Open Source Initiative (OSI) is running a series of stories about a few of the people involved in the Open Source AI Definition (OSAID) co-design process. Today, we are featuring Cailean Osborne, one of the volunteers who have helped to shape and are shaping the OSAID.

Question: What’s your background related to Open Source and AI?

My interest in Open Source AI began around 2020 when I was working in AI policy at the UK Government. I was surprised that Open Source never came up in policy discussions, given its crucial role in AI R&D. Having been a regular user of libraries like scikit-learn and PyTorch in my previous studies, I followed Open Source AI trends in my own time and eventually decided to do a PhD on the topic. When I started my PhD back in 2021, Open Source AI still felt like a niche topic, so it’s been exciting to watch it become a major talking point over the years. 

Beyond my PhD, I’ve been involved in Open Source AI community as a contributor to scikit-learn and as a co-developer of the Model Openness Framework (MOF) with peers from the Generative AI Commons community. Our goal with the MOF is to provide guidance for AI researchers and developers to evaluate the completeness and openness of “Open Source” models based on open science principles. We were chuffed that the OSI team chose to use the 16 components from the MOF as the rubric for reviewing models in the co-design process. 

Question: What motivated you to join this co-design process to define Open Source AI?

The short answer is: to contribute to establishing an accurate definition for “Open Source AI” and to learn from all the other experts involved in the co-design process. The longer answer is: There’s been a lot of confusion about what is or is not “Open Source AI,” which hasn’t been helped by open-washing. “Open source” has a specific definition (i.e. the right to use, study, modify, and redistribute source code) and what is being promoted as “Open Source AI” deviates significantly from this definition. Rather than being pedantic, getting the definition right matters for several reasons; for example, for the “Open Source” exemptions in the EU AI Act to work (or not work), we need to know precisely what “Open Source” models actually are. Andreas Liesenfeld and Mark Dingemanse have written a great piece about the issues of open-washing and how they relate to the AI Act, which I recommend reading if you haven’t yet. So, I got involved to help develop a definition and to learn from all the other experts involved. It hasn’t been easy (it’s a pretty divisive topic!), but I think we’ve made good progress.

Question: Can you describe your experience participating in this process? What did you most enjoy about it and what were some of the challenges you faced?

First off, I have to give credit to Stef and Mer for maintaining momentum throughout the process. Coordinating a co-design effort with volunteers scattered around the globe, each with varying levels of availability and (strong) opinions on the matter, is no small feat. So, well done! I also enjoyed seeing how others agreed or disagreed when reviewing models. The moments of disagreement were the most interesting; for example, about whether training data should be available versus documented and if so, in how much detail… Personally, the main challenge was searching for information about the various components of models that were apparently “Open Source” and observing how little information was actually provided beyond weights, a model card, and if you’re lucky an arXiv preprint or technical report.

Question: Why do you think AI should be Open Source?

When talking about the benefits of Open Source AI, I like to point folks to a 2007 paper, in which 16 researchers highlighted “The Need for Open Source Software in Machine Learning” due to basically the complete lack of OSS for ML/AI at the time. Fast forward to today: AI R&D is practically unthinkable without OSS, from data tooling to the deep learning frameworks used to build LLMs. Open source and openness in general have many benefits for AI, from enabling access to SOTA AI technologies and transparency, which is key for reproducibility, scrutiny, and accountability, to widening participation in their design, development, and governance. 

Question: What do you think is the role of data in Open Source AI?

If the question is strictly about the role of data in developing open AI models, the answer is pretty simple: Data plays a crucial role because it is needed for training, testing, aligning, and auditing models. But if the question is asking “should the release of data be a condition for an open model to qualify as Open Source AI,” then the answer is obviously much more complicated. 

Companies are in no rush to share training data due to a handful of reasons: be it competitive advantage, data protection, or frankly being sued for copyright infringement. The copyright concern isn’t limited to companies: EleutherAI has also been sued and had to take down the Books3 dataset from The Pile. There are also many social and cultural concerns that restrict data sharing; for example, the Kōrero Kaitiakitanga license has been developed to protect the interests of indigenous communities in New Zealand. So, the data question isn’t easy and perhaps we shouldn’t be too dogmatic about it.  

Personally, I think the compromise in v. 0.0.8, which states that model developers should provide sufficiently detailed information about data if they can’t release the training dataset itself, is a reasonable halfway house. I also hope to see more open pre-training datasets like the one developed by the community-driven BigScience Project, which involved open deliberation about the design of the dataset and provides extensive documentation about data provenance and processing decisions (e.g. check out their Data Catalogue). The FineWeb dataset by Hugging Face is another good example of an open pre-training dataset, which they released with pre-processing code, evaluation results, and super detailed documentation.

Question: Has your personal definition of Open Source AI changed along the way? What new perspectives or ideas did you encounter while participating in the co-design process?

To be honest, my personal definition hasn’t changed much. I am not a big fan of the use of “Open Source AI” when folks specifically mean “open models” or “open-weight models”. What we need to do is raise awareness about appropriate terminology and point out “open-washing”, as people have done, and I must say that subjectively I’ve seen improvements: less “Open Source models” and more “open models”. But I will say that I do find “Open Source AI” a useful umbrella term for the various communities of practice that intertwine in the development of open models, including OSS, open data, and AI researchers and developers, who all bring different perspectives and ways of working to the overarching “Open Source AI” community.

Question: What do you think the primary benefit will be once there is a clear definition of Open Source AI?

We’ll be able to reduce confusion about what is or isn’t “Open Source AI” and more easily combat open-washing efforts. As I mentioned before, this clarity will be beneficial for compliance with regulations like the AI Act which includes exemptions for “Open Source” AI.  

Question: What do you think are the next steps for the community involved in Open Source AI?

We still have many steps to take but I’ll share three for now.

First, we urgently need to improve the auditability and therefore the safety of open models. With OSS, we know that (1) the availability of source code and (2) open development enable the distributed scrutiny of source code. Think Linus’ Law: “Given enough eyeballs, all bugs are shallow.” Yet open models are more complex than just source code, and the lack of openness of many key components like training data is holding back adoption because would-be adopters can’t adequately run due diligence tests on the models. If we want to realise the benefits of “Open Source AI,” we need to figure out how to increase the transparency and openness of models —we hope the Model Openness Framework can help with this. 

Second, I’m really excited about grassroots initiatives that are leading community-driven approaches to developing open models and open datasets like the BigScience project. They’re setting an example of how to do “Open Source AI” in a way that promotes open collaboration, transparency, reproducibility, and safety from the ground up. I can still count such initiatives with my fingers but I am hopeful that we will see more community-driven efforts in the future.
Third, I hope to see the public sector and non-profit foundations get more involved in supporting public interest and grassroots initiatives. France has been a role model on this front: providing a public grant to train the BigScience project’s BLOOM model on the Jean Zay supercomputer, as well as funding the scikit-learn team to build out a data science commons.

Categories: FLOSS Research

The Drop Times: The Chief Who Drives and Is Driven by Drupal: A Talk with Dries Buytaert

Planet Drupal - Thu, 2024-07-18 12:07
Join The DropTimes (TDT) for its milestone 100th interview with Dries Buytaert, the innovative founder of Drupal. Interviewed by Anoop John, Founder and Lead of The DropTimes, this conversation explores Drupal’s rich history and transformative journey. Dries shares key moments that boosted Drupal’s adoption, insights on community growth through events like DrupalCon, and the impact of the Drupal Starshot initiative. He discusses strategies for making Drupal more accessible, integrating AI, and effective community communication. This interview captures Drupal’s evolution and future aspirations, offering valuable insights for both seasoned users and newcomers. Don’t miss this engaging discussion celebrating Drupal’s ongoing impact and future.
Categories: FLOSS Project Planets

mark.ie: My LocalGov Drupal contributions for week-ending July 19th, 2024

Planet Drupal - Thu, 2024-07-18 12:00

Here's what I've been working on for my LocalGov Drupal contributions this week. Thanks to Big Blue Door for sponsoring the time to work on these.

Categories: FLOSS Project Planets

Enrico Zini: meson, includedir, and current directory

Planet Debian - Thu, 2024-07-18 09:16

Suppose you have a meson project like this:

meson.build:

project('example', 'cpp',
    version: '1.0',
    license : '…',
    default_options: ['warning_level=everything', 'cpp_std=c++17'])

subdir('example')

example/meson.build:

test_example = executable('example-test', ['main.cc'])

example/string.h:

/* This file intentionally left empty */

example/main.cc:

#include <cstring>

int main(int argc, const char* argv[]) {
    std::string foo("foo");
    return 0;
}

This builds fine with autotools and cmake, but not meson:

$ meson setup builddir
The Meson build system
Version: 1.0.1
Source dir: /home/enrico/dev/deb/wobble-repr
Build dir: /home/enrico/dev/deb/wobble-repr/builddir
Build type: native build
Project name: example
Project version: 1.0
C++ compiler for the host machine: ccache c++ (gcc 12.2.0 "c++ (Debian 12.2.0-14) 12.2.0")
C++ linker for the host machine: c++ ld.bfd 2.40
Host machine cpu family: x86_64
Host machine cpu: x86_64
Build targets in project: 1
Found ninja-1.11.1 at /usr/bin/ninja

$ ninja -C builddir
ninja: Entering directory `builddir'
[1/2] Compiling C++ object example/example-test.p/main.cc.o
FAILED: example/example-test.p/main.cc.o
ccache c++ -Iexample/example-test.p -Iexample -I../example -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -Wpedantic -Wcast-qual -Wconversion -Wfloat-equal -Wformat=2 -Winline -Wmissing-declarations -Wredundant-decls -Wshadow -Wundef -Wuninitialized -Wwrite-strings -Wdisabled-optimization -Wpacked -Wpadded -Wmultichar -Wswitch-default -Wswitch-enum -Wunused-macros -Wmissing-include-dirs -Wunsafe-loop-optimizations -Wstack-protector -Wstrict-overflow=5 -Warray-bounds=2 -Wlogical-op -Wstrict-aliasing=3 -Wvla -Wdouble-promotion -Wsuggest-attribute=const -Wsuggest-attribute=noreturn -Wsuggest-attribute=pure -Wtrampolines -Wvector-operation-performance -Wsuggest-attribute=format -Wdate-time -Wformat-signedness -Wnormalized=nfc -Wduplicated-cond -Wnull-dereference -Wshift-negative-value -Wshift-overflow=2 -Wunused-const-variable=2 -Walloca -Walloc-zero -Wformat-overflow=2 -Wformat-truncation=2 -Wstringop-overflow=3 -Wduplicated-branches -Wattribute-alias=2 -Wcast-align=strict -Wsuggest-attribute=cold -Wsuggest-attribute=malloc -Wanalyzer-too-complex -Warith-conversion -Wbidi-chars=ucn -Wopenacc-parallelism -Wtrivial-auto-var-init -Wctor-dtor-privacy -Weffc++ -Wnon-virtual-dtor -Wold-style-cast -Woverloaded-virtual -Wsign-promo -Wstrict-null-sentinel -Wnoexcept -Wzero-as-null-pointer-constant -Wabi-tag -Wuseless-cast -Wconditionally-supported -Wsuggest-final-methods -Wsuggest-final-types -Wsuggest-override -Wmultiple-inheritance -Wplacement-new=2 -Wvirtual-inheritance -Waligned-new=all -Wnoexcept-type -Wregister -Wcatch-value=3 -Wextra-semi -Wdeprecated-copy-dtor -Wredundant-move -Wcomma-subscript -Wmismatched-tags -Wredundant-tags -Wvolatile -Wdeprecated-enum-enum-conversion -Wdeprecated-enum-float-conversion -Winvalid-imported-macros -std=c++17 -O0 -g -MD -MQ example/example-test.p/main.cc.o -MF example/example-test.p/main.cc.o.d -o example/example-test.p/main.cc.o -c ../example/main.cc
In file included from ../example/main.cc:1:
/usr/include/c++/12/cstring:77:11: error: ‘memchr’ has not been declared in ‘::’
   77 | using ::memchr;
      |       ^~~~~~
/usr/include/c++/12/cstring:78:11: error: ‘memcmp’ has not been declared in ‘::’
   78 | using ::memcmp;
      |       ^~~~~~
/usr/include/c++/12/cstring:79:11: error: ‘memcpy’ has not been declared in ‘::’
   79 | using ::memcpy;
      |       ^~~~~~
/usr/include/c++/12/cstring:80:11: error: ‘memmove’ has not been declared in ‘::’
   80 | using ::memmove;
      |       ^~~~~~~
…

It turns out that meson adds the current directory to the include path by default:

Another thing to note is that include_directories adds both the source directory and corresponding build directory to include path, so you don't have to care.

It seems that I have to care after all.

Thankfully there is an implicit_include_directories setting that can turn this off if needed.

Its documentation is not as easy to find as I'd like (kudos to Kangie on IRC), and hopefully this blog post will make it easier for me to find it in the future.

Categories: FLOSS Project Planets

Python Software Foundation: PSF Board update on improvements to the PSF Grants program

Planet Python - Thu, 2024-07-18 07:00

In December 2023 we received an open letter from a coalition of organizers from the pan-African Python community asking the PSF to address concerns and frustrations around our Grants Program. The letter writers agreed to meet with us in December and January to go into more detail and share more context from the pan-African community. Since then, we have been doing a lot of listening and discussing how to better serve the community with the input that was offered.

The PSF Board takes the open letter from the pan-African delegation seriously, and we began to draft a plan to address everything in the letter. We also set up improved two-way communications so that we can continue the conversation with the community. The writers of the open letter have now met several times with members of the PSF board. We are thankful for their insight and guidance on how we can work together and be thoroughly and consistently supportive of the pan-African Python community.

We care a lot about building consensus and ensuring that we are promising solutions that have support and a realistic workflow. Building an achievable plan that meets the needs of the community has involved work for the PSF’s small staff. It also included additional conversations with and input from the volunteers who serve on the Board and in our working groups, especially the Grants Working Group. We are grateful for the input as well as the opportunity to improve.

Plans and progress on the Grants Program

Here is what’s already been done:

  • Set up Grants Program Office Hours to open up a line of casual sustained communication between the community and our staff members who support the grants program. Several sessions have already taken place.
  • The PSF contracted Carol Willing to do a retrospective on the DjangoCon Africa review and approval and make suggestions for improvements or changes. We published her report in March.
  • We published a transparency report for our grants numbers from the last two years, and plan to publish a report on our grants work for every year going forward so we can continue to work in the open on continually improving the grants program.  
  • In May, the board voted that we will not override or impose any country-specific human rights regulation for Python communities when deciding whether or not to fund community-run Python or Python-related events. The Grants Program will use the same criteria for all grant requests, no matter their country of origin. This does not affect our criteria for choosing a specific US state for PyCon US and it does not change our ability to fund events in countries that are sanctioned by the US government (where the PSF is based.) Finally, the Grants Working Group will still require a robust and enforceable code of conduct and we expect local organizers to choose what is appropriate for their local community when drafting their code of conduct.


What is on our roadmap:

  • With community input, we’ll be overhauling the grant application process and requirements for applications. Our goal is to make the process inclusive and the administrative requirements as lightweight as possible, while not creating additional legal or administrative work.
  • We’re conducting a thorough examination of our grant priorities by subject matter and location. We hope to make requesting and reviewing grants for activities beyond events easier.
  • Continuing to reimagine the PSF Board’s responsibility within the community. Please read on for our thought process and work in this area.


Reevaluating PSF Board member communications and conduct norms and standards

We discussed Board member conduct and communications norms – both past and future – at our retreat in January. We realize that some things were said by past and current Board members that did not reflect the PSF’s outlook or values. We are working to ensure current and future Board members understand the power their communications have on our community. Understanding the expectations and responsibilities that come with service on the PSF Board is part of orientation for service. Going forward we plan to invest more time into this topic during our PSF Board orientations.

The Board has agreed to hold each other accountable and use the position of PSF Board member responsibly in communications with the community. We acknowledge that PSF Board members have not always recognized the impact that their comments have had on community members, either in private or in public. Going forward, community members can report board and board-delegated working group members’ conduct to individuals who do not serve on the board. Two members of the PSF’s Code of Conduct Working Group (Jeff Triplett (jeff.triplett@pyfound.org) and Tereza Iofciu (email is coming)) have volunteered to receive these reports and handle them separately. At a time that Jeff or Tereza are unable to receive these reports, other non-board members of the Code of Conduct working group will be nominated to manage such reports.

Moving forward together

Moving forward, the PSF Board and Staff will continue to prioritize transparency through the form of the Grants Office Hours and yearly reports. Our focus will move from response to charter, process, and documentation improvements based on the findings we have made. The PSF Board will continue to conduct annual orientations and ad hoc check-ins on our communication and conduct standards. We welcome you to send your questions, comments, and suggestions for the Grants Program to grants@pyfound.org.

As the great Maya Angelou has said, “Do the best you can until you know better. Then, when you know better, do better.” We want to thank the pan-African community for showing us that we can do better and we look forward to being a community partner that can be counted on to hear criticism and continually make changes that improve our service to the Python community.

Categories: FLOSS Project Planets

Python Insider: Python 3.13.0 beta 4 released

Planet Python - Thu, 2024-07-18 06:16

I'm pleased to announce the release of Python 3.13 beta 4.

https://www.python.org/downloads/release/python-3130b4/

 This is a beta preview of Python 3.13

Python 3.13 is still in development. This release, 3.13.0b4, is the final beta release preview of 3.13.

Beta release previews are intended to give the wider community the opportunity to test new features and bug fixes and to prepare their projects to support the new feature release.

We strongly encourage maintainers of third-party Python projects to test with 3.13 during the beta phase and report issues found to the Python bug tracker as soon as possible. While the release is planned to be feature complete entering the beta phase, it is possible that features may be modified or, in rare cases, deleted up until the start of the release candidate phase (Tuesday 2024-07-30). Our goal is to have no ABI changes after this final beta release, and as few code changes as possible after 3.13.0rc1, the first release candidate. To achieve that, it will be extremely important to get as much exposure for 3.13 as possible during the beta phase.

Please keep in mind that this is a preview release and its use is not recommended for production environments.

 Major new features of the 3.13 series, compared to 3.12

Some of the major new features and changes in Python 3.13 are:

Removals and new deprecations
  • PEP 594 (Removing dead batteries from the standard library) scheduled removals of many deprecated modules: aifc, audioop, chunk, cgi, cgitb, crypt, imghdr, mailcap, msilib, nis, nntplib, ossaudiodev, pipes, sndhdr, spwd, sunau, telnetlib, uu, xdrlib, lib2to3.
  • Many other removals of deprecated classes, functions and methods in various standard library modules.
  • C API removals and deprecations. (Some removals present in alpha 1 were reverted in alpha 2, as the removals were deemed too disruptive at this time.)
  • New deprecations, most of which are scheduled for removal from Python 3.15 or 3.16.

(Hey, fellow core developer, if a feature you find important is missing from this list, let Thomas know.)

For more details on the changes to Python 3.13, see What’s new in Python 3.13. The next pre-release of Python 3.13 will be 3.13.0rc1, the first release candidate, currently scheduled for 2024-07-30.

 Enjoy the new releases

Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organization contributions to the Python Software Foundation.

 

Your release team,
Thomas Wouters
Łukasz Langa
Ned Deily
Steve Dower 

Categories: FLOSS Project Planets

Behind the Scenes of Embedded Updates

Planet KDE - Thu, 2024-07-18 04:00

An over-the-air (OTA) update capability is an increasingly critical part of any embedded product to close cybersecurity vulnerabilities, allow just-in-time product rollouts, stomp out bugs, and deliver new features. We’ve talked about some of the key structural elements that go into an embedded OTA architecture before. But what about the back end? Let’s address some of those considerations now.

The challenges of embedded connectivity

The ideal of a constant Internet connection is more aspiration than reality for many embedded devices. Sporadic connections, costly cellular or roaming charges, and limited bandwidth are common hurdles. These conditions necessitate smart management of update payloads and robust retry strategies that can withstand interruptions, resuming where they left off without getting locked in a continually restarting update cycle.

There are other ways to manage spotty connections. Consider using less frequent update schedules or empower users to initiate updates. These strategies however have trade-offs, including the potential to miss critical security patches. One way to strike a balance is to implement updates as either optional or mandatory, or flag updates as mandatory only when critical, allowing users to pace out updates when embedded connectivity isn’t reliable.

To USB or not to USB

When network access is very unreliable, or even just plain absent, then USB updates are indispensable for updating device software. These updates can also serve as effective emergency measures or for in-field support. While the process of downloading and preparing a USB update can often be beyond a normal user’s capability, it’s a critical fallback and useful tool for technical personnel.

OTA servers: SaaS or self-hosted

Deciding between software as a service (SaaS) and self-hosted options for your OTA server is a decision that impacts not just the update experience but also compliance with industry and privacy regulations. While SaaS solutions can offer ease and reliability, certain scenarios may necessitate on-premise servers. If you do need to host an OTA server yourself, you’ll need to supply the server hardware and assign a maintenance team to manage it. But you may not have to build it all from scratch – you can still call in the experts with proven experience in setting up self-hosted OTA solutions.

Certificates: The bedrock of OTA security

SSL certificates are non-negotiable for genuine and secure OTA updates. They verify your company as the authentic source of updates. Choosing algorithms with the longest (comparatively equivalent) key lengths will extend the reliable lifespan of these certificates. However, remember that certificates do expire; having a game plan in place to deal with expired certificates will allow you to avoid the panic of an emergency scramble if it should happen unexpectedly.

Accurate timekeeping is also essential for validating SSL certificates. A functioning and accurate real-time clock that is regularly NTP/SNTP synchronized is critical. If timekeeping fails, your certificates won’t be validated properly, causing all sorts of issues. (We recommend reading our OTA best practice guide for advice on what to do proactively and reactively with invalidated or expired certificates.)

Payload encryption: Non-negotiable

Encrypted update payloads are imperative as a safeguard against reverse-engineering and content tampering. This is true for OTA updates as well as any USB or offline updates. Leveraging the strongest possible encryption keys that your device can handle will enhance security significantly.

Accommodating the right to repair

The growing ‘right to repair’ movement and associated legislation imply that devices should support updates outside of your organization’s tightly controlled processes. This may mean that you need to provide a manual USB update to meet repair requirements without exposing systems to unauthorized OTA updates. To prevent your support team from struggling with amateur software updates, you’ll want to configure your device to set a flag when unauthorized software has been loaded. This status can be checked by support teams to invalidate support or warranty agreements.

Summary

By carefully navigating the critical aspects of OTA updates, such as choosing the right hosting option and managing SSL certificates and encryption protocols, your embedded systems can remain up-to-date and secure under any operating conditions. While this post introduces the issues involved in embedded-system updates, there is much more to consider for a comprehensive strategy. For a deeper exploration and best practices in managing an embedded product software update strategy, please visit our best practice guide, Updates Outside the App Store.

About KDAB

If you like this article and want to read similar material, consider subscribing via our RSS feed.

Subscribe to KDAB TV for similar informative short video content.

KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.

The post Behind the Scenes of Embedded Updates appeared first on KDAB.

Categories: FLOSS Project Planets

GSoC 2024: Midterm Updates For MankalaEngine

Planet KDE - Wed, 2024-07-17 20:00
Design Considerations

One of the main focuses while designing this engine was flexibility, to ensure the library is usable for a wide variety of Mancala variants. While certain abstractions are necessary to achieve this goal, it’s also important not to overly abstract the library.

Provided Functionality

MankalaEngine provides classes to assist in creating computerized opponents for many Mancala variants. As mentioned earlier, these classes are designed with a degree of abstraction to allow for greater flexibility on the developer’s end. In addition to these base classes, a concrete implementation tailored for the Bohnenspiel variant [1] is also provided.

The Board struct

The board struct is the base struct for a Mancala game board. It is used to specify the structure of the board used by a variant. As of now, this struct allows for a board with an arbitrary number of holes and two stores, one per player, as this seems to be the case for most Mancala games.

The Rules class

The rules class is the base class for the rules of a Mancala variant. It is used to specify the behaviour when making a move, what constitutes a valid move, when the game is over, etc. The only rule assumed to be shared by all variants is that the winning player is the one with more pebbles in their store, as this seems to be relatively common in Mancala variants.

The moveselection functions

The functions provided in the moveselection file are general adversarial search functions that can be used to select moves for Mancala games. In addition to Minimax [2] and MTD-f [3], random selection and user selection functions are also provided.

The Minimax move selection function

Minimax works by recursively exploring the game tree, considering each player’s moves and assuming that each player chooses the optimal move to maximize their chances of winning. If we’re scoring a Mancala position using an evaluation function that subtracts the pebbles in Player 2’s store from the pebbles in Player 1’s store, this means that Player 1 will want to maximize the score, while Player 2 will want to minimize it. The diagram below shows how a position is evaluated using the Minimax algorithm.

Each node represents a specific board configuration, and each level of the tree represents a turn of play. The tree nodes are squares on Player 1’s turn and circles on Player 2’s turn. Leaf nodes (nodes without children) are scored using the evaluation function. The rest of the nodes are scored by selecting the best score out of their children nodes - highest score if it’s Player 1’s turn and lowest score if it’s Player 2’s turn.

The Minimax implementation in the library also uses alpha-beta pruning, a technique used to reduce the number of nodes evaluated in the tree by eliminating branches that are guaranteed to be worse than previously examined branches.
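
To make this concrete, below is a minimal, generic sketch of depth-limited Minimax with alpha-beta pruning. It is not MankalaEngine's actual implementation; the Game interface it relies on (evaluate, gameOver, legalMoves, play) is assumed purely for illustration.

#include <algorithm>
#include <limits>

// Generic sketch: evaluate() is assumed to return Player 1's store minus Player 2's
// store, so the maximizing side is Player 1 and the minimizing side is Player 2.
template <typename Game>
int minimax(const Game& state, int depth, int alpha, int beta, bool maximizing) {
    if (depth == 0 || state.gameOver())
        return state.evaluate();

    int best = maximizing ? std::numeric_limits<int>::min()
                          : std::numeric_limits<int>::max();
    for (int move : state.legalMoves(maximizing)) {
        const int score = minimax(state.play(move, maximizing),
                                  depth - 1, alpha, beta, !maximizing);
        if (maximizing) {
            best = std::max(best, score);
            alpha = std::max(alpha, best);
        } else {
            best = std::min(best, score);
            beta = std::min(beta, best);
        }
        if (beta <= alpha)  // prune: this branch cannot affect the final choice
            break;
    }
    return best;
}

A top-level caller would run this once per legal move for the side to play and keep the move with the best returned score.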

A great explanation of Minimax and alpha-beta pruning can be found in Sebastian Lague's YouTube video about this algorithm.

The MTD-f move selection function

MTD-f works by repeatedly calling Minimax with a zero-width (null) search window until the result converges to the minimax value. The Minimax used by MTD-f is implemented with alpha-beta pruning.

Since MTD-f calls Minimax several times, it’s also important to use a transposition table, which is a data structure that stores previously evaluated positions and their scores.
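
Below is a minimal sketch of what such a table might look like. The entry layout (score, search depth, bound type) and the use of a hash map keyed by a 64-bit position hash are common conventions assumed here, not MankalaEngine's actual data structure.

#include <cstdint>
#include <unordered_map>

enum class Bound { Exact, Lower, Upper };

struct TTEntry {
    int score;    // value previously computed for this position
    int depth;    // search depth at which that value was computed
    Bound bound;  // exact value, or a lower/upper bound from a pruned search
};

// Positions are assumed to be hashed to a 64-bit key before lookup.
using TranspositionTable = std::unordered_map<std::uint64_t, TTEntry>;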

Below is Aske Plaat’s pseudo-code for the algorithm.

function MTDF(root : node_type; f : integer; d : integer) : integer;
    g := f;
    upperbound := +INFINITY;
    lowerbound := -INFINITY;
    repeat
        if g == lowerbound then beta := g + 1 else beta := g;
        g := Minimax(root, beta - 1, beta, d);
        if g < beta then upperbound := g else lowerbound := g;
    until lowerbound >= upperbound;
    return g;

Evaluating Mancala positions

The static evaluation function used in this library consists of subtracting the pebbles in Player 2’s store from the pebbles in Player 1’s store. This is the same function that was used when solving the Mancala variant Kalah [4].

This way of scoring Mancala positions is particularly suitable for MTD-f since, according to Plaat, the static evaluation function used should be coarse-grained, meaning that we should avoid using heuristics with little weight. As he says in his post about MTD(f), “The coarser the grain of eval, the less passes MTD(f) has to make to converge to the minimax value. Some programs have a fine grained evaluation function, where positional knowledge can be worth as little as one hundredst of a pawn.” [3].
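
Assuming the illustrative Board sketch from earlier (one store per player), this evaluation boils down to a single subtraction:

// Coarse-grained static evaluation: Player 1's store minus Player 2's store.
// Board refers to the sketch above, not necessarily the library's own type.
int evaluate(const Board& state) {
    return static_cast<int>(state.stores[0]) - static_cast<int>(state.stores[1]);
}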

The MankalaEngine class

The MankalaEngine class ties everything together. When instantiating it, you'll need to choose a move selection function. It then provides a function, play, that, given the player whose turn it is to play, the rules to use, and the board in which the move will be played, executes the move selected by its move selection function. This allows reusing the common structure of a play across all Mancala variants while deferring variant-specific behaviour to the rules object.

bool MankalaEngine::play(Player player, const Rules& rules, Board& state) const {
    if (rules.gameOver(player, state)) {
        const Player winner = player == player_1 ? player_2 : player_1;
        rules.finishGame(winner, state);
        return false;
    }

    const int move = _selectMove(player, rules, state);
    rules.move(move, player, state);

    return true;
}

Next steps

The idea is to continue adding concrete variant implementations to the library so that developers wanting to create a Mancala game don’t have to implement them themselves. Additionally, adding the option to choose the difficulty of an opponent is also relevant. This may be implemented, for example, by allowing changes to the cutoff depth for the Minimax and MTD-f opponents, which is not currently supported.

As of now, the implemented Minimax only uses alpha-beta pruning and transposition tables. Adding more optimizations, such as move ordering, for example, might also be of interest.

Another possible route is developing a Qt application for playing Mancala that uses this engine. This would likely help generate interest in the project within the broader community.

If you’re interested in this project, you can check it out on Invent and come talk to us on Matrix.

References

1. “Das Bohnenspiel”, Wikipedia, 2023. https://en.wikipedia.org/wiki/Das_Bohnenspiel.

2. “Algorithms - Minimax” https://cs.stanford.edu/people/eroberts/courses/soco/projects/2003-04/intelligent-search/minimax.html.

3. “Aske Plaat: MTD(f), a new chess algorithm.” https://people.csail.mit.edu/plaat/mtdf.html.

4. G. Irving, J. Donkers, and J. Uiterwijk, “Solving Kalah”, Icga journal, vol. 23, no. 3, pp. 139–147, Sep. 2000, doi: 10.3233/ICG-2000-23303.

Categories: FLOSS Project Planets

scikit-learn: Interview with Yao Xiao, scikit-learn Team Member

Planet Python - Wed, 2024-07-17 20:00
Author: Reshama Shaikh , Yao Xiao

Yao Xiao recently earned his undergraduate degree in mathematics and computer science. He will be pursuing a Master’s degree in Computational Science and Engineering at Harvard SEAS. Yao joined the scikit-learn team in February 2024.

  1. Tell us about yourself.

    My name is Yao Xiao and I live in Shanghai, China. At the time of interview I have just got my Bachelor’s degree in Honors Mathematics and Computer Science at NYU Shanghai, and I’m going to pursue a Master’s degree in Computational Science and Engineering at Harvard SEAS. My current research interests are in networks and systems (e.g. sys4ml and ml4sys), but this may change in the future.

  2. How did you first become involved in open source and scikit-learn?

    In my junior year I took a course at NYU Courant called Open Source Software Development where we needed to make contributions to an open source software as our final project - and I chose scikit-learn.

  3. We would love to learn of your open source journey.

    I was lucky to get involved in a pretty easy meta-issue when I first started contributing to scikit-learn. I made quite a few PRs towards that issue, familiarizing myself with the coding standards, the contributing workflow, etc., during which I gradually explored the codebase and learned a lot from the maintainers about how to write better code. After that meta-issue was completed, I decided to continue contributing since I enjoyed the experience, and I started looking through the open issues, tried reproducing and investigating them, then opened PRs for those that I was able to solve. It has been a process of familiarizing myself with more parts of the codebase and being able to make more PRs, and so on. While contributing to scikit-learn, sometimes there are also issues to solve upstream, so I also had opportunities to contribute to projects like pandas and pydata-sphinx-theme. Up till today I’m still far from familiar with the entire scikit-learn project, but I will definitely continue this amazing open-source journey.

  4. To which OSS projects and communities do you contribute?

    I have contributed to scikit-learn, pandas, pydata-sphinx-theme, and sphinx-gallery. I’m also writing some small pieces of software that I’ve decided to make open source.

  5. What do you find alluring about OSS?

    It is amazing to feel that my code is being used by so many people all around the world through contributing to open source projects. Well, it might be inappropriate to say “my code”, but I do feel like I’m making some actual contributions to the community instead of just writing code for myself. Also, OSS makes me care about code quality and so on instead of merely making things “work”, which is very important for programmers but not really taught in school.

  6. What pain points do you observe in community-led OSS?

    Collaboration can lead to better code but also slows down the development process. Especially when there are not enough reviewers around, issues and PRs can easily get stale or forgotten. But I would say it’s more like a tradeoff rather than a pain point.

  7. If we discuss how far OS has evolved in 10 years, what would you like to see happen?

    I couldn’t say about the past 10 years since I’ve only been involved for about one and a half years, but regarding the scientific Python ecosystem I would like to see better coordination across projects (which is already happening). For instance a common interface for array libraries and dataframe libraries would allow downstream dependents to easily provide more flexible support for different input/output types, etc. And as a Chinese I would also hope that open source can thrive in my country some day as well.

  8. What are your favorite resources, books, courses, conferences, etc?

    As for physical books I would recommend The Pragmatic Programmer by Andy Hunt and Dave Thomas, and Refactoring: Improving the Design of Existing Code by Martin Fowler and Kent Beck. As for courses I like MIT’s The Missing Semester of Your CS Education. In particular, for learning Python, The Python Tutorial in the official Python documentation is good enough for me. By the way, I want to mention that the documentation for most languages and popular packages is very good and is the best place to learn the most up-to-date information.

  9. What are your hobbies, outside of work and open source?

    I would say my largest hobby is programming (not for school, not for work, just for fun). I’ve recently been fascinated with Tauri and wrote a lot of small desktop applications for myself in my spare time. Apart from this I also love playing the piano and I’m an anime lover, so I often listen to or play piano versions of anime theme songs (mostly arranged by Animenz).

Categories: FLOSS Project Planets

Dries Buytaert: Join the Drupal Starshot team as a track lead

Planet Drupal - Wed, 2024-07-17 19:24

The Drupal Starshot initiative has been making significant progress behind the scenes, and I'm excited to share some updates with the community.

Leadership team formation and product definition

Over the past few months, we've been working diligently on Drupal Starshot. One of our first steps was to appoint a leadership team to guide the project. With the leadership team in place as well as the new Starshot Advisory Council, we shifted our focus to defining the product. We've made substantial progress on this front and will be sharing more details about the product strategy in the coming weeks.

Introducing Drupal Starshot tracks

We already started to break down the initiative into manageable components, and are introducing the concept of "tracks". Tracks are smaller, focused parts of the Drupal Starshot project that allow for targeted development and contributions. We've already published the first set of tracks on the Drupal Starshot issue queue on Drupal.org.

Example tracks include:

  1. Creating Drupal Recipes for features like contact forms, advanced search, events, SEO and more.
  2. Enhancing the Drupal installer to enable Recipes during installation.
  3. Updating Drupal.org for Starshot, including product marketing and a trial experience.

While many tracks are technical and need help from developers, most of the tracks need contribution from designers, UX experts, marketers, testers and site builders.

Recruiting more track leads

Several tracks already have track leads and have made significant progress.

However, we need many additional track leads to drive our remaining tracks to completion.

We're now accepting applications for track lead positions. Interested individuals and organizations can apply by completing our application form. The application window closes on July 31st, two weeks from today.

Key responsibilities of a track lead

Track leads can be individuals, teams, or organizations, including Drupal Certified Partners. While technical expertise is beneficial, the role primarily focuses on strategic coordination and project management. Key responsibilities include:

  • Defining and validating requirements to ensure the track meets the expectations of our target audience.
  • Developing and maintaining a prioritized task list, including creating milestones and timelines.
  • Overseeing and driving the track's implementation.
  • Collaborating with key stakeholders, including the Drupal Starshot leadership team, module maintainers, the marketing team, etc.
  • Communicating progress to the community (e.g. blogging).
Track lead selection and announcement

After the application deadline, the Drupal Starshot Leadership Team will review the applications and appoint track leads. We expect to announce the selected track leads in the first week of August.

While the application period is open, we will be available to answer any questions you may have. Feel free to reach out to us through the Drupal.org issue queue, or join us in an upcoming Zoom meeting (details to be announced / figured out).

Looking ahead to DrupalCon Barcelona

Our goal is to make significant progress on these tracks by DrupalCon Barcelona, where we plan to showcase the advancements we've made. We're excited about the momentum building around Drupal Starshot and can't wait to see the contributions from the community.

If you're passionate about Drupal and want to play a key role in shaping its future, consider applying for a track lead position.

Stay tuned for more updates on Drupal Starshot, and thank you for your continued support of the Drupal community.

Categories: FLOSS Project Planets

Dirk Eddelbuettel: Rcpp 1.0.13 on CRAN: Some Updates

Planet Debian - Wed, 2024-07-17 17:50

The Rcpp Core Team is once again pleased to announce a new release (now at 1.0.13) of the Rcpp package. It arrived on CRAN earlier today, and has since been uploaded to Debian. Windows and macOS builds should appear at CRAN in the next few days, as will builds for different Linux distributions, and of course r2u should catch up tomorrow too. The release was uploaded last week, but not only does Rcpp always get flagged because of the grandfathered .Call(symbol), CRAN also found two packages ‘regressing’, which then required five days for them to get back to us. One issue was known; another did not reproduce under our tests against over 2800 reverse dependencies, leading to the eventual release today. Yay. Checks are good and appreciated, and it does take time for humans to review them.

This release continues with the six-months January-July cycle started with release 1.0.5 in July 2020. As a reminder, we do of course make interim snapshot ‘dev’ or ‘rc’ releases available via the Rcpp drat repo as well as the r-universe page and repo and strongly encourage their use and testing—I run my systems with these versions which tend to work just as well, and are also fully tested against all reverse-dependencies.

Rcpp has long established itself as the most popular way of enhancing R with C or C++ code. Right now, 2867 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with 256 in BioConductor. On CRAN, 13.6% of all packages depend (directly) on Rcpp, and 59.9% of all compiled packages do. From the cloud mirror of CRAN (which is but a subset of all CRAN downloads), Rcpp has been downloaded 86.3 million times. The two published papers (also included in the package as preprint vignettes) have, respectively, 1848 (JSS, 2011) and 324 (TAS, 2018) citations, while the book (Springer useR!, 2013) has another 641.

This release is incremental as usual, generally preserving existing capabilities faithfully while smoothing our corners and / or extending slightly, sometimes in response to changing and tightened demands from CRAN or R standards. The move towards a more standardized approach for the C API of R leads to a few changes; Kevin did most of the PRs for this. Andrew Johnson also provided a very nice PR to update internals taking advantage of variadic templates.

The full list below details all changes, their respective PRs and, if applicable, issue tickets. Big thanks from all of us to all contributors!

Changes in Rcpp release version 1.0.13 (2024-07-11)
  • Changes in Rcpp API:

    • Set R_NO_REMAP if not already defined (Dirk in #1296)

    • Add variadic templates to be used instead of generated code (Andrew Johnson in #1303)

    • Count variables were switched to size_t to avoid conversion-narrowing warnings (Dirk in #1307)

    • Rcpp now avoids the usage of the (non-API) DATAPTR function when accessing the contents of Rcpp Vector objects where possible. (Kevin in #1310)

    • Rcpp now emits an R warning on out-of-bounds Vector accesses. This may become an error in a future Rcpp release. (Kevin in #1310)

    • Switch VECTOR_PTR and STRING_PTR to new API-compliant RO variants (Kevin in #1317 fixing #1316)

  • Changes in Rcpp Deployment:

    • Small updates to the CI test containers have been made (#1304)

Thanks to my CRANberries, you can also look at a diff to the previous release. Questions, comments etc. should go to the rcpp-devel mailing list off the R-Forge page. Bug reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues).

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

Kalyani Kenekar: Securing Your Website: Installing and Configuring Nginx with SSL

Planet Debian - Wed, 2024-07-17 14:30

The Initial Encounter:

I recently started to work with Nginx to explore how to configure a so-called server block. It’s quite different from Apache, but there are tons of good websites out there that explain the different steps and options quite well. I also quickly realized that I need to be able to configure my Nginx setups so that content is delivered over HTTPS, with automatic redirection from HTTP URLs.

  • Let’s install Nginx
Installing Nginx

$ sudo apt update
$ sudo apt install nginx

Checking your Web Server
  • We can now check whether the nginx service is active or inactive:

$ sudo systemctl status nginx

Output
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-02-12 09:59:20 UTC; 3h ago
       Docs: man:nginx(8)
   Main PID: 2887 (nginx)
      Tasks: 2 (limit: 1132)
     Memory: 4.2M
        CPU: 81ms
     CGroup: /system.slice/nginx.service
             ├─2887 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
             └─2890 nginx: worker process
  • Nginx is now successfully installed and in a running state.
How To Secure Nginx with Let’s Encrypt on Debian 12
  • In this documentation, you will use Certbot to obtain a free SSL certificate for Nginx on Debian 12 and set up your certificate.
Step 1 — Installing Certbot

$ sudo apt install certbot python3-certbot-nginx

  • Certbot is now ready to use, but in order for it to automatically configure SSL for Nginx, we need to verify some of Nginx’s configuration.
Step 2 — Confirming Nginx’s Configuration
  • Certbot needs to be able to find the correct server block in your Nginx configuration for it to be able to automatically configure SSL. Specifically, it does this by looking for a server_name directive that matches the domain you request a certificate for. To check, open the configuration file for your domain using nano or your favorite text editor.

$ sudo vi /etc/nginx/sites-available/example.com

server {
    listen 80;

    root /var/www/html/;
    index index.html;

    server_name example.com;

    location / {
        try_files $uri $uri/ =404;
    }

    location /test.html {
        try_files $uri $uri/ =404;
        auth_basic "admin area";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }
}
  • Fill in the above data according to your project, then save the file, quit your editor, and verify the syntax of your configuration edits.

$ sudo nginx -t

Step 3 — Obtaining an SSL Certificate
  • Certbot provides a variety of ways to obtain SSL certificates through plugins. The Nginx plugin will take care of reconfiguring Nginx and reloading the config whenever necessary. To use this plugin, run the following command.

$ sudo certbot --nginx -d example.com

  • The configuration will be updated, and Nginx will reload to pick up the new settings. certbot will wrap up with a message telling you the process was successful and where your certificates are stored.
Output
IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/example.com/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/example.com/privkey.pem
   Your cert will expire on 2024-05-12. To obtain a new or tweaked version of
   this certificate in the future, simply run certbot again with the
   "certonly" option. To non-interactively renew *all* of your certificates,
   run "certbot renew"
 - If you like Certbot, please consider supporting our work by:
   Donating to ISRG / Let's Encrypt: https://letsencrypt.org/donate
   Donating to EFF: https://eff.org/donate-le
  • Your certificates are downloaded, installed, and loaded. Next, check the syntax of your configuration again.

$ sudo nginx -t

  • If you get an error, reopen the server block file and check for any typos or missing characters. Once your configuration file’s syntax is correct, reload Nginx to load the new configuration.

$ sudo systemctl reload nginx

  • Try reloading your website using https:// and notice your browser’s security indicator. It should indicate that the site is properly secured, usually with a lock icon.

Your website is now secured with SSL.

Categories: FLOSS Project Planets
