Feeds

Open Source AI Definition – Weekly update April 2

Open Source Initiative - Mon, 2024-04-01 17:10
Seeking document reviewers for Pythia and OpenCV
  • We are now in the process of reviewing legal documents to check the compatibility with the version 0.0.6 definition of open-source AI, specifically for Pythia and OpenCV.
    • Click here to see the past activities of the four working groups
  • To get involved, respond on the forum or message Mer here.
The data requirement: “Sufficiently detailed information” for what?
  • Central question: What criteria define “sufficiently detailed information”?
    • There is a wish to change the term “Sufficiently detailed information” to “Sufficiently detailed to allow someone to replicate the entire dataset” to avoid vagueness and solidify reproducibility as openness
  • Stefano points out that “reproducibility” in itself might not be a sustainable term due to its loaded connotations.
  • There’s a proposal to modify the Open Source AI Definition requirement to specify providing detailed information to replicate the entire dataset.
    • However, concerns arise about how this would apply to various machine learning methods where dataset replication might not be feasible.
Action on the 0.0.6 draft
  • Contribution concerned with the usage of the wording “deploy” under “Out of Scope Issues” in relation to code alone.
    • OSI has replied asking for clarification on the question, as “deploy” refers to the whole AI system, not just the code.
  • Contribution concerned with the wording of “learning, using, sharing and improving software systems” under “Why We Need Open Source Artificial Intelligence”. Specifically, when relating to AI as opposed to “traditional” software, there is a growing concern that these values might be broad compared to the impact, in terms of safety and ethics, AI can have.
    • OSI replied that while the ethics of AI will continue to be discussed, these discussions are out of the scope of this definition. This will be elaborated on in an upcoming FAQ.
Categories: FLOSS Research

The Drop Times: Drupal Page Builders—Part 3: Other Alternative Solutions

Planet Drupal - Mon, 2024-04-01 16:04
Venture into the realm of alternatives to Paragraphs and Layout Builder with the third installment of the Drupal Page Builder series by André Angelantoni, Senior Drupal Architect at HeroDevs, showcased on The DropTimes. This segment navigates through a variety of server-side rendered page generation solutions, offering a closer look at innovative modules that provide a broader range of page-building capabilities beyond Drupal's native tools. From the adaptability of Component Builder and the intuitive DXPR Page Builder to the cutting-edge HAX module utilizing W3C-standard web components, this article illuminates a path for developers seeking polished, ready-made components for their site builds. Before exploring advanced Drupal solutions, ensure you're caught up by reading the first two parts of the series, laying the groundwork for a comprehensive understanding of Drupal's extensive page-building ecosystem.
Categories: FLOSS Project Planets

The Drop Times: DrupalCamp Ouagadougou Concludes Successfully

Planet Drupal - Mon, 2024-04-01 16:04
Experience the highlights of DrupalCamp Ouagadougou! Dive into captivating pictures and relive the vibrant atmosphere of this successful event.
Categories: FLOSS Project Planets

Talking Drupal: Talking Drupal #444 - Design to Development Workflow Optimization

Planet Drupal - Mon, 2024-04-01 14:00

Today we are talking about design to development hand off, common complications, and ways to optimize your process with guest Crispin Bailey. We’ll also cover Office Hours as our module of the week.

For show notes visit: www.talkingDrupal.com/444

Topics
  • Primary activities of the team
  • Where does handoff start
  • Handoff artifact
  • Tools for collaboration
  • Figma
  • Evaluating new tools
  • Challenges of developers and designers working together
  • How can we optimize handoff
  • What steps can the dev team take to facilitate smooth handoff
  • Framework recommendation
  • Final quality
  • AI
Guests

Crispin Bailey - kalamuna.com crispinbailey

Hosts

Nic Laflin - nLighteneddevelopment.com nicxvan
John Picozzi - epam.com johnpicozzi
Anna Mykhailova - kalamuna.com amykhailova

MOTW Correspondent

Martin Anderson-Clutz - mandclu

  • Brief description:
    • Have you ever wanted to manage and display the hours of operation for a business on your Drupal site? There’s a module for that
  • Module name/project name:
  • Brief history
    • How old: created in Jan 2008 by Ozeuss, though recent releases are by John Voskuilen of the Netherlands
    • Versions available: 7.x-1.11 and 8.x-1.17
  • Maintainership
    • Actively maintained, latest release was 3 weeks ago
    • Security coverage
    • Test coverage
    • Documentation: no user guide, but a pretty extensive README
    • Number of open issues: 15 open issues, only 1 of which is a bug against the current branch, though it’s postponed for more info
  • Usage stats:
    • Almost 20,000 sites
  • Module features and usage
    • Previously covered in episode 113, more than 8 years ago, in the “Drupal 6 end of life” episode
    • The module provides a specialized widget to set the hours for each weekday, with the option to have more than one time slot per day
    • You can define exceptions, for example on stat holidays
    • You can also define seasons, with a start and end date, during which the hours are different
    • The module also offers a variety of options for formatting the output:
    • You can show days as ranges, for example Monday to Friday, 9am to 5pm, 12-hour or 24-hour clocks, and so on
    • Obviously it will show any exceptions or upcoming seasonal hours too
    • It can also show an “open now” or “closed now” indicator
    • It can create schema.org-compliant markup for openingHours, and has integration with the Schema.org Metatag module
    • Office Hours does all this with a new field type, so you could add it to Stores in a Drupal Commerce site, a Locations content type in a site for a bricks-and-mortar chain, or if you just need a single set of hours for the site, you should be able to use it with something like the Config Pages module
    • The README file also includes some suggestions on how to use Office Hours with Views, which can give you a lot of flexibility on where and how to show the information
   
Categories: FLOSS Project Planets

The Drop Times: The Power of Embracing New Challenges and Technologies

Planet Drupal - Mon, 2024-04-01 13:04

“The greater the obstacle, the more glory in overcoming it.” – Molière


Dear Readers,

Stepping out of our comfort zones is undoubtedly a daunting task. Yet, it's precisely this leap into the unknown that often leads to remarkable growth and self-discovery. Embracing new challenges and learning from scratch can feel overwhelming at first, but through these experiences, we truly push our limits and uncover our hidden capabilities.

In our journey of embracing the unfamiliar, we expand our skill sets and gain a deeper understanding of ourselves and the paths we never thought possible. Each new challenge becomes an opportunity to stretch beyond what we thought we were capable of, illuminating uncharted territories of potential and opportunity.

Embrace the technological diversity surrounding us, as it serves as a rich tapestry of tools and methodologies that can enhance our creativity, efficiency, and impact in ways we've only begun to explore. Like the inspiring journey of Tanay Sai, a seasoned builder, engineering leader, and AI/ML practitioner who recently embarked on a transformative adventure beyond the familiar horizons of Drupal. Tanay's story is a testament to the idea that stepping out of one's comfort zone can lead to groundbreaking achievements and a deeper understanding of the multifaceted digital ecosystem.

The importance of continuous learning and the willingness to embrace new challenges is profound. It encourages us to look beyond the familiar, to experiment with emerging technologies, and to remain adaptable in our pursuit of delivering exceptional digital experiences.

Now, let's take a moment to revisit the highlights from last week's coverage at The Drop Times.

Last month, we celebrated the Women in Drupal community and released the second part of "Inspiring Inclusion: Celebrating the Women in Drupal | #2" penned by Alka Elizabeth. Part 3 of this series will be coming soon.

Explore the dynamic evolution of Drupal's page-building features in Part 2 of André Angelantoni's latest series on The Drop Times. Each module discussed extends Layout Builder and can be integrated individually. Part 3 might already be released by the time this newsletter comes your way. Access the second part here.

DrupalCon Portland 2024, scheduled from May 6 to 9, will feature an empowering Women in Drupal Lunch event. This gathering aims to uplift female attendees and inspire and support women within the Drupal community. Learn more here.

Save the dates for DrupalCamp Spain 2024 in Benidorm! The event is scheduled for October 25 and 26, with the venue to be announced soon. Additionally, mark your calendars for October 24, designated as Business Day.

DrupalCamp Belgium has unveiled the keynote speakers for its highly anticipated 2024 edition in Ghent. For more information and to discover the lineup of keynote speakers, be sure to check out the details here. A complete list of events for the week is available here. Additionally, Gander Documentation is now available on Drupal.org, as announced by Janez Urevc, Tag1 Consulting's Strategic Growth and Innovation Manager, on March 25, 2024.

Also, read about Tanay Sai, an accomplished builder, engineering leader, and AI/ML practitioner who shares insights into his transformative journey beyond Drupal. After a decade immersed in the Drupal realm, Tanay candidly expresses his pivotal decision to venture beyond its confines. Learn more here.

Monika Branicka of Droptica conducts a comprehensive analysis of content management systems (CMS) employed by 314 higher education institutions in Poland. This study aims to unveil the prevalent CMS preferences among both public and non-public universities, providing insights into the educational sector's technological landscape. The report comes amidst a growing call for resources to track Drupal usage across industry sectors, coinciding with similar studies conducted by The DropTimes and Grzegorz Pietrzak.

DevBranch has announced the launch of a Drupal BootCamp tailored for aspiring web developers. This initiative aims to equip individuals with the necessary skills and knowledge to excel in web development using Drupal. For further details, click here.

The development of Drupal 11 has reached a critical phase, marked by ongoing updates to its system requirements within the development branch. Gábor Hojtsy has provided valuable insights on preparing core developers and informing the community about these changes. Stay updated on the latest developments as Drupal 11 evolves to meet the needs of its users and developers.

We acknowledge that there are more stories to share. However, due to selection constraints, we must pause further exploration for now.

To get timely updates, follow us on LinkedIn, Twitter and Facebook. Also, join us on Drupal Slack at #thedroptimes.

Thank you,
Sincerely
Elma John
Sub-editor, TheDropTimes.

Categories: FLOSS Project Planets

Luke Plant: Enforcing conventions in Django projects with introspection

Planet Python - Mon, 2024-04-01 11:05

Naming conventions can make a big difference to the maintenance issues in software projects. This post is about how we can use the great introspection capabilities in Python to help enforce naming conventions in Django projects.

Let’s start with an example problem and the naming convention we’re going to use to solve it. There are many other applications of the techniques here, but it helps to have something concrete.

The problem: DateTime and DateTimeField confusion

Over several projects I’ve found that inconsistent or bad naming of DateField and DateTimeField fields can cause various problems.

First, poor naming means that you can confuse them for each other, and this can easily trip you up. In Python, datetime is a subclass of date, so if you use a field called created_date assuming it holds a date when it actually holds a datetime, it might not be obvious initially that you are mishandling the value, but you’ll often have subtle problems down the line.
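A minimal sketch of this pitfall, using only the standard library. The variable names are illustrative and not from the original post:

```python
from datetime import date, datetime

# Named like a date, but it actually holds a datetime.
created_date = datetime(2024, 4, 1, 17, 10)
today = date(2024, 4, 1)

# datetime is a subclass of date, so type checks against date pass silently.
print(issubclass(datetime, date))      # True
print(isinstance(created_date, date))  # True

# Yet a datetime never compares equal to a plain date, even on the same day.
print(created_date == today)           # False

# Ordering comparisons between the two raise TypeError.
try:
    created_date < today
    ordering_failed = False
except TypeError:
    ordering_failed = True

# Converting explicitly makes the intent clear.
print(created_date.date() == today)    # True
```

A name such as created_at would have signalled up front that the value carries a time component.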

Second, sometimes you have a field named like expired which is actually the timestamp of when the record expired, but it could easily be confused for a boolean field.

Third, not having a strong convention, or having multiple conventions, leads to unnecessary time wasted on decisions that could have been made once.

Finally, inconsistency in naming is just confusing and ugly for developers, and often for users further down the line, because names tend to leak.

Even if you do have an established convention, it’s possible for people not to know. It’s also very easy for people to change a field’s type between date and datetime without also changing the name. So merely having the convention is not enough, it needs to be enforced.

Note

If you want to change the name and type of a field (or any other attribute), and want to preserve data as much as possible, you usually need to do it in two stages or more depending on your needs, and always check the migrations created – otherwise Django’s migration framework will just see one field removed and a completely different one added, and generate migrations that will destroy your data.

For this specific example, the convention I quite like is:

  • field names should end with _at for timestamp fields that use DateTimeField, like expires_at or deleted_at.

  • field names should end with _on or _date for fields that use DateField, like issued_on or birth_date.

This is based on the English grammar rule that we use “on” for dates but “at” for times – “on the 25th March”, but “at 7:00 pm” – and conveniently it also needs very few letters and tends to read well in code. The _date suffix is also helpful in various contexts where _on seems very unnatural. You might want different conventions, of course.
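The convention itself can be written down as a small lookup, shown here as a sketch in plain Python. The names ALLOWED_SUFFIXES and follows_convention are hypothetical helpers for illustration, not part of the original post or of Django:

```python
# Hypothetical mapping from a field kind to the suffixes the convention allows.
ALLOWED_SUFFIXES = {
    "datetime": ("_at",),        # timestamp fields (DateTimeField)
    "date": ("_on", "_date"),    # date-only fields (DateField)
}


def follows_convention(field_name: str, kind: str) -> bool:
    """Return True if field_name ends with a suffix allowed for this kind."""
    # str.endswith accepts a tuple and matches any of its entries.
    return field_name.endswith(ALLOWED_SUFFIXES[kind])


print(follows_convention("expires_at", "datetime"))    # True
print(follows_convention("created_date", "datetime"))  # False
print(follows_convention("birth_date", "date"))        # True
```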

To get our convention to be enforced with automated checks we need a few tools.

The tools

Introspection

Introspection means the ability to use code to inspect code, and typically we’re talking about doing this when our code is already running, from within the same program and using the same programming language.

In Python, this starts from simple things like isinstance() and type() to check the type of an object, to things like hasattr() to check for the presence of attributes and many other more advanced techniques, including the inspect module and many of the metaprogramming dunder methods.
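As a rough illustration of these building blocks, here is a short standard-library sketch. The Invoice class is made up for the example:

```python
import inspect
from datetime import date, datetime


class Invoice:
    """Illustrative class, not from the original post."""

    def __init__(self, issued_on: date):
        self.issued_on = issued_on

    def is_overdue(self, today: date) -> bool:
        return today > self.issued_on


invoice = Invoice(date(2024, 4, 1))

# type() and isinstance() inspect the type of an object at runtime.
print(type(invoice).__name__)                   # Invoice
print(isinstance(invoice.issued_on, datetime))  # False

# hasattr() checks for the presence of an attribute.
print(hasattr(invoice, "issued_on"))            # True

# The inspect module exposes signatures and members.
param_names = list(inspect.signature(Invoice.is_overdue).parameters)
print(param_names)                              # ['self', 'today']

method_names = [name for name, _ in inspect.getmembers(Invoice, inspect.isfunction)]
print(method_names)                             # ['__init__', 'is_overdue']
```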

Django app and model introspection

Django is just Python, so you can use all normal Python introspection techniques. In addition, there is a formally documented and supported set of functions and methods for introspecting Django apps and models, such as the apps module and the Model _meta API.

Django checks framework

The third main tool we’re going to use in this solution is Django’s system checks framework, which allows us to run certain kinds of checks, at both “warning” and “error” level. This is the least important tool, and we could in fact switch it out for something else like a unit test.

The solution

It’s easiest to present the code, and then discuss it:

    from django.apps import AppConfig, apps
    from django.conf import settings
    from django.core.checks import Tags, Warning, register


    @register()
    def check_date_fields(app_configs, **kwargs):
        exceptions = [
            # This field is provided by Django's AbstractBaseUser, we don't control it
            # and we'll break things if we change it:
            "accounts.User.last_login",
        ]
        from django.db.models import DateField, DateTimeField

        errors = []
        for field in get_first_party_fields():
            field_name = field.name
            model = field.model
            if f"{model._meta.app_label}.{model.__name__}.{field_name}" in exceptions:
                continue
            # Order of checks here is important, because DateTimeField inherits from DateField
            if isinstance(field, DateTimeField):
                if not field_name.endswith("_at"):
                    errors.append(
                        Warning(
                            f"{model.__name__}.{field_name} field expected to end with `_at`, "
                            + "or be added to the exceptions in this check.",
                            obj=field,
                            id="conventions.E001",
                        )
                    )
            elif isinstance(field, DateField):
                if not (field_name.endswith("_date") or field_name.endswith("_on")):
                    errors.append(
                        Warning(
                            f"{model.__name__}.{field_name} field expected to end with `_date` or `_on`, "
                            + "or be added to the exceptions in this check.",
                            obj=field,
                            id="conventions.E002",
                        )
                    )
        return errors


    def get_first_party_fields():
        # Yield every field of every model in our own apps
        for app_config in get_first_party_apps():
            for model in app_config.get_models():
                yield from model._meta.get_fields()


    def get_first_party_apps() -> list[AppConfig]:
        return [app_config for app_config in apps.get_app_configs() if is_first_party_app(app_config)]


    def is_first_party_app(app_config: AppConfig) -> bool:
        # Match either the app's module name or the dotted path of its AppConfig class
        if app_config.module.__name__ in settings.FIRST_PARTY_APPS:
            return True
        app_config_class = app_config.__class__
        if f"{app_config_class.__module__}.{app_config_class.__name__}" in settings.FIRST_PARTY_APPS:
            return True
        return False

We start here with some imports and registration, as documented in the “System checks” docs. You’ll need to place this code somewhere that will be loaded when your application is loaded.

Our checking function defines some allowed exceptions, because there are some things out of our control, or there might be other reasons. It also mentions the exceptions mechanism in the warning message. You might want a different mechanism for exceptions here, but I think having some mechanism like this, and advertising its existence in the warnings, is often pretty important. Otherwise, you can end up with worse consequences when people just slavishly follow rules. Notice how in the exception list above I’ve given a comment detailing why the exception is there – this helps to establish a precedent that exceptions should be justified, and the justification should be there in the code.

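We then loop through all “first party” model fields, looking for DateTimeField and DateField instances. This is done using our get_first_party_fields() utility, which is defined in terms of get_first_party_apps(), which in turn depends on is_first_party_app() and a FIRST_PARTY_APPS setting that lists your own apps, either by module name or by the dotted path of the app config class.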
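The matching against FIRST_PARTY_APPS can be tried out without Django. The sketch below is an assumption-laden stand-in: the setting value, the app names and the three config classes are all hypothetical, and the classes only mimic the two attributes of a Django AppConfig that the check reads.

```python
from types import SimpleNamespace

# Hypothetical setting value; in a real project this would live in settings.py.
FIRST_PARTY_APPS = [
    "myapp",                         # matched by the app's module name
    "accounts.apps.AccountsConfig",  # matched by the dotted path of the config class
]


class MyAppConfig:
    """Stand-in for an AppConfig whose app module is `myapp`."""
    __module__ = "myapp.apps"
    module = SimpleNamespace(__name__="myapp")


class AccountsConfig:
    """Stand-in for an AppConfig listed by its class path."""
    __module__ = "accounts.apps"
    module = SimpleNamespace(__name__="accounts")


class ThirdPartyConfig:
    """Stand-in for an app that is not listed at all."""
    __module__ = "thirdparty.apps"
    module = SimpleNamespace(__name__="thirdparty")


def is_first_party_app(app_config, first_party_apps=FIRST_PARTY_APPS) -> bool:
    # Same two lookups as the check above: module name first, then class path.
    if app_config.module.__name__ in first_party_apps:
        return True
    cls = type(app_config)
    return f"{cls.__module__}.{cls.__name__}" in first_party_apps


print(is_first_party_app(MyAppConfig()))       # True
print(is_first_party_app(AccountsConfig()))    # True
print(is_first_party_app(ThirdPartyConfig()))  # False
```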

The id values passed to Warning here are examples – you should change them according to your needs. You might also choose to use Error instead of Warning.

Output

When you run manage.py check, you’ll then get output like:

    System check identified some issues:

    WARNINGS:
    myapp.MyModel.created: (conventions.E001) MyModel.created field expected to end with `_at`, or be added to the exceptions in this check.

    System check identified 1 issue (0 silenced).

As mentioned, you might instead want to run this kind of check as a unit test.

Conclusion

There are many variations on this technique that can be used to great effect in Django or other Python projects. Very often you will be able to play around with a REPL to do the introspection you need.

Where it is possible, I find doing this far more effective than attempting to document things and relying on people reading and remembering those docs. Every time I’m tripped up by bad names, or when good names or a strong convention could have helped me, I try to think about how I could push people towards a good convention automatically – while also giving a thought to unintended bad consequences of doing that prematurely or too forcefully.

Categories: FLOSS Project Planets

Marknote 1.1.0

Planet KDE - Mon, 2024-04-01 11:05

Marknote 1.1.0 is out! Marknote is the new WYSIWYG note-taking application from KDE. Despite the latest release being just a few days ago, we have been hard at work and added a few new features and, more importantly, fixed some bugs.

Marknote now boasts broader Markdown support, and can now display images and task lists in the editor. And once you are done editing your notes, you can export them to various formats, including PDF, HTML and ODT.

Export to PDF, HTML and ODT

Marknote’s interface now seamlessly integrates the colors assigned to your notebooks, enhancing its visual coherence and making it easier to distinguish one notebook from another. Additionally, your notebooks remember the last opened note, automatically reopening it upon selection.

Accent color in list delegate

We’ve also introduced a convenient command bar similar to the one in Merkuro. This provides quick access to essential actions within Marknote. Currently it only creates a new notebook and note, but we plan to make more actions available in the future. Finally, we have reworked all the dialogs in Marknote to use the newly introduced FormCardDialog from KirigamiAddons.

Command bar

We have created a small feature roadmap with features we would like to add in the future. Contributions are welcome!

Packager section

You can find the package on download.kde.org and it has been signed with my GPG key.

Note that this release introduces a new recommended dependency, md4c, and requires the latest Kirigami Addons release (published a few hours ago).

Categories: FLOSS Project Planets

Kirigami Addons 1.1.0

Planet KDE - Mon, 2024-04-01 11:00

It’s again time for a new Kirigami Addons release. Kirigami Addons is a collection of helpful components for your QML and Kirigami applications.

FormCard

I added a new FormCard delegate, FormColorDelegate, which allows selecting a color, and a new delegate container, FormCardDialog, which is a new type of dialog.

FormCardDialog containing a FormColorDelegate in Marknote

Aside from these new components, Joshua fixed a newline bug in the AboutKDE component and I updated the code examples in the API documentation.

TableView

This new component is intended to provide a powerful table view on top of the bare-bones one provided by QtQuick, similar to the one we have in our QtWidgets applications.

This was contributed by Evgeny Chesnokov. Thanks!

TableView with resizable and sortable columns

Other components

The default size of MessageDialog was decreased and is now more appropriate.

MessageDialog new default size

James Graham fixed the autoplay of the video delegate for the maximized album component.

Packager section

You can find the package on download.kde.org and it has been signed with my GPG key.

Categories: FLOSS Project Planets

The interpersonal side of the xz-utils compromise

Planet KDE - Mon, 2024-04-01 10:54

While everyone is busy analyzing the highly complex technical details of the recently discovered xz-utils compromise that is currently rocking the internet, it is worth looking at the underlying non-technical problems that make such a compromise possible. A very good write-up can be found on the blog of Rob Mensching...

"A Microcosm of the interactions in Open Source projects"

Categories: FLOSS Project Planets

Ben Hutchings: FOSS activity in March 2024

Planet Debian - Mon, 2024-04-01 10:51
Categories: FLOSS Project Planets

Drupal Association blog: Unveiling the Power of Drupal: Your Ultimate Choice for Web Development

Planet Drupal - Mon, 2024-04-01 09:54

Welcome to DrupalCon Portland 2024, where innovation, collaboration, and excellence converge! As the premier event for Drupal enthusiasts, developers, and businesses, it's the perfect occasion to explore why Drupal stands tall as the preferred choice for web development. In this article, we'll delve into the compelling reasons that make Drupal the ultimate solution for your web development needs.

Open Source Excellence

Drupal is renowned for being an open-source content management system (CMS), fostering a vibrant community of developers and contributors. The power of collaboration within the Drupal community results in continuous improvements, security updates, and a wealth of modules that cater to a wide range of functionalities. Choosing Drupal means embracing a platform that is constantly evolving and adapting to the ever-changing landscape of the digital world.

Flexibility and Scalability

Drupal's flexibility is one of its key strengths. Whether you're building a personal blog, a corporate website, or a complex e-commerce platform, Drupal adapts to your needs. Its modular architecture allows developers to create custom functionalities and integrate third-party tools seamlessly. As your business grows, Drupal scales with you, ensuring that your website remains robust, high-performing, and capable of handling increased traffic and data.

Exceptional Content Management

Content is at the heart of any successful website, and Drupal excels in providing an intuitive and powerful content management experience. The platform offers a sophisticated taxonomy system, making it easy to organize and categorize content. With a user-friendly interface, content creators can effortlessly publish, edit, and manage content, empowering organizations to maintain a dynamic and engaging online presence.

Security First

In the digital age, security is non-negotiable. Drupal takes a proactive approach to security, with a dedicated security team that monitors, identifies, and addresses vulnerabilities promptly. The platform's robust security features, frequent updates, and a vigilant community ensure that your website is well-protected against potential threats. By choosing Drupal, you're investing in a platform that prioritizes the security of your digital assets.

Mobile Responsiveness

With the increasing prevalence of mobile devices, it's crucial for websites to be responsive and accessible across various screen sizes. Drupal is designed with mobile responsiveness in mind, offering a seamless experience for users on smartphones, tablets, and other devices. This ensures that your website not only looks great but also performs optimally, regardless of the device your audience is using.

Community Support and Knowledge Sharing

Drupal's strength lies not only in its codebase but also in its vast and supportive community. DrupalCon is a testament to the spirit of collaboration and knowledge sharing within the community. Whether you're a seasoned developer or a newcomer, Drupal's community is there to offer support, guidance, and a wealth of resources to help you succeed. By choosing Drupal, you're not just adopting a technology but becoming part of a global network of passionate individuals.

As we gather at DrupalCon Portland 2024, the choice is clear – Drupal is the unparalleled solution for web development. Its open-source nature, flexibility, security features, exceptional content management capabilities, mobile responsiveness, and thriving community make it the go-to platform for building robust and scalable websites. Join the Drupal revolution and unlock the full potential of your digital presence!

Register now for DrupalCon Portland 2024!

Categories: FLOSS Project Planets

KDE - Kiosco De Empanadas

Planet KDE - Mon, 2024-04-01 09:47

You like tasty! At KDE we got tasty!

Categories: FLOSS Project Planets

DrupalEasy: DrupalEasy Podcast - A very special episode

Planet Drupal - Mon, 2024-04-01 09:38

A very special episode of the DrupalEasy Podcast - an episode two years in the making.

Categories: FLOSS Project Planets

Colin Watson: Free software activity in March 2024

Planet Debian - Mon, 2024-04-01 09:10

My Debian contributions this month were all sponsored by Freexian.

Categories: FLOSS Project Planets

LN Webworks: 7 Reasons Why Drupal is the Perfect Platform for Your Real Estate Website

Planet Drupal - Mon, 2024-04-01 07:06

Whatever your business, its website is a complete brand narrative in itself. These days, whenever you hear a brand’s name, you check out its social handles and then move on to its website. This holds true for e-commerce, real estate, and so on.

Speaking of real estate specifically, creating a website that works in your favor and not for the sake of it can make or break your business. But, out of so many CMS options in the market, which one should you pick? The answer is simple - the one that suits your business and all its specific needs. And, what’s better than Drupal? Well, to simplify it, let’s have a look at some of the big top

Categories: FLOSS Project Planets

Simon Josefsson: Towards reproducible minimal source code tarballs? On *-src.tar.gz

GNU Planet! - Mon, 2024-04-01 06:28

While the work to analyze the xz backdoor is in progress, several ideas have been suggested to improve the entire software supply chain ecosystem. Some of those ideas are good, some of the ideas are at best irrelevant and harmless, and some suggestions are plain bad. I’d like to attempt to formalize one idea (it remains to be seen which category it belongs to), which has been discussed before, but the context in which the idea can be appreciated has not been as clear as it is today.

  1. Reproducible source tarballs. The idea is that published source tarballs should be possible to reproduce independently somehow, and that this should be continuously tested and verified, preferably as part of the upstream project’s continuous integration system (e.g., GitHub action or GitLab pipeline). While nominally this looks easy to achieve, there are some complex matters in this, for example: what timestamps to use for files in the tarball? I’ve brought up this aspect before.
  2. Minimal source tarballs without generated vendor files. Most GNU Autoconf/Automake-based tarballs include pre-generated files which are important for bootstrapping on exotic systems that do not have the required dependencies. For the bootstrapping story to succeed, this approach is important to support. However, it has become clear that this practice raises significant costs and risks. Most modern GNU/Linux distributions have all the required dependencies and actually prefer to re-build everything from source code. These pre-generated extra files introduce uncertainty to that process.

My strawman proposal to improve things is to define a new tarball format *-src.tar.gz with at least the following properties:

  1. The tarball should allow users to build the project, which is the entire purpose of all this. This means that at least all source code for the project has to be included.
  2. The tarballs should be signed, for example with PGP or minisign.
  3. The tarball should be possible to reproduce bit-by-bit by a third party using upstream’s version controlled sources and a pointer to which revision was used (e.g., git tag or git commit).
  4. The tarball should not require an Internet connection to download things.
    • Corollary: every external dependency either has to be explicitly documented as such (e.g., gcc and GnuTLS), or included in the tarball.
    • Observation: This means including all *.po gettext translations which are normally downloaded when building from version controlled sources.
  5. The tarball should contain everything required to build the project from source using as much externally released versioned tooling as possible. This is the “minimal” property lacking today.
    • Corollary: This means including a vendored copy of OpenSSL or libz is not acceptable: link to them as external projects.
    • Open question: How about non-released external tooling such as gnulib or autoconf archive macros? This is a bit more delicate: most distributions package just one current version of gnulib or autoconf archive, not previous versions. This could change: distributions could package the gnulib git repository (up to some current version) and the autoconf archive git repository, and packages could be set up to extract the version they need (gnulib’s ./bootstrap already supports this via the --gnulib-refdir parameter). However, this is not normally in place.
    • Suggested Corollary: The tarball should contain content from git submodules such as gnulib and the necessary Autoconf archive M4 macros required by the project.
  6. Similar to how the GNU project specifies the ./configure interface, we need a documented interface for how to bootstrap the project. I suggest using the already well-established idiom of running ./bootstrap to set up the package so that it can later be built via ./configure. Of course, some projects do not use the autotools ./configure interface and will not follow this aspect either. But just as most build systems that compete with autotools have instructions on how to build the project, they should document similar interfaces for bootstrapping the source tarball to allow building.
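
To make property 3 more concrete, here is a minimal Python sketch of how a tarball could be generated so that its bytes depend only on the file contents and names, not on timestamps, ownership, or directory traversal order. The function name make_src_tarball and its parameters are hypothetical and not part of any existing tool. The sketch only handles regular files (symlinks are stored as the content they point to), and it assumes the same zlib version is used everywhere: compressed output can differ between compressor implementations, so a real tool would have to pin or document the compressor too.

```python
import gzip
import io
import os
import tarfile


def make_src_tarball(src_dir: str, prefix: str, mtime: int = 0) -> bytes:
    """Return .tar.gz bytes that depend only on file names and contents.

    Hypothetical helper: `prefix` is the top-level directory inside the
    tarball (e.g. "foo-1.2.3") and `mtime` is the fixed timestamp used
    for every member.
    """
    # Collect relative paths with "/" separators and sort them, so the
    # member order does not depend on the file system's listing order.
    rel_paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            full = os.path.join(root, name)
            rel_paths.append(os.path.relpath(full, src_dir).replace(os.sep, "/"))
    rel_paths.sort()

    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for rel in rel_paths:
            full = os.path.join(src_dir, *rel.split("/"))
            with open(full, "rb") as fh:
                data = fh.read()
            # A fresh TarInfo has uid/gid 0 and empty user/group names,
            # so no local account information leaks into the archive.
            info = tarfile.TarInfo(name=f"{prefix}/{rel}")
            info.size = len(data)
            info.mtime = mtime
            # Normalize permissions to two values: executable or not.
            executable = os.stat(full).st_mode & 0o111
            info.mode = 0o755 if executable else 0o644
            tar.addfile(info, io.BytesIO(data))

    # mtime=0 and an empty filename keep the gzip header constant.
    gz_buf = io.BytesIO()
    with gzip.GzipFile(filename="", fileobj=gz_buf, mode="wb", mtime=0) as gz:
        gz.write(tar_buf.getvalue())
    return gz_buf.getvalue()
```

A third party would run the same function on a checkout of the tagged revision and compare a checksum of the result, for example hashlib.sha256(...).hexdigest(), against the checksum of the published tarball.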

If tarballs that achieve the above goals were available from popular upstream projects, distributions could more easily use them instead of current tarballs that include pre-generated content. The advantage would be that the build process is not tainted by “unnecessary” files. We need to develop tools for maintainers to create these tarballs, similar to how make dist generates today’s foo-1.2.3.tar.gz files.
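
One piece of such tooling could be a check that a tarball contains nothing beyond the version-controlled files plus an explicit allow-list of additions (for example the *.po translations mentioned above). The following Python sketch is only an illustration under stated assumptions: the function name untracked_members is hypothetical, the list of tracked files is assumed to come from something like the output of git ls-files, and the allowed patterns are examples rather than a proposed policy.

```python
import fnmatch
import io
import tarfile


def untracked_members(tar_bytes, tracked, allowed_patterns=()):
    """List regular files in a tarball that are neither tracked in
    version control nor matched by an explicitly allowed pattern.

    Hypothetical helper: `tracked` is an iterable of repository-relative
    paths, `allowed_patterns` are fnmatch-style globs such as "po/*.po".
    """
    tracked = set(tracked)
    extras = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Drop the leading "foo-1.2.3/" directory component.
            _, _, rel = member.name.partition("/")
            if rel in tracked:
                continue
            if any(fnmatch.fnmatch(rel, pat) for pat in allowed_patterns):
                continue
            extras.append(rel)
    return sorted(extras)
```

An empty result would mean that every file in the tarball is either in the repository or covered by a documented exception. Anything else, such as a pre-generated configure script, would be reported for review.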

I think one common argument against this approach will be: Why bother with all that, instead of just using git-archive outputs? Or why not avoid the entire tarball approach and move directly towards version-controlled checkouts, referring to upstream releases by git URL and commit tag or ID? My counter-argument is that this optimizes for packagers’ benefit at the cost of upstream maintainers: most upstream maintainers do not want to store gettext *.po translations in their source code repository. A compromise between the needs of maintainers and packagers is useful, so this *-src.tar.gz tarball approach is the indirection we need to solve that.

What do you think?

Categories: FLOSS Project Planets

Simon Josefsson: Towards reproducible minimal source code tarballs? On *-src.tar.gz

Planet Debian - Mon, 2024-04-01 06:28

Categories: FLOSS Project Planets

Pages