FLOSS Project Planets

FSF News: Free Software Awards winners announced: Eli Zaretskii, Tad (SkewedZeppelin), GNU Jami

GNU Planet! - Sat, 2023-03-18 21:05
BOSTON, Massachusetts, USA -- Saturday, March 18, 2023 -- The Free Software Foundation (FSF) today announced the recipients of the 2022 Free Software Awards, which are given annually at the FSF's LibrePlanet conference to groups and individuals in the free software community who have made significant contributions to the cause for software freedom. This year's recipients of the awards are Eli Zaretskii, Tad (SkewedZeppelin), and GNU Jami. As LibrePlanet 2023 is a hybrid in-person and online conference this year, the ceremony was conducted both in person and virtually.
Categories: FLOSS Project Planets

Michael Ablassmeier: small standalone sshds in go

Planet Debian - Sat, 2023-03-18 20:00

Been looking into some existing sshd implementations in Go. Most of the projects on GitHub seem to use the standard x/crypto/ssh lib.

During testing, I just wanted to see which banner these kinds of SSH servers provide, using the simple command:

nc localhost <port>

And noticed that at least some of these “sshds” did not accept any further connections. A simple DoS via netcat, nice.

To this day, the Go documentation is missing a crucial hint: the function handling the connection should be called as a goroutine, otherwise it simply blocks any further incoming connections.
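
For illustration, a minimal accept loop using the x/crypto/ssh package looks roughly like this (host key setup and channel handling are reduced to placeholders; this is a sketch, not any particular project's code):

package main

import (
    "log"
    "net"

    "golang.org/x/crypto/ssh"
)

func handleConn(nConn net.Conn, config *ssh.ServerConfig) {
    defer nConn.Close()
    // The handshake blocks until the client completes it (or a stray
    // `nc localhost <port>` goes away), so it must not run on the
    // accept loop's goroutine.
    _, chans, reqs, err := ssh.NewServerConn(nConn, config)
    if err != nil {
        log.Printf("handshake failed: %v", err)
        return
    }
    go ssh.DiscardRequests(reqs)
    for newChannel := range chans {
        newChannel.Reject(ssh.UnknownChannelType, "not implemented")
    }
}

func main() {
    // A real server also needs config.AddHostKey(...) and proper auth callbacks.
    config := &ssh.ServerConfig{NoClientAuth: true}

    listener, err := net.Listen("tcp", "127.0.0.1:2022")
    if err != nil {
        log.Fatal(err)
    }
    for {
        nConn, err := listener.Accept()
        if err != nil {
            continue
        }
        go handleConn(nConn, config) // the crucial part: one goroutine per connection
    }
}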

I created some pull requests on the most-starred projects I found; it seems even experienced Go developers missed this part.

Categories: FLOSS Project Planets

Jonathan Dowland: Qi charger stand

Planet Debian - Sat, 2023-03-18 18:02

I've got a Qi-charging phone cradle at home which orients the phone up at an angle which works with Apple's Face ID. At work, I've got a simpler "puck"-shaped one which is less convenient, so I designed a basic cradle to raise both the charger and the phone up.

I did two iterations, and the second iteration was "good enough" to use, so I stopped there, although I would make some further alterations if I were to print it again: more of a cut-out for the USB-C cable, raise the plinth for the Qi charger so that USB-C cables with long collars have enough room, elongate the base to compensate for the changed weight distribution.

Categories: FLOSS Project Planets

Glyph Lefkowitz: Building And Distributing A macOS Application Written in Python

Planet Python - Sat, 2023-03-18 17:45
Why Bother With All This?

In other words: if you want to run on an Apple platform, why not just write everything in an Apple programming language, like Swift? If you need to ship to multiple platforms, you might have to rewrite it all anyway, so why not give up?

Despite the significant investment that platform vendors make in their tools, I fundamentally believe that the core logic in any software application ought to be where its most important value lies. For small, independent developers, having portable logic that can be faithfully replicated on every platform without massive rework might be tricky to get started with, but if you can’t do it, it may not be cost effective to support multiple platforms at all.

So, it makes sense for me to write my applications in Python to achieve this sort of portability, even though on each platform it’s going to be a little bit more of a hassle to get it all built and shipped since the default tools don’t account for the use of Python.

But how much more is “a little bit” more of a hassle? I’ve been slowly learning about the pipeline to ship independently-distributed1 macOS applications for the last few years, and I’ve encountered a ton of annoying roadblocks.

Didn’t You Do This Already? What’s New?

So nice of you to remember. Thanks for asking. While I’ve gotten this to mostly work in the past, some things have changed since then:

  • the notarization toolchain has been updated (altool is now notarytool),
  • I’ve had to ship libraries other than just PyGame,
  • Apple Silicon launched, necessitating another dimension of build complexity to account for multiple architectures,
  • Perhaps most significantly, I have written a tool that attempts to encode as much of this knowledge as possible, Encrust, which I have put on PyPI and GitHub. If this is of interest to you, I would encourage you to file bugs on it, and hopefully add in more corner cases which I have missed.

I’ve also recently shipped my first build of an end-user application that successfully launches on both Apple Silicon and Intel macs, so here is a brief summary of the hoops I needed to jump through, from the beginning, in order to make everything work.

Wait did you say you wrote a tool? Is this fixed, then?

Encrust is, I hope, a temporary stopgap on the way to a much better comprehensive solution.

Specifically, I believe that Briefcase is a much more holistic solution to the general problem being described here, but it doesn’t suit my very specific needs right now4, and it doesn’t address a couple of minor points that I was running into here.

Encrust is mostly glue that shells out to other tools that already solve portions of the problem, even when better APIs exist. It addresses three very specific layers of complexity:

  1. It enforces architecture independence, so that your app built on an M1 machine will still actually run on about half of the macs remaining out there2.
  2. It remembers tricky nuances of the notarization submission process, such as the highly specific way I need to generate my zip files to avoid mysterious notarization rejections3.
  3. It provides a common and central way to store the configuration for these things across repositories, so I don’t need to repeat this process and copy/paste a shell script every time I make a tiny new application.

It only works on Apple Silicon macs, because I didn’t bother to figure out how pip actually determines which architecture to download wheels for.

As such, unfortunately, Encrust is mostly a place for other people who have already solved this problem to collaborate to centralize this sort of knowledge and share ideas about where this code should ultimately go, rather than a tool for users trying to get started with shipping an app.

Open Offer

That said:

  1. I want to help Python developers ship their Python apps to users who are not also Python developers.
  2. macOS is an even trickier platform to do that on than most.
  3. It’s now easy for me to sign, notarize, and release new applications reliably

Therefore:

If you have an open source Python application that runs on macOS5 but can’t ship to macOS — either because:

  1. you’ve gotten stuck on one of the roadblocks that this post describes,
  2. you don’t have $100 to give to Apple, or because
  3. the app is using a cross-platform toolkit that should work just fine and you don’t have access to a mac at all, then

Send me an email and I’ll sign and post your releases.

What’s this post about, then?

People still frequently complain that “Python packaging” is really bad. And I’m on record that packaging Python (in the sense of “code”) for Python (in the sense of “deployment platform”) is actually kind of fine right now; if what you’re trying to get to is a package that can be pip installed, you can have a reasonably good experience modulo a few small onboarding hiccups that are well-understood in the community and fairly easy to overcome.

However, it’s still unfortunately hard to get Python code into the hands of users who are not also Python programmers with their own development environments.

My goal here is to document the difficulties themselves to try to provide a snapshot of what happens if you try to get started from scratch today. I think it is useful to record all the snags and inscrutable error messages that you will hit in a row, so we can see what the experience really feels like.

I hope that everyone will find it entertaining.

  • Other Mac python programmers might find pieces of trivia useful, and
  • Linux users will have fun making fun of the hoops we have to jump through on Apple platforms,

but the main audience is the maintainers of tools like Briefcase and py2app to evaluate the new-user experience holistically, and to see how much the use of their tools feels like this. This necessarily includes the parts of the process that are not actually packaging.

This is why I’m starting from the beginning again, and going through all the stuff that I’ve discussed in previous posts again, to present the whole experience.

Here Goes

So, with no further ado, here is a non-exhaustive list of frustrations that I have encountered in this process:

  • Okay. Time to get started. How do I display a GUI at all? Nothing happens when I call some nominally GUI API. Oops: I need my app to exist in an app bundle, which means I need to have a framework build. Time to throw those partially-broken pyenv pythons in the trash; best to use the official python.org build from here on out.
  • Bonus Frustration since I’m using AppKit directly: why is my app segfaulting all the time? Oh, target is a weak reference in Objective-C, so if I make a window and put a button in it that points at a Python object, the Python interpreter deallocates it immediately because only the window (which is “nothing” as it’s a weakref) is referring to it. I need to start stuffing every Python object that talks to a UI element like a window or a button into a global list, or manually calling .retain() on all of them and hoping I don’t leak memory.
  • Everything seems to be using the default Python Launcher icon, and the app menu says “Python”. That wouldn’t look too good to end users. I should probably have my own app.
  • I’ll skip the part here where the author of a new application might have to investigate py2app, briefcase, pyoxidizer, and pyinstaller separately and try to figure out which one works the best right now. As I said above, I started with py2app and I’m stubborn to a fault, so that is the one I’m going to make work.
  • Now I need to set up py2app. Oops, I can’t use pyproject.toml any more, time to go back to setup.py.
  • Now I built it and the app is crashing on startup when I click on it. I can’t see a traceback anywhere, so I guess I need to do something in the console.
    • Wow; the console is an unusable flood of useless garbage. Forget that.
    • I guess I need to run it in the terminal somehow. After some googling I figure out it’s ./dist/MyApp.app/Contents/MacOS/MyApp. Aha, okay, I can see the traceback now, and it’s … an import error?
    • Ugh, py2app isn’t actually including all of my code, it’s using some magic to figure out which modules are actually used, but it’s doing it by traversing import statements, which means I need to put a bunch of fake static import statements for everything that is used indirectly at the top of my app’s main script so that it gets found by the build. I experimentally discover a half a dozen things that are dynamically imported inside libraries that I use and jam them all in there (see the sketch just after this list).
  • Okay. Now at least it starts up. The blank app icon is uninspiring, though, time to actually get my own icon in there. Cool, I’ll make an icon in my favorite image editor, and save it as... icons must be PNGs, right? Uhh... no, looks like they have to be .icns files. But luckily I can convert the PNG I saved with a simple 12-line shell script that invokes sips and iconutil6.
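
As referenced two items up, a rough sketch of those fake static imports at the top of the main script might look like this; cffi is one example the post mentions, and the other names here are invented placeholders for whatever py2app's scanner misses in a real project:

# main.py -- the entry-point script handed to py2app

# These imports exist only so that py2app's import scanner notices the
# modules and copies them into the bundle; nothing here uses the names
# directly. Which modules you need depends on what your libraries import
# dynamically at runtime.
import cffi  # noqa: F401  (pulled in indirectly by packages like cryptography)
import your_dynamic_dependency  # noqa: F401  (placeholder for whatever else the scanner misses)

from myapp import main  # hypothetical entry point of the real application

main()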

At this point I have an app bundle which kinda works. But in order to run on anyone else’s computer, I have to code-sign it.

  • In order to code-sign anything, I have to have an account with Apple that costs $99 per year, on developer.apple.com.
  • The easiest way to get these certificates is to log in to Xcode itself. There’s a web portal too but using it appears to involve a lot more manual management of key material, so, no thanks. This requires the full-fat Xcode.app though, not just the command-line tools that come down when I run xcode-select --install, so, time to wait for an 11GB download.
  • Oops, I made the wrong certificate type. Apparently the only right answer here is a “Developer ID Application” certificate.
  • Now that I’ve logged in to Xcode to get the certificate, I need to figure out how to tell my command-line tools about it (for starters, “codesign”). Looks like I need to run security find-identity -v -p codesigning.
  • Time to sign the application’s code.
    • The codesign tool has a --deep option which can sign the whole bundle. Great!
    • Except, that doesn’t work, because Python ships shared libraries in locations that macOS doesn’t automatically expect, so I have to manually locate those files and sign them, invoking codesign once for each.
    • Also, --deep is deprecated. There’s no replacement.
    • Logically, it seems like I still need --deep, because it does some poorly-explained stuff with non-code resource files that maybe doesn’t happen properly if I don’t? Oh well. Let's drop the option and hope for the best.8
    • With a few heuristics I think we can find all the relevant files with a little script7.
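
As a hedged illustration (not the author's actual script), such a helper might look roughly like this; the identity string, bundle path, and file-finding heuristics are placeholders based on the footnotes here, and the hardened-runtime flag anticipates the notarization requirements described below:

#!/usr/bin/env python3
# Hypothetical helper: sign every binary inside a py2app bundle, innermost first.
import pathlib
import subprocess

IDENTITY = "Developer ID Application: Example Developer (TEAMID1234)"  # placeholder
BUNDLE = pathlib.Path("dist/MyApp.app")  # placeholder

def signable_files(bundle):
    # Shared objects, dylibs and static archives that `codesign --deep` misses...
    for pattern in ("**/*.so", "**/*.dylib", "**/*.a"):
        yield from bundle.glob(pattern)
    # ...plus the launcher executable(s), which must be signed before the bundle itself.
    yield from (bundle / "Contents" / "MacOS").iterdir()

def sign(path):
    subprocess.run(
        ["codesign", "--sign", IDENTITY, "--force", "--timestamp",
         "--options", "runtime", str(path)],
        check=True,
    )

for path in signable_files(BUNDLE):
    sign(path)
sign(BUNDLE)  # finally, sign the bundle as a whole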

Now my app bundle is signed! Hooray. 12 years ago, I’d be all set. But today I need some additional steps.

  • After I sign my app, Apple needs to sign my app (to indicate they’ve checked it for malware), which is called “notarization”.
    • In order to be eligible for notarization, I can’t just code-sign my app. I have to code-sign it with entitlements.
    • Also, I can’t just code-sign it with entitlements; I have to sign it with the hardened runtime, or it fails notarization.
    • Oops, out of the box, the hardened runtime is incompatible with a bunch of stuff in Python, including cffi and ctypes, because nobody has implemented support for MAP_JIT yet, so it crashes at startup. After some thrashing around I discover that I need a legacy “allow unsigned executable memory” entitlement. I can’t avoid importing this because a bunch of things in py2app’s bootstrapping code import things that use ctypes, and probably a bunch of packages which I’m definitely going to need, like cryptography require cffi directly anyway.
    • In order to set up notarization external to Xcode, I need to create an App Password which is set up at appleid.apple.com, not the developer portal.
    • Bonus Frustration since I’ve been doing this for a few years: Originally this used to be even more annoying as I needed to wait for an email (with altool), and so I couldn’t script it directly. Now, at least, the new notarytool (which will shortly be mandatory) has a --wait flag.
    • Although the tool is documented under man notarytool, I actually have to run it as xcrun notarytool, even though codesign can be run either directly or via xcrun codesign.
    • Great, we’re ready to zip up our app and submit to Apple. Wait, they’re rejecting it? Why???
    • Aah, I need to manually copy and paste the UUID in the console output of xcrun notarytool submit into xcrun notarytool log to get some JSON that has some error messages embedded in it.
    • Oh. The bundle contains internal symlinks, so when I zipped it without the -y option, I got a corrupt archive.
    • Great, resubmitted with zip -y.
    • Oops, just kidding, that only works sometimes. Later, a different submission with a different hash will fail, and I’ll learn that the correct command line is actually ditto -c -k --sequesterRsrc --keepParent MyApp.app MyApp.app.zip.
      • Note that, for extra entertainment value, the position of the archive itself and directory are reversed on the command line from zip (and tar, and every other archive tool).
    • notarytool doesn’t record anything in my app though; it puts the “notarization ticket” on Apple's servers. Apparently, I still need to run stapler for users to be able to launch it while those servers are inaccessible, like, for example, if they’re offline.
    • Oops, not stapler. xcrun stapler. Whatever.
    • Except notarytool operates on a zip archive, but stapler operates on an app bundle. So we have to save the original app bundle, run stapler on it, then re-archive the whole thing into a new archive.
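
Putting the archive/submit/staple/re-archive dance together, a sketch of the full sequence might look like the following; the paths and keychain profile name are placeholders, and it assumes credentials were stored once with xcrun notarytool store-credentials:

#!/usr/bin/env python3
# Hypothetical end-to-end notarization helper for an already-signed bundle.
import subprocess

BUNDLE = "dist/MyApp.app"       # placeholder
ARCHIVE = "dist/MyApp.app.zip"  # placeholder
PROFILE = "my-notary-profile"   # placeholder keychain profile name

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Archive with ditto so internal symlinks survive.
run("ditto", "-c", "-k", "--sequesterRsrc", "--keepParent", BUNDLE, ARCHIVE)

# 2. Submit to Apple and wait for the verdict.
run("xcrun", "notarytool", "submit", ARCHIVE, "--keychain-profile", PROFILE, "--wait")

# 3. Staple the ticket to the bundle (not the zip)...
run("xcrun", "stapler", "staple", BUNDLE)

# 4. ...then re-archive the stapled bundle for distribution.
run("ditto", "-c", "-k", "--sequesterRsrc", "--keepParent", BUNDLE, ARCHIVE)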

Hooray! Time to release my great app!

  • Whoops, just got a bug report that it crashes immediately on every Intel mac. What’s going on?
  • Turns out I’m using a library whose authors distribute both aarch64 and x86_64 wheels; pip will prefer single-architecture wheels even if universal2 wheels are also available, so I’ve got to somehow get fat binaries put together. Am I going to have to build a huge pile of C code by myself? I thought all these Python hassles would at least let me avoid the C hassles!
  • Whew, okay, no need for that: there’s an amazing Swiss-army knife for macOS binary wheels, called delocate that includes a delocate-fuse tool that can fuse two wheels together. So I just need to figure out which binaries are the wrong architecture and somehow install my fixed/fused wheels before building my app with py2app.

    • except, oops, this tool just rewrites the file in-place without even changing its name, so I have to write some janky shell scripts to do the reinstallation. Ugh.
  • OK now that all that is in place, I just need to re-do all the steps:

    • universal2-ize my virtualenv!
    • build!
    • sign!
    • archive!
    • notarize!
    • wait!!!
    • staple!
    • re-archive!
    • upload!

And we have an application bundle we can ship to users.

It’s just that easy.

As long as I don’t need sandboxing or Mac App Store distribution, of course. That’s a challenge for another day.

So, that was terrible. But what should be happening here?

Some of this is impossible to simplify beyond a certain point - many of the things above are not really about Python, but are about distribution requirements for macOS specifically, and we in the Python community can’t affect operating system vendors’ tooling.

What we can do is build tools that produce clear guidance on what step is required next, handle edge cases on their own, and generally guide users through these complex processes without requiring them to hit weird binary-format or cryptographic-signing errors on their own with no explanation of what to do next.

I do not think that documentation is the answer here. The necessary steps should be discoverable. If you need to go to a website, the tool should use the webbrowser module to open a website. If you need to launch an app, the tool should launch that app.

With Encrust, I am hoping to generalize the solutions that I found while working on this for this one specific slice of the app distribution pipeline — i.e. a macOS desktop application, as distributed independently and not through the mac app store — but other platforms will need the same treatment.

However, even without really changing py2app or any of the existing tooling, we could imagine a tool that would interactively prompt the user for each manual step, automate as much of it as possible, verify that it was performed correctly, and give comprehensible error messages if it was not.

For a lot of users, this full code-signing journey may not be necessary; if you just want to run your code on one or two friends’ computers, telling them to right click, go to ‘open’ and enter their password is not too bad. But it may not even be clear to them what the trade-off is, exactly; it looks like the app is just broken when you download it. The app build pipeline should tell you what the limitations are.

Other parts of this just need bug-fixes to address. py2app specifically, for example, could have a better self-test for its module-collecting behavior, launching an app to make sure it didn’t leave anything out.

Interactive prompts to set up a Homebrew tap, or a Flatpak build, or a Microsoft Store Metro app, might be similarly useful. These all have outside-of-Python required manual steps, and all of them are also amenable to at least partial automation.

Thanks to my patrons on Patreon for supporting this sort of work, including development of Encrust, of Pomodouroboros, of posts like this one and of that offer to sign other people’s apps. If you think this sort of stuff is worthwhile, you might want to consider supporting me over there as well.

  1. I am not even going to try to describe building a sandboxed, app-store ready application yet. 

  2. At least according to the Steam Hardware Survey, which as of this writing in March of 2023 pegs the current user-base at 54% apple silicon and 46% Intel. The last version I can convince the Internet Archive to give me, from December of 2022, has it closer to 51%/49%, which suggests a transition rate of 1% per month. I suspect that this is pretty generous to Apple Silicon as Steam users would tend to be earlier adopters and more sensitive to performance, but mostly I just don’t have any other source of data. 

  3. It is truly remarkable how bad the error reporting from the notarization service is. There are dozens of articles and forum posts around the web like this one where someone independently discovers this failure mode after successfully notarizing a dozen or so binaries and then suddenly being unable to do so any more because one of the bytes in the signature is suddenly not valid UTF-8 or something. 

  4. A lot of this is probably historical baggage; I started with py2app in 2008 or so, and I have been working on these apps in fits and starts for… ugh… 15 years. At some point when things are humming along and there are actual users, a more comprehensive retrofit of the build process might make sense but right now I just want to stop thinking about this

  5. If your application isn’t open source, or if it requires some porting work, I’m also available for light contract work, but it might take a while to get on my schedule. Feel free to reach out as well, but I am not looking to spend a lot of time doing porting work. 

  6. I find this particular detail interesting; it speaks to the complexity and depth of this problem space that this has been a known issue for several years in Briefcase, but there’s just so much other stuff to handle in the release pipeline that it remains open. 

  7. I forgot both .a files and the py2app-included python executable itself here, and had to discover that gap when I signed a different app where that made a difference. 

  8. Thus far, it seems to be working. 

Categories: FLOSS Project Planets

FSF Latin America: Linux-libre turns 15!

GNU Planet! - Sat, 2023-03-18 14:13
Linux-libre turns 15!

It was February 2008 when Jeff Moe announced Linux-libre, a project to share the efforts that freedom-respecting distros had to undertake to drop the nonfree bits distributed as part of the kernel Linux.
https://web.archive.org/web/1/lists.autistici.org/message/20080221.002845.467ba592.en.html

"For fifteen years, the Linux-libre project has remained dedicated to providing a kernel that respects everyone's freedom and has become an essential part of the free software movement. Linux-libre is widely used by those who value their freedom to use, study, change, and share software without restrictions or limitations. These freedoms are essential to creating a just society."
-- Jason Self

Since around 1996, Linux has carried sourceless firmware encoded as sequences of numbers disguised as source code. UTUTO and gNewSense pioneered the efforts of removing them. Cleaning Linux up is a substantial amount of work, so the existence of Linux-libre has alleviated one of the main difficulties in maintaining GNU+Linux distros that abide by the GNU Free Software Distribution Guidelines. The Linux-libre compiled kernel distributions maintained by Jason Self, Freesh (.deb), liberRTy (low-latency .deb) and RPMFreedom (.rpm), make it easy for users of other GNU+Linux distros to take a step towards freedom when their hardware is not too user-hostile.

"Thanks to Linux-libre, we have entirely libre GNU+Linux distros. Thanks to Linux-libre, people like me who are not kernel hackers can install one of those distros and have a computer which never runs a nonfree program on the CPU. (Provided we use LibreJS as well to reject nonfree Javascript programs that web sites send us.)"
-- Richard Stallman

Early pieces of firmware in Linux ran peripheral devices, but some of the blobs loaded by Linux nowadays reconfigure the primary central processing units and others contain an entire operating system for the peripherals' CPUs, including a copy of the kernel Linux itself and several other freedom-depriving programs!

After years of our denouncing the social, technical, and legal risks arising from Linux's misbehavior, most of the blobs got moved to separate files, still part of the kernel Linux, and then to separate packages, which mitigates some of the legal risks. But the problem keeps growing: more and more devices depend on nonfree firmware and thus remain under the exclusive and proprietary control of their suppliers.

Challenge

For 27 years, the nonfree versions of Linux have shown that tolerating blobs and making it easy for users to install and accept them makes users increasingly dependent on user-hostile, blob-requiring devices for their computing. Refusing to give these devices' suppliers what they wish, namely your money and control over your computing, is more likely to succeed at changing their practices if more users refuse.

If you're the kind of software freedom supporter who demands respect for your freedom, keep on enjoying the instant gratification that GNU Linux-libre affords you, and supporting (or being!) those who refurbish old computers and build new ones to respect our autonomy.

However, if you're of the kind for whom last-generation computers are hard to resist, even though you'd prefer if they were more respectful of your freedom, you may wish to consider a delayed gratification challenge: if you and your friends resist hostile computers now, you may get more respectful ones later, for yourselves and for all of us; if you don't, the next generations will likely be even more hostile. Are you up for the challenge?
https://en.wikipedia.org/wiki/Delayed_gratification

Present and Future

GNU Linux-libre releases are currently prepared with scripts that automate the cleaning-up and part of the verification. For each upstream major and stable release, we run the scripts, updating them as needed, and publish them, along with the cleaning-up logs and the cleaned-up sources, in a git repository. Each source release is an independent tag, as in, there are no branches for cleaned-up sources. This is so we can quickly retract releases if freedom bugs are found.

We have plans to change the cleaning-up process and the repository structure in the future: we're (slowly) preparing to move to a rewritten git repository, in which, for each commit in upstream Linux main and stable repositories, there will be a corresponding cleaned-up commit in ours. Undesirable bits are going to be cleaned up at the commit corresponding to the one in which upstream introduced or modified them, and other modifications will be checked and integrated unchanged, mirroring the upstream commit graph, with "git replace" mappings for individual commits and, perhaps, also for cleaned-up files.

This is expected to enable us to track upstream development very closely, to get stable and major releases out nearly instantly and often automatically and to enable Linux developers to clone our freed repository instead of our upstream to write and test their changes. The same techniques used to create the cleaned-up repository can be used to fix freedom bugs in it.

Artwork

Jason Self has made several beautiful pictures of his version of Freedo, our light-blue penguin mascot, and we've used them for our recent releases.

Marking the beginning of the week in which we celebrate 15 years of Linux-libre, we had the pleasure of publishing a major release, 6.2-gnu, codenamed "la quinceañera", with a picture of Freedo dressed up for the occasion.
https://www.fsfla.org/pipermail/linux-libre/2023-February/003502.html

But there's more! He also made a commemorative black-and-white wallpaper with classic Freedo, also dressed up for the occasion. Check them out, and feel free to tune the colors to your liking!
https://linux-libre.fsfla.org/#news

He also modeled a 3D Freedo in Blender, and we're looking for someone who could 3D-print it and get it to the FSF office in time for the LibrePlanet conference. Rumor has it that Richard Stallman is going to auction it off to raise funds for the FSF! Can you help?
https://libreplanet.org/2023/

About GNU Linux-libre

GNU Linux-libre is a GNU package maintained by Alexandre Oliva, on behalf of FSFLA, and by Jason Self. It releases cleaned-up versions of Linux, suitable for use in distributions that comply with the Free Software Distribution Guidelines published by the GNU project, and by users who wish to run Free versions of Linux on their GNU systems. The project offers cleaning-up scripts, Free sources, binaries for some GNU+Linux distributions, and artwork with GNU and the Linux-libre mascot: Freedo, the clean, Free and user-friendly light-blue penguin. Visit our web site and Be Free!
https://linux-libre.fsfla.org/
https://www.gnu.org/distros/

About the GNU Operating System and Linux

Richard Stallman announced in September 1983 the plan to develop a Free Software Unix-like operating system called GNU. GNU is the only operating system developed specifically for the sake of users' freedom.
https://www.gnu.org/
https://www.gnu.org/gnu/the-gnu-project.html

By 1992, the essential components of GNU were complete, except for one: the kernel. When the kernel Linux was re-released under the GNU GPL in 1992, making it Free Software, the combination of GNU and Linux formed a complete Free operating system, which made it possible for the first time to run a PC without non-Free Software. This combination is the GNU+Linux system.
https://www.gnu.org/gnu/gnu-linux-faq.html

About FSFLA

Free Software Foundation Latin America joined the international FSF network in 2005, which was previously formed by the Free Software Foundations in the United States, Europe, and India. These sister organizations work in their corresponding geographies towards promoting the same Free Software ideals and defending the same freedoms for software users and developers, working locally but cooperating globally.
https://www.fsfla.org/

Copyright 2023 FSFLA

Permission is granted to make and distribute verbatim copies of this entire document without royalty, provided the copyright notice, the document's official URL, and this permission notice are preserved.

Permission is also granted to make and distribute verbatim copies of individual sections of this document worldwide without royalty provided the copyright notice and the permission notice above are preserved, and the document's official URL is preserved or replaced by the individual section's official URL.

https://www.fsfla.org/anuncio/2023-02-Linux-libre-15

Categories: FLOSS Project Planets

My experience taking part in Season of KDE

Planet KDE - Sat, 2023-03-18 10:23
Background and motivations

The year: 2020. Covid-19 has reached Europe, and a high-school student finds himself trapped at home with a lot more free time than usual. So, of course, he spends the first week playing video games practically non-stop. However, he soon gets slightly bored of it, and begins to follow an online tutorial about programming in C++ with an introduction to the Qt framework.

Fast-forward to 2022. In the meantime, I had made the switch to Linux, and I had continued to make small C++/Qt programs in my free time. I decided that it would be nice to put my knowledge into practice, and do something useful for the community. KDE was a natural choice, as I really enjoy Plasma and the KDE apps, and I already had some experience with C++ and Qt.

What is Season of KDE?

“Season of KDE is an outreach program hosted by the KDE community. The Season of KDE provides an opportunity for people to do mentored projects for KDE.”

- https://community.kde.org/SoK/About

The Adventure Begins!

The first challenge was to choose what to do. Coming from the outside, it is hard to know what projects are in active development, and where help is needed. This is one of the big advantages of SoK, as developers can submit project ideas, so mentees can just pick what interests them. However, I ultimately chose not to work on a proposed project, but to work on AudioTube. AudioTube is a YouTube Music client I had discovered a short while before, which, most importantly, is in active development, and for which I had some ideas for features I could implement.

I got in touch with the developers and found some amazing people who were willing to mentor my project, namely Jonah Brüchert, Carl Schwan and Devin Lin.

After writing the project proposal and setting up a development environment, everything was ready to start!

The first merge request

I wanted to start with something simple: adding the ability to remove songs from the playback history. Just adding an item to a popup menu and writing a small backend, what could be hard about that? Wait, this application uses QML? No widgets?

QML crash course

It turned out the UI side was still pretty easy; I just had to copy-paste some lines of code and change some keywords. The backend was a little bit more challenging, requiring some SQL and a separate C++ class to handle when the menu item should be displayed, but nothing undoable.

And then it was time to submit my first ever merge request (and get acquainted with Git and GitLab!).

Later merge requests

As a natural continuation of this first success, I began to work on making it possible to remove old search queries from the database. The search queries were handled by a model-view pattern. It turned out that the model was reset and reloaded every time a change was made to it, so I made it possible to change the model and the underlying database separately.

I also implemented a volume slider, a “clear playlist” button, and the ability to play the favourite and most played songs as a playlist, enabled keyboard navigation of search suggestions, and introduced support for the MPRIS standard, making it possible to control AudioTube from the system tray.

Useful lessons

If you hesitate to start contributing to an open source project because you have no previous experience in development, I can only encourage you to give it a shot. It isn’t really complicated. Also, it is very fun and there are lots of people who will gladly help you to get started. You might want to consider the following tips (most of them should also be valid outside of SoK and KDE):

  • Start small. In the first merge request, you will need to get accustomed to many new tools and a new code base. So first make some small, easy changes to get the hang of it and gradually move to more complex, exciting new features.
  • When you encounter a new tool, learn only what you need for your current task. You don’t need to go through the whole git book before you can make your first contribution. If you learn only what you currently need, you’ll be able to get started much faster, and you can learn more every time you use that tool.
  • Don’t hesitate to ask for help. It is always good to try to solve problems yourself, but if you’re stuck somewhere, consider asking more experienced contributors. You’ll most probably learn something along the way.
  • If you want to start out in KDE, you’ll find useful information at https://community.kde.org/Get_Involved. Also, the #kde-welcome:kde.org matrix room is intended for new developers.
Closing thoughts

Getting new contributors is absolutely crucial for a healthy, long-lived open source project. It is therefore important to make the onboarding process as easy as possible. Season of KDE achieves this by providing project ideas and facilitating the process of finding mentors to help. So, if you are already a contributor and have a nice idea for a project, please consider submitting it next year at SoK and maybe propose to mentor it. Or put it in the junior job list. And thanks for being patient with us newbies who make stupid mistakes and ask obvious questions ;)
I also want to thank everybody who makes Season of KDE possible, especially my mentors.

Categories: FLOSS Project Planets

Hynek Schlawack: How to Automatically Switch to Rosetta With Fish and Direnv

Planet Python - Sat, 2023-03-18 08:00

I love my Apple silicon computer, but having to manually switch to Rosetta-enabled shells for my Intel-only projects was a bummer.

Categories: FLOSS Project Planets

Talk Python to Me: #407: pytest tips and tricks for better testing

Planet Python - Sat, 2023-03-18 04:00
If you're like most people, the simplicity and ease of getting started is a big part of pytest's appeal. But beneath that simplicity, there is a lot of power and depth. We have Brian Okken on this episode to dive into his latest pytest tips and tricks for beginners and power users.

Links from the show:

  • pytest tips and tricks article: https://pythontest.com/pytest-tips-tricks/
  • Getting started with pytest Course: https://training.talkpython.fm/courses/getting-started-with-testing-in-python-using-pytest
  • pytest book: https://pythontest.com/pytest-book/
  • Python Bytes podcast: https://pythonbytes.fm
  • Brian on Mastodon: @brianokken@fosstodon.org
  • Hypothesis: https://hypothesis.readthedocs.io/en/latest/
  • Hypothesis: Reproducibility: https://hypothesis.readthedocs.io/en/latest/reproducing.html
  • Get More Done with the DRY Principle: https://zapier.com/blog/dont-repeat-yourself/
  • "The Key" Keyboard: https://stackoverflow.blog/2021/03/31/the-key-copy-paste/
  • pytest plugins: https://docs.pytest.org/en/7.1.x/reference/plugin_list.html
  • Watch this episode on YouTube: https://www.youtube.com/watch?v=qQ6b7OwT124
  • Episode transcripts: https://talkpython.fm/episodes/transcript/407/pytest-tips-and-tricks-for-better-testing

Stay in touch with us:

  • Subscribe to us on YouTube: https://talkpython.fm/youtube
  • Follow Talk Python on Mastodon: https://fosstodon.org/web/@talkpython
  • Follow Michael on Mastodon: https://fosstodon.org/web/@mkennedy

Sponsors:

  • Microsoft Founders Hub 2023: https://talkpython.fm/foundershub
  • Brilliant 2023: https://talkpython.fm/brilliant
  • Talk Python Training: https://talkpython.fm/training
Categories: FLOSS Project Planets

This week in KDE: “More Wayland fixes”

Planet KDE - Fri, 2023-03-17 23:07

It’s become almost a running joke on Phoronix at this point, but this week we do indeed have more Wayland fixes! …And other things as well, including some good UI improvements to various KDE apps in addition to the background work of Qt 6 porting that is continuing full steam ahead, and reaching a position of increasing stability. Come see!

User Interface Improvements

Ark’s welcome screen is now richer with features, to be more like the one in Kate (Eugene Popov, Ark 23.04. Link):

Made a few UI improvements to Elisa, such as showing a “Quit” menu item in the hamburger menu when using the System Tray icon feature, correctly returning to the prior window state when exiting full screen mode, and resetting the playback position slider to the beginning when the playlist is manually cleared (Nikita Karpei and me: Nate Graham, Elisa 23.04. Link 1, link 2, and link 3)

Okular’s default toolbar layout has now been tweaked a bit, and now includes the “View Mode” menu by default and also shows the zoom and view buttons on the left side, with the tools on the right side (me: Nate Graham, Okular 23.04. Link 1 and link 2):

When “Fix it for me!”-style actions in the Samba sharing wizard fail, you’re now shown an appropriate error message explaining what went wrong (me: Nate Graham, kdenetwork-filesharing 23.08. Link)

Plasma now exposes global actions for “Restart” and “Shut Down” so you can add keyboard shortcuts to trigger them. We already had the “without confirmation” versions of these actions, but these new ones will ask for confirmation first (me: Nate Graham, Plasma 6.0. Link)

When importing VPN configurations, any errors are now shown in the UI so you can figure out what went wrong and maybe fix it yourself (Nicolas Fella, Plasma 5.27.3. Link)

While downloading new Flatpak apps, Discover now reports the status as “Downloading” correctly (Aleix Pol Gonzalez, Plasma 5.27.4. Link)

If your keyboard has an Emoji key, pressing it now opens the Emoji Picker window (Konrad Borowski, Plasma 5.27.4. Link)

Info Center has adopted a flattened sidebar structure so pages no longer live in sub-categories. This should make it easier and faster to access everything (Oliver Beard, Plasma 6.0. Link):

When you synchronize your Plasma settings to SDDM, it now also syncs the cursor size (me: Nate Graham, Plasma 6.0. Link)

We no longer misleadingly use Filelight’s icon for the 3rd-party GParted app in the Breeze icon theme (me: Nate Graham, Frameworks 5.105. Link)

Significant Bugfixes

(This is a curated list of e.g. HI and VHI priority bugs, Wayland showstoppers, major regressions, etc.)

Fixed a source of crashes in System Settings when importing VPN configuration files (Nicolas Fella, Plasma 5.27.3. Link)

Fixed another source of clipboard-related crashes in Plasma (Fushan Wen, Plasma 5.27.4. Link)

Significantly improved robustness of screen arrangements when using a multi-monitor setup that includes monitors with identical EDID values (Xaver Hugl, Plasma 5.27.3. Link)

Significantly improved robustness of Plasma containments’ mapping to screens when using multi-monitor setups (David Edmundson, Plasma 5.27.3. Link)

Fixed the way GTK apps scale themselves in the Plasma Wayland session when using multiple screens with different physical DPI values (Luca Bacci, Plasma 5.27.4. Link)

In the Plasma Wayland session, Plasma no longer quits (not crashes!) when an app sends a window title that’s wayyyyy too long (David Edmundson, Plasma 5.27.4. Link)

In the Plasma Wayland session, screen recording and Task Manager thumbnails now work properly for users of NVIDIA GPUs with the proprietary drivers (Jan Grulich, Plasma 5.27.4. Link)

Other bug-related information of interest:

Automation & Systematization

Added a UI test for Discover to test installing and uninstalling apps from the PackageKit backend (Harald Sitter. Link)

Changes not in KDE that affect KDE

In its native Wayland mode, Firefox no longer has an invisible animation that forces the screen to constantly repaint, causing KWin to unnecessarily consume excessive CPU resources (Emilio Cobos Álvarez, Firefox 113, Link)

…And everything else

This blog only covers the tip of the iceberg! If you’re hungry for more, check out https://planet.kde.org, where you can find more news from other KDE contributors.

How You Can Help

If you’re a user, upgrade to Plasma 5.27! If your distro doesn’t offer it and won’t anytime soon, consider switching to a different one that ships software closer to its developer’s schedules.

If you’re a developer, consider working on known Plasma 5.27 regressions! You might also want to check out our 15-Minute Bug Initiative. Working on these issues makes a big difference quickly!

Otherwise, visit https://community.kde.org/Get_Involved to discover other ways to be part of a project that really matters. Each contributor makes a huge difference in KDE; you are not a number or a cog in a machine! You don’t have to already be a programmer, either. I wasn’t when I got started. Try it, you’ll like it! We don’t bite!

And finally, KDE can’t work without financial support, so consider making a donation today! This stuff ain’t cheap and KDE e.V. has ambitious hiring goals. We can’t meet them without your generous donations!

Categories: FLOSS Project Planets

GNU Health: Leading Public Mental Health Hospital in Argentina embraces GNU Health

GNU Planet! - Fri, 2023-03-17 20:53

The World Health Organization defines health as a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity.

Unfortunately, this definition is far from being a reality in our societies. Instead of embracing the system of health, we live in the system of disease, ruled by a reactive, reductionist and unsustainable model of healthcare. The beautiful noble art and science of medicine is ill. Financial institutions and giant technological corporations are removing the human factor from medicine, transforming people and patients into clients. They are reducing the non-negotiable human right to healthcare to a privilege of a few.

Coming back to the formal definition of health: in the current system of disease, very little account is taken of social and mental well-being. Today, many people with mental health conditions not only have to deal with the physiopathological aspects of the disorder, but also with the stigma, exclusion and invisibilization from society.

But there is hope. Medicine is a social science, and GNUHealth is a social project with some technology behind it. That feeling of optimism and hope has been reinforced by last week's trip to Argentina and its people. In the end, medicine is about people interacting with and taking care of people. It is about people before patients. I know them well, because I studied medicine in Argentina.

Group picture with health professionals from HESM, UNER, Government officials and GNU Solidario at the entrance of the leading Public Mental Health Hospital in Entre Ríos, Argentina

The Mental Health Hospital has chosen GNUHealth to improve the management of the institution resources, as well as to provide the best medical care for their community, both in outpatient and inpatient settings. Being able to properly identify every person who needs attention, and knowing the socio-sanitary, medical and clinical history in real time will make a big difference in the care of the individual.

The implementation of GNUHealth in this health institution will be led by Prof. Dr. Fernando Sassetti and the department of public health studies of the University of Entre Ríos, in the context of the GNU Health Alliance of Academic and Research Institutions agreement signed with GNU Solidario.

Health is an equilibrium of the inseparable and interconnected physical, social, mental and spiritual domains. Medicine is about taking into consideration and maintaining this body-mind-spirit-environment balance. This holistic approach to medicine is encoded in the genome of every nurse, psychologist, social worker and doctor from the Mental Health Hospital and the Primary care centers I got to know during these years in Entre Ríos, Argentina.

Links / References

Un software Libre para mejorar las políticas de salud: https://www.eldiario.com.ar/253548-un-software-para-mejorar-las-politicas-de-salud/

Hospital Escuela de Salud Mental : http://www.hesm.gob.ar/

Audiovisual institucional Hospital Escuela de Salud Mental: https://www.youtube.com/watch?v=Jx08WyfKRIE&t=12s

GNU Health: https://www.gnuhealth.org

Categories: FLOSS Project Planets

Matt Layman: Locking Down Your Users' Secrets: Django Sessions 101

Planet Python - Fri, 2023-03-17 20:00
Django is a powerful and popular web framework that makes it easy to build robust and secure web applications. One of the key features of Django is its ability to manage user sessions, which are essential for many web applications. However, you may be wondering if Django sessions are secure. In this article, we’ll explore the security of Django sessions and see how they can be made even more secure.
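
For context, these are a few of the real Django settings that typically come up when hardening sessions; the values shown are generic illustrations, not recommendations taken from the article:

# settings.py: session-related settings commonly tightened for security.
SESSION_ENGINE = "django.contrib.sessions.backends.db"  # default: store session data server-side
SESSION_COOKIE_SECURE = True    # only send the session cookie over HTTPS
SESSION_COOKIE_HTTPONLY = True  # keep JavaScript from reading the cookie
SESSION_COOKIE_SAMESITE = "Lax" # limit cross-site sending of the cookie
SESSION_COOKIE_AGE = 1209600    # lifetime in seconds (two weeks, Django's default)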
Categories: FLOSS Project Planets

Ben's SEO Blog: The Metatag Module

Planet Drupal - Fri, 2023-03-17 19:19
Set up your Drupal site's meta tags, specify how your pages will appear when shared on social media, and add Schema information, all with one powerful Drupal module.
Categories: FLOSS Project Planets

Ben's SEO Blog: How to Add Default Metatags for a Specific Content Type

Planet Drupal - Fri, 2023-03-17 18:43

So you've got your Metatag module installed and defaults configured for the Global, Front page, and Content. However, you want slightly different meta tags for several of your content types. Here's how you do that.

By Tracy Cooper
Categories: FLOSS Project Planets

Ben's SEO Blog: How to Add Meta Tag Fields to Your Content Types in Drupal

Planet Drupal - Fri, 2023-03-17 17:50

So you have your Meta tag module installed and all the default meta tags configured. However, you still need to tweak the meta tags for a specific page, but the meta tag fields aren't showing up in the edit interface. Here's how to get those to show up for the various content types.

Go to Manage > Structure > Content Types. This displays the Content types listing page. 

By Tracy Cooper
Categories: FLOSS Project Planets

Stack Abuse: DBSCAN with Scikit-Learn in Python

Planet Python - Fri, 2023-03-17 12:22
Introduction

You are working in a consulting company as a data scientist. The project you are currently assigned to has data from students who have recently finished courses about finances. The financial company that conducts the courses wants to understand if there are common factors that influence students to purchase the same courses or to purchase different courses. By understanding those factors, the company can create a student profile, classify each student by profile and recommend a list of courses.

When inspecting data from different student groups, you've come across three arrangements of points, as in plots 1, 2 and 3 below:

Notice that in plot 1, there are purple points organized in a half circle, with a mass of pink points inside that circle, a little concentration of orange points outside of that semi-circle, and five gray points that are far from all others.

In plot 2, there is a round mass of purple points, another of orange points, and also four gray points that are far from all the others.

And in plot 3, we can see four concentrations of points, purple, blue, orange, pink, and three more distant gray points.

Now, if you were to choose a model that could understand new student data and determine similar groups, is there a clustering algorithm that could give interesting results to that kind of data?

When describing the plots, we mentioned terms like mass of points and concentration of points, indicating that there are areas in all graphs with greater density. We also referred to circular and semi-circular shapes, which are difficult to identify by drawing a straight line or merely examining the closest points. Additionally, there are some distant points that likely deviate from the main data distribution, introducing more challenges or noise when determining the groups.

A density-based algorithm that can filter out noise, such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise), is a strong choice for situations with denser areas, rounded shapes, and noise.

About DBSCAN

DBSCAN is one of the most cited algorithms in research; its first publication appeared in 1996, in the original DBSCAN paper. In the paper, the researchers demonstrate how the algorithm can identify non-linear spatial clusters and handle data with higher dimensions efficiently.

The main idea behind DBSCAN is that there is a minimum number of points that will be within a determined distance or radius from the most "central" cluster point, called core point. The points within that radius are the neighborhood points, and the points on the edge of that neighborhood are the border points or boundary points. The radius or neighborhood distance is called epsilon neighborhood, ε-neighborhood or just ε (the symbol for Greek letter epsilon).

Additionally, when there are points that aren't core points or border points because they exceed the radius for belonging to a determined cluster and also don't have the minimum number of points to be a core point, they are considered noise points.

This means we have three different types of points, namely, core, border and noise. Furthermore, it is important to note that the main idea is fundamentally based on a radius or distance, which makes DBSCAN - like most clustering models - dependent on that distance metric. This metric could be Euclidean, Manhattan, Mahalanobis, and many more. Therefore, it is crucial to choose an appropriate distance metric that considers the context of the data. For instance, if you are using driving distance data from a GPS, it might be interesting to use a metric that takes the street layouts into consideration, such as Manhattan distance.

Note: Since DBSCAN maps the points that constitute noise, it can also be used as an outlier detection algorithm. For instance, if you are trying to determine which bank transactions may be fraudulent and the rate of fraudulent transactions is small, DBSCAN might be a solution to identify those points.

To find the core point, DBSCAN will first select a point at random, map all the points within its ε-neighborhood, and compare the number of neighbors of the selected point to the minimum number of points. If the selected point has at least as many neighbors as the minimum number of points, it will be marked as a core point. This core point and its neighborhood points will constitute the first cluster.

The algorithm will then examine each point of the first cluster and see if it has at least as many neighbor points as the minimum number of points within ε. If it does, those neighbor points will also be added to the first cluster. This process will continue until the points of the first cluster have fewer neighbors than the minimum number of points within ε. When that happens, the algorithm stops adding points to that cluster, identifies another core point outside of that cluster, and creates a new cluster for that new core point.

DBSCAN will then repeat the first cluster process of finding all points connected to a new core point of the second cluster until there are no more points to be added to that cluster. It will then encounter another core point and create a third cluster, or it will iterate through all the points that it hasn't previously looked at. If these points are at ε distance from a cluster, they are added to that cluster, becoming border points. If they aren't, they are considered noise points.
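
To make that vocabulary concrete, here is a minimal, self-contained sketch of how these concepts map onto Scikit-Learn's implementation; the eps and min_samples values are arbitrary choices for the toy data, not recommendations:

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus one far-away point that should end up as noise.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
              [8.0, 8.1], [8.2, 7.9], [7.9, 8.0],
              [50.0, 50.0]])

# eps is the ε-neighborhood radius; min_samples is the minimum number of
# points (including the point itself) needed to form a core point.
db = DBSCAN(eps=0.5, min_samples=3, metric='euclidean').fit(X)

print(db.labels_)               # [ 0  0  0  1  1  1 -1]; the label -1 marks noise
print(db.core_sample_indices_)  # indices of the core points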

Advice: There are many rules and mathematical demonstrations involved in the idea behind DBSCAN; if you want to dig deeper, you may want to take a look at the original paper, which is linked above.

It is interesting to know how the DBSCAN algorithm works, although, fortunately, there is no need to code the algorithm yourself, since Python's Scikit-Learn library already has an implementation.

Let's see how it works in practice!

Importing Data for Clustering

To see how DBSCAN works in practice, we will change projects a bit and use a small customer dataset that has the genre, age, annual income, and spending score of 200 customers.

The spending score ranges from 1 to 100 and represents how often a person spends money in a mall. In other words, a customer with a score of 1 almost never spends money, and a customer with a score of 100 is the highest spender.

Note: You can download the dataset here.

After downloading the dataset, you will see that it is a CSV (comma-separated values) file called shopping-data.csv. We'll load it into a DataFrame using Pandas and store it in the customer_data variable:

import pandas as pd

# Substitute the path_to_file content by the path to your csv file
path_to_file = '../../datasets/dbscan/dbscan-with-python-and-scikit-learn-shopping-data.csv'
customer_data = pd.read_csv(path_to_file)

To take a look at the first five rows of our data, you can execute customer_data.head():

This results in:

   CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40

By examining the data, we can see customer ID numbers, genre, age, incomes in k$, and spending scores. Keep in mind that some or all of these variables will be used in the model. For example, if we were to use Age and Spending Score (1-100) as variables for DBSCAN, which uses a distance metric, it is important to bring them to a common scale to avoid introducing distortions, since Age is measured in years and Spending Score (1-100) has a limited range from 1 to 100. This means that we will perform some kind of data scaling.
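To see why, consider the distance between two hypothetical customers (the values below are made up purely for illustration): a 40-year age difference completely drowns out a 5-point difference in spending score when the raw values are used.

import numpy as np

# Two made-up customers described by [Age, Spending Score (1-100)]
customer_a = np.array([20, 90])
customer_b = np.array([60, 85])

# Euclidean distance on the raw, unscaled values
raw_distance = np.linalg.norm(customer_a - customer_b)
print(raw_distance)  # ~40.31, almost entirely driven by the Age difference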

We can also check whether the data needs any more preprocessing aside from scaling by seeing if the data types are consistent and verifying whether there are any missing values that need to be treated, by executing Pandas' info() method:

customer_data.info()

This displays:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   CustomerID              200 non-null    int64
 1   Genre                   200 non-null    object
 2   Age                     200 non-null    int64
 3   Annual Income (k$)      200 non-null    int64
 4   Spending Score (1-100)  200 non-null    int64
dtypes: int64(4), object(1)
memory usage: 7.9+ KB

We can observe that there are no missing values because there are 200 non-null entries for each customer feature. We can also see that only the genre column has text content, as it is a categorical variable, which is displayed as object, and all other features are numeric, of the type int64. Thus, in terms of data type consistency and absence of null values, our data is ready for further analysis.

We can proceed to visualize the data and determine which features would be interesting to use in DBSCAN. After selecting those features, we can scale them.

This customer dataset is the same as the one used in our definitive guide to hierarchical clustering. To learn more about this data, how to explore it, and about distance metrics, you can take a look at Definitive Guide to Hierarchical Clustering with Python and Scikit-Learn!

Visualizing Data

By using Seaborn's pairplot(), we can plot a scatter graph for each combination of features. Since CustomerID is just an identification and not a feature, we will remove it with drop() prior to plotting:

import seaborn as sns

# dropping CustomerID column from data
customer_data = customer_data.drop('CustomerID', axis=1)
sns.pairplot(customer_data);

This outputs:

When looking at the combination of features produced by pairplot, the graph of Annual Income (k$) with Spending Score (1-100) seems to display around 5 groups of points. This seems to be the most promising combination of features. We can create a list with their names, select them from the customer_data DataFrame, and store the selection in the customer_data variable again for use in our future model.

selected_cols = ['Annual Income (k$)', 'Spending Score (1-100)']
customer_data = customer_data[selected_cols]

After selecting the columns, we can perform the scaling discussed in the previous section. To bring the features to the same scale or standardize them, we can import Scikit-Learn's StandardScaler, create it, fit our data to calculate its mean and standard deviation, and transform the data by subtracting its mean and dividing it by the standard deviation. This can be done in one step with the fit_transform() method:

from sklearn.preprocessing import StandardScaler

ss = StandardScaler()  # creating the scaler
scaled_data = ss.fit_transform(customer_data)
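In formula form, what fit_transform() just did is replace each value x of a feature by its standardized value z:

$$
z = \frac{x - \mu}{\sigma}
$$

where μ is the feature's mean and σ its standard deviation, both computed during the fit step.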

The variables are now scaled, and we can examine them by simply printing the content of the scaled_data variable. Alternatively, we can also add them to a new scaled_customer_data DataFrame along with column names and use the head() method again:

scaled_customer_data = pd.DataFrame(columns=selected_cols, data=scaled_data)
scaled_customer_data.head()

This outputs:

   Annual Income (k$)  Spending Score (1-100)
0           -1.738999               -0.434801
1           -1.738999                1.195704
2           -1.700830               -1.715913
3           -1.700830                1.040418
4           -1.662660               -0.395980

This data is ready for clustering! When introducing DBSCAN, we mentioned the minimum number of points and the epsilon. These two values need to be selected prior to creating the model. Let's see how it's done.

Choosing Min. Samples and Epsilon

To choose the minimum number of points for DBSCAN clustering, there is a rule of thumb, which states that it has to be equal to or greater than the number of dimensions in the data plus one, as in:

$$
\text{min. points} \geq \text{data dimensions} + 1
$$

The dimensions are the number of columns in the dataframe. Since we are using 2 columns, the min. points should be 2+1, which is 3, or higher. For this example, let's use 5 min. points.

$$
\text{5 (min. points)} \geq \text{2 (data dimensions)} + 1
$$

Now, to choose the value for ε, there is a method in which a Nearest Neighbors algorithm is employed to find the distances of a predefined number of nearest points for each point. This predefined number of neighbors is the min. points we have just chosen minus 1. So, in our case, the algorithm will find the 5-1, or 4, nearest points for each point of our data. Those are the k-neighbors, and our k equals 4.

$$
\text{k-neighbors} = \text{min. points} - 1
$$

Advice: to learn more about Nearest Neighbors, read our K Nearest Neighbors algorithm in Python and Scikit-learn guide.

After finding the neighbors, we will sort their distances from smallest to largest and plot the distances on the y-axis against the points on the x-axis. Looking at the plot, we will find where it resembles the bend of an elbow, and the y-axis value at that elbow bend is the suggested ε value.

Note: it is possible that the graph for finding the ε value has more than one "elbow bend", big or small. When that happens, you can find the corresponding values, test them, and choose the one whose results best describe the clusters, either by looking at metrics or at plots.

To perform these steps, we can import the algorithm, fit it to the data, and then extract the distances and indices of each point with the kneighbors() method:

from sklearn.neighbors import NearestNeighbors
import numpy as np

nn = NearestNeighbors(n_neighbors=4)  # minimum points - 1
nbrs = nn.fit(scaled_customer_data)
distances, indices = nbrs.kneighbors(scaled_customer_data)

Note: Just like with DBSCAN, it is essential to choose a distance metric that suits your data when using the Nearest Neighbors algorithm, as it is also distance-based.

For a list of some metrics and explanations on when to use them, you can take a look at Definitive Guide to Hierarchical Clustering with Python and Scikit-Learn.
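For instance, Scikit-Learn's NearestNeighbors accepts a metric argument, so switching to Manhattan distance is a one-line change; the DBSCAN estimator created later in this guide accepts the same argument. This is just a sketch of the option, and the rest of the guide keeps the default Euclidean metric:

from sklearn.neighbors import NearestNeighbors

# Manhattan distance is shown only as an illustration of the metric argument
nn_manhattan = NearestNeighbors(n_neighbors=4, metric='manhattan')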

After finding the distances, we can sort them from smallest to largest. Since the first column of the distances array holds each point's distance to itself (meaning all values are 0), the second column contains the smallest real distances, followed by the third column, which has larger distances than the second, and so on, we can pick only the values of the second column and store them in the distances variable:

distances = np.sort(distances, axis=0)
distances = distances[:,1]  # Choosing only the smallest distances

Now that we have our sorted smallest distances, we can import Matplotlib, plot the distances, and draw a red line where the "elbow bend" is:

import matplotlib.pyplot as plt

plt.figure(figsize=(6,3))
plt.plot(distances)
plt.axhline(y=0.24, color='r', linestyle='--', alpha=0.4)  # elbow line
plt.title('Kneighbors distance graph')
plt.xlabel('Data points')
plt.ylabel('Epsilon value')
plt.show();

This is the result:

Notice that by drawing the line we can read off the ε value; in this case, it is 0.24.
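If you prefer not to eyeball the elbow, the third-party kneed package (an assumption here; it is not used in the rest of this guide and needs to be installed separately with pip install kneed) can locate it programmatically, which is handy for cross-checking the visual choice:

from kneed import KneeLocator

# The sorted k-distance curve is increasing and convex; knee may be None if no clear bend is found
kl = KneeLocator(range(len(distances)), distances, curve='convex', direction='increasing')
if kl.knee is not None:
    print(f"Suggested epsilon: {distances[kl.knee]:.2f}")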

We finally have our minimum points and ε. With both variables, we can create and run the DBSCAN model.

Creating a DBSCAN Model

To create the model, we can import it from Scikit-Learn, create it with ε, which corresponds to the eps argument, and with the minimum points, which corresponds to the min_samples argument. We can then store it in a variable, let's call it dbs, and fit it to the scaled data:

from sklearn.cluster import DBSCAN

# min_samples == minimum points ≥ dataset_dimensions + 1
dbs = DBSCAN(eps=0.24, min_samples=5)
dbs.fit(scaled_customer_data)

Just like that, our DBSCAN model has been created and trained on the data! To extract the results, we access the labels_ property. We can also create a new labels column in the scaled_customer_data dataframe and fill it with the predicted labels:

labels = dbs.labels_

scaled_customer_data['labels'] = labels
scaled_customer_data.head()

This is the final result:

   Annual Income (k$)  Spending Score (1-100)  labels
0           -1.738999               -0.434801      -1
1           -1.738999                1.195704       0
2           -1.700830               -1.715913      -1
3           -1.700830                1.040418       0
4           -1.662660               -0.395980      -1

Observe that we have labels with -1 values; these are the noise points, the ones that don't belong to any cluster. To know how many noise points the algorithm found, we can count how many times the value -1 appears in our labels list:

labels_list = list(scaled_customer_data['labels'])
n_noise = labels_list.count(-1)
print("Number of noise points:", n_noise)

This outputs:

Number of noise points: 62

We already know that 62 points of our original data of 200 points were considered noise. This is a lot of noise, which indicates that perhaps the DBSCAN clustering didn't consider many points as part of a cluster. We will understand what happened soon, when we plot the data.

Initially, when we observed the data, it seemed to have 5 clusters of points. To know how many clusters DBSCAN has formed, we can count the number of labels that are not -1. There are many ways to write that code; here, we have written a for loop, which will also work for data in which DBSCAN has found many clusters:

total_labels = np.unique(labels)

n_labels = 0
for n in total_labels:
    if n != -1:
        n_labels += 1
print("Number of clusters:", n_labels)

This outputs:

Number of clusters: 6
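The same count can also be written more compactly with a set, a common idiom for DBSCAN results; this is just an equivalent alternative to the loop above:

# Count unique labels and subtract 1 if the noise label (-1) is present
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Number of clusters:", n_clusters)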

We can see that the algorithm predicted the data to have 6 clusters, with many noise points. Let's visualize that by plotting it with seaborn's scatterplot:

sns.scatterplot(data=scaled_customer_data,
                x='Annual Income (k$)', y='Spending Score (1-100)',
                hue='labels', palette='muted').set_title('DBSCAN found clusters');

This results in:

Looking at the plot, we can see that DBSCAN has captured the points which were more densely connected, and points that could be considered part of the same cluster were either noise or considered to form another smaller cluster.

If we highlight the clusters, notice how DBSCAN gets cluster 1 completely, which is the cluster with the least space between points. Then it gets the parts of clusters 0 and 3 where the points are close together, considering the more spaced-out points as noise. It also considers the points in the lower left half as noise and splits the points in the lower right into 3 clusters, once again capturing clusters 4, 2, and 5 where the points are closer together.

We can start to conclude that DBSCAN was great at capturing the dense areas of the clusters, but not so good at identifying the bigger scheme of the data: the delimitations of the 5 clusters. It would be interesting to test more clustering algorithms on our data. Let's see if a metric corroborates this hypothesis.

Evaluating the Algorithm

To evaluate DBSCAN we will use the silhouette score, which takes into consideration the distance between points within the same cluster and the distances between clusters.

Note: Currently, most clustering metrics aren't really suited to evaluating DBSCAN because they aren't based on density. Here, we are using the silhouette score because it is already implemented in Scikit-Learn and because it tries to look at cluster shape.

To have a better-suited evaluation, you can use or combine it with the Density-Based Clustering Validation (DBCV) metric, which was designed specifically for density-based clustering. There is an implementation of DBCV available on GitHub.

First, we can import silhouette_score from Scikit-Learn, then, pass it our columns and labels:

from sklearn.metrics import silhouette_score

s_score = silhouette_score(scaled_customer_data, labels)
print(f"Silhouette coefficient: {s_score:.3f}")

This outputs:

Silhouette coefficient: 0.506

The silhouette coefficient ranges from -1 to 1, so a score of 0.506 indicates moderately dense and reasonably separated clusters, rather than a clear-cut clustering structure.
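Note that the score above was computed on every row of scaled_customer_data, which at this point includes the 62 noise points and also the labels column we appended earlier. A stricter check, sketched below and not part of the original evaluation, restricts the features to the two scaled columns and drops the noise points:

feature_cols = ['Annual Income (k$)', 'Spending Score (1-100)']
non_noise = scaled_customer_data['labels'] != -1

strict_score = silhouette_score(scaled_customer_data.loc[non_noise, feature_cols],
                                scaled_customer_data.loc[non_noise, 'labels'])
print(f"Silhouette coefficient (features only, noise excluded): {strict_score:.3f}")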

Conclusion: DBSCAN Advantages and Disadvantages

DBSCAN is a rather unique clustering algorithm.

If we look at its advantages, it is very good at picking up dense areas in the data and at flagging points that are far from all others. This means that the data doesn't have to have a specific shape, and a cluster can be surrounded by other points, as long as its own points are densely connected.

It requires us to specify minimum points and ε, but there is no need to specify the number of clusters beforehand, as in K-Means, for instance. It can also be used with very large databases, since it was designed with large spatial databases in mind.
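For comparison, a K-Means run on the same scaled data would look like the sketch below; the choice of 5 clusters is an assumption based on the earlier pairplot, not a value the algorithm discovers on its own:

from sklearn.cluster import KMeans

# The number of clusters must be supplied up front, unlike with DBSCAN
km = KMeans(n_clusters=5, n_init=10, random_state=42)
km_labels = km.fit_predict(scaled_customer_data[selected_cols])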

As for its disadvantages, we have seen that it couldn't capture different densities in the same cluster, so it has a hard time with large differences in densities. It is also dependent on the distance metric and on the scaling of the points. This means that if the data isn't well understood, with differences in scale or with a distance metric that doesn't make sense, DBSCAN will probably fail to describe it well.

DBSCAN Extensions

There are other algorithms, such as Hierarchical DBSCAN (HDBSCAN) and Ordering points to identify the clustering structure (OPTICS), which are considered extensions of DBSCAN.

Both HDBSCAN and OPTICS can usually perform better when there are clusters of varying densities in the data, and they are also less sensitive to the choice of the initial min. points and ε parameters.
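Scikit-Learn ships an OPTICS implementation, so trying it on the same scaled data is straightforward; the sketch below is only an illustration, and HDBSCAN is available through the separate hdbscan package or in recent Scikit-Learn versions:

from sklearn.cluster import OPTICS

# min_samples plays a role similar to DBSCAN's; no single eps needs to be fixed up front
optics = OPTICS(min_samples=5)
optics_labels = optics.fit_predict(scaled_customer_data[selected_cols])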

Categories: FLOSS Project Planets

Wim Leers: High concurrency Composer

Planet Drupal - Fri, 2023-03-17 11:45

On behalf of Acquia I’m currently working on Drupal’s next big leap: Automatic Updates & Project Browser — both are “strategic initiatives”.

In November, I started helping out the team led by Ted Bowman that’s been working on it non-stop for well over 1.5 years (!): see d.o/project/automatic_updates. It’s an enormous undertaking, with many entirely new challenges — as this post will show.

For a sense of scale: more people of Acquia’s “DAT” Drupal Acceleration Team have been working on this project than the entire original DAT/OCTO team back in 2012!

The foundation for both will be the (API-only, no UI!) package_manager module, which builds on top of the php-tuf/composer-stager library. We’re currently working hard to get that module committed to Drupal core before 10.1.0-alpha1.

Over the last few weeks, we managed to solve almost all of the remaining alpha blockers (which block the core issue that will add package_manager to Drupal core, as an alpha-experimental module). One of those was a random test failure on DrupalCI, whose failure frequency was increasing over time!

A rare random failure may be acceptable, but at this point, ~90% of test runs were failing on one or more of the dozens of Kernel tests … but always a different combination. Repeated investigations over the course of a month had not led us to the root cause. But now that the failure rate had reached new heights, we had to solve this. It brought the team’s productivity to a halt — imagine what damage this would have done to Drupal core’s progress!

A combination of prior research combined with the fact that suddenly the failure rate had gone up meant that there really could only be one explanation: this had to be a bug/race condition in Composer itself, because we were now invoking many more composer commands during test execution.

Once we changed focus to composer itself, the root cause became obvious: Composer tries to ensure the temporary directory is writable and avoids conflicts by using microtime(). That function confusingly can return the time at microsecond resolution, but defaults to mere milliseconds (see for yourself).

With sufficiently high concurrency (up to 32 concurrent invocations on DrupalCI!), two composer commands could be executed on the exact same millisecond:

// Check system temp folder for usability as it can cause weird runtime issues otherwise
Silencer::call(static function () use ($io): void {
    $tempfile = sys_get_temp_dir() . '/temp-' . md5(microtime());
    if (!(file_put_contents($tempfile, __FILE__) && (file_get_contents($tempfile) === __FILE__) && unlink($tempfile) && !file_exists($tempfile))) {
        $io->writeError(sprintf('PHP temp directory (%s) does not exist or is not writable to Composer. Set sys_temp_dir in your php.ini', sys_get_temp_dir()));
    }
});

— src/Composer/Console/Application.php in Composer 2.5.4

We could switch to microtime(TRUE) for microseconds (reduce collision probability 1000-fold) or hrtime() (reduce collision probability by a factor of a million). But more effective would be to avoid collisions altogether. And that’s possible: composer always runs in its own process.

Simply changing sys_get_temp_dir() . '/temp-' . md5(microtime()); to sys_get_temp_dir() . '/temp-' . getmypid() . '-' . md5(microtime()); is sufficient to safeguard against collisions when using Composer in high concurrency contexts.

So that single line change is what I proposed in a Composer PR a few days ago. Earlier today it was merged into the 2.5 branch — meaning it should ship in the next version!

Eventually we’ll be able to remove our work-around. But for now, this was one of the most interesting challenges along the way :)

Categories: FLOSS Project Planets

Axelerant Blog: How To Run Multiple Instances Of Mautic For Marketing Automation Needs

Planet Drupal - Fri, 2023-03-17 11:12
Introduction

Mautic is the world's largest free, open-source marketing software that automates marketing tasks like segmentation, lead scoring, campaigns & journey builders, and contact list management. It supports integration with all popular social media platforms like Twitter, LinkedIn, and Facebook and has integrations to connect with other marketing automation tools.

Categories: FLOSS Project Planets

Mike Driscoll: Python’s Built-in Functions – The all() Function (Video)

Planet Python - Fri, 2023-03-17 10:47

This is the next video in my Python Built-ins Series.

Did you know Python has an all() function? Do you know what to use the all() function for?

Find out today by watching this short video!

 

More Videos in the series

The post Python’s Built-in Functions – The all() Function (Video) appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets
