FLOSS Project Planets
LN Webworks: Supercharge Your Website With These 5 Must-Have Drupal Modules
Picture this: an intuitive administration toolbar, seamless configuration management, dynamic content generation, and tailored site settings—all at your fingertips. With over 40,000 Drupal modules and 2500+ themes available, the potential to enhance your digital journey and revolutionize your experience as a website administrator or owner is unparalleled.
Moreover, you can effortlessly optimize your website, personalize content and tailor the Drupal interface to match your unique requirements. From Admin tools to Site settings and labels, each module offers a unique set of features that will supercharge your Drupal experience. Read the article to learn about the top five Drupal modules you need to install right away.
Five Jars: How to visit DrupalCon efficiently: Tips from CEO Alex Schedrov
Qt for Windows on ARM
While ARM-based desktops quickly became the "next big thing" in the macOS world a while ago, the situation in the Microsoft Windows ecosystem is a bit different. One of the strengths of the Windows platform is its long retention of established architectures, which makes adoption of a "new" architecture slower on Windows. Even though Windows on ARM provides emulation for running x64 binaries, this comes at a performance cost. Some of our users have asked us to provide native support for Windows on ARM (WoA). In this blog post, you will learn what is available today and get insights into where we want to go.
Matt Brown: Calling time on DNSSEC: The costs exceed the benefits
I’m calling time on DNSSEC. Last week, prompted by a change in my DNS hosting setup, I began removing it from the few personal zones I had signed. Then this Monday the .nz ccTLD experienced a multi-day availability incident triggered by the annual DNSSEC key rotation process. This incident broke several of my unsigned zones, which led me to say very unkind things about DNSSEC on Mastodon and now I feel compelled to more completely explain my thinking:
For almost all domains and use-cases, the costs and risks of deploying DNSSEC outweigh the benefits it provides. Don’t bother signing your zones.
The .nz incident, while topical, is not the motivation or the trigger for this conclusion. Had it been a novel incident, it would still have been annoying, but novel incidents are how we learn so I have a small tolerance for them. The problem with DNSSEC is precisely that this incident was not novel, just the latest in a long and growing list.
It’s a clear pattern. DNSSEC is complex and risky to deploy. Choosing to sign your zone will almost inevitably mean that you will experience lower availability for your domain over time than if you leave it unsigned. Even if you have a team of DNS experts maintaining your zone and DNS infrastructure, the risk of routine operational tasks triggering a loss of availability (unrelated to any attempted attacks that DNSSEC may thwart) is very high - almost guaranteed to occur. Worse, because of the nature of DNS and DNSSEC these incidents will tend to be prolonged and out of your control to remediate in a timely fashion.
The only benefit you get in return for accepting this almost certain reduction in availability is trust in the integrity of the DNS data a subset of your users (those who validate DNSSEC) receive. Trusted DNS data that is then used to communicate across an untrusted network layer. An untrusted network layer which you are almost certainly protecting with TLS which provides a more comprehensive and trustworthy set of security guarantees than DNSSEC is capable of, and provides those guarantees to all your users regardless of whether they are validating DNSSEC or not.
In summary, in our modern world where TLS is ubiquitous, DNSSEC provides only a thin layer of redundant protection on top of the comprehensive guarantees provided by TLS, but adds significant operational complexity, cost and a high likelihood of lowered availability.
In an ideal world, where the deployment cost of DNSSEC and the risk of DNSSEC-induced outages were both low, it would absolutely be desirable to have that redundancy in our layers of protection. In the real world, given the DNSSEC protocol we have today, the choice to avoid its complexity and rely on TLS alone is not at all painful or risky to make as the operator of an online service. In fact, it’s the prudent choice that will result in better overall security outcomes for your users.
Ignore DNSSEC and invest the time and resources you would have spent deploying it into improving your TLS key and certificate management.
Ironically, the one use-case where I think a valid counter-argument for this position can be made is TLDs (including ccTLDs such as .nz). Despite its many failings, DNSSEC is an Internet Standard, and as infrastructure providers, TLDs have an obligation to enable its use. Unfortunately this means that everyone has to bear the costs, complexities and availability risks that DNSSEC burdens these operators with. We can’t avoid that fact, but we can avoid creating further costs, complexities and risks by choosing not to deploy DNSSEC on the rest of our non-TLD zones.
But DNSSEC will save us from the evil CA ecosystem!
Historically, the strongest motivation for DNSSEC has not been the direct security benefits themselves (which, as explained above, are minimal compared to what TLS provides), but the new capabilities and use-cases that could be enabled if DNS were able to provide integrity and trusted data to applications.
Specifically, the promise of DNS-based Authentication of Named Entities (DANE) is that with DNSSEC we can be free of the X.509 certificate authority ecosystem and along with it the expensive certificate issuance racket and dubious trust properties that have long been its most distinguishing features.
Ten years ago this was an extremely compelling proposition with significant potential to improve the Internet. That potential has gone unfulfilled.
Instead of maturing as deployments progressed and associated operational experience was gained, DNSSEC has been beset by the discovery of issue after issue. Each of these has necessitated further changes and additions to the protocol, increasing complexity and deployment cost. For many zones, including significant zones like google.com (where I led the attempt to evaluate and deploy DNSSEC in the mid 2010s), it is simply infeasible to deploy the protocol at all, let alone in a reliable and dependable manner.
While DNSSEC maturation and deployment has been languishing, the TLS ecosystem has been steadily and impressively improving. Thanks to the efforts of many individuals and companies, although still founded on the use of a set of root certificate authorities, the TLS and CA ecosystem today features transparency, validation and multi-party accountability that comprehensively build trust in the ability to depend and rely upon the security guarantees that TLS provides. When you use TLS today, you benefit from:
- Free/cheap issuance from a number of different certificate authorities.
- Regular, automated issuance/renewal via the ACME protocol.
- Visibility into who has issued certificates for your domain and when through Certificate Transparency logs.
- Confidence that certificates issued without certificate transparency (and therefore lacking an SCT) will not be accepted by the leading modern browsers.
- The use of modern cryptographic protocols as a baseline, with a plausible and compelling story for how these can be steadily and promptly updated over time.
DNSSEC with DANE can match the TLS ecosystem on the first benefit (up front price) and perhaps makes the second benefit moot, but has no ability to match any of the other transparency and accountability measures that today’s TLS ecosystem offers. If your ZSK is stolen, or a parent zone is compromised or coerced, validly signed TLSA records for a forged certificate can be produced and spoofed to users under attack with minimal chances of detection.
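For readers unfamiliar with the mechanics, a DANE TLSA record essentially publishes, in DNS, a hash of the certificate or public key a service is expected to present. As a rough sketch only (the hostname is just an example, and this shows only the "3 0 1" flavour: DANE-EE, full certificate, SHA-256), here is the hash such a record would carry, computed in Python:

import hashlib
import socket
import ssl

hostname = "example.com"  # hypothetical host; a matching TLSA record would live at _443._tcp.example.com
ctx = ssl.create_default_context()
with socket.create_connection((hostname, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
        der_cert = tls.getpeercert(binary_form=True)  # DER-encoded leaf certificate

# SHA-256 of the full certificate: the association data a "3 0 1" TLSA record would carry
print(hashlib.sha256(der_cert).hexdigest())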
Finally, in terms of overall trust in the roots of the system, the CA/Browser forum requirements continue to improve the accountability and transparency of TLS certificate authorities, significantly reducing the ability for any single actor (say a nefarious government) to subvert the system. The DNS root has a well established transparent multi-party system for establishing trust in the DNSSEC root itself, but at the TLD level, almost intentionally thanks to the hierarchical nature of DNS, DNSSEC has multiple single points of control (or coercion) which exist outside of any formal system of transparency or accountability.
We’ve moved from DANE being a potential improvement in security over TLS when it was first proposed, to being a definite regression from what TLS provides today.
That’s not to say that TLS is perfect, but given where we’re at, we’ll get a better security return from further investment and improvements in the TLS ecosystem than we will from trying to fix DNSSEC.
But TLS is not ubiquitous for non-HTTP applications
The arguments above are most compelling when applied to the web-based, HTTP-oriented ecosystem which has driven most of the TLS improvements we’ve seen to date. Non-HTTP protocols are lagging in adoption of many of the improvements and best practices TLS has on the web. Some argue that the need to provide a solution for non-HTTP, non-web applications is a motivation to continue pushing DNSSEC deployment.
I disagree; I think it is a motivation to instead double down on moving those applications to TLS. TLS as the new TCP.
The problem is that the costs of deploying and operating DNSSEC are largely fixed regardless of how many protocols you intend to protect with it, and worse, the negative side-effects of DNSSEC deployment can and will easily spill over to affect zones and protocols that don’t want or need DNSSEC’s protection. To justify continued DNSSEC deployment and operation in this context means using a smaller set of benefits (just for the non-HTTP applications) to justify the already high costs of deploying DNSSEC itself, plus the cost of the risk that DNSSEC poses to the reliability of your websites. I don’t see how that equation can ever balance, particularly when you evaluate it against the much lower cost of just turning on TLS for the rest of your non-HTTP protocols instead of deploying DNSSEC. MTA-STS is a worked example of how this can be achieved.
If you’re still not convinced, consider that even DNS itself is considering moving to TLS (via DoT and DoH) in order to add the confidentiality/privacy attributes the protocol currently lacks. I’m not a huge fan of the latency implications of these approaches, but the ongoing discussion shows that clever solutions and mitigations for that may exist.
DoT/DoH solve distinct problems from DNSSEC and in principle should be used in combination with it, but in a world where DNS itself relies on TLS, and has therefore eliminated the majority of spoofing and cache poisoning attacks through DoT/DoH deployment, the benefit side of the DNSSEC equation gets smaller and smaller still while the costs remain the same.
OK, but better software or more careful operations can reduce DNSSEC’s cost
Some see the current DNSSEC costs simply as teething problems that will shrink as the software and tooling matures to provide more automation of the risky processes, and as operational teams learn from their mistakes or opt to simply transfer the risk by outsourcing the management and complexity to larger providers.
I don’t find these arguments compelling. We’ve already had 15+ years to develop improved software for DNSSEC without success. What’s changed that we should expect a better outcome this year or next? Nothing.
Even if we did have better software or outsourced operations, the approach is still only hiding the costs behind automation or transferring the risk to another organisation. That may appear to work in the short-term, but eventually when the time comes to upgrade the software, migrate between providers or change registrars the debt will come due and incidents will occur.
The problem is the complexity of the protocol itself. No amount of software improvement or outsourcing addresses that.
After 15+ years of trying, I think it’s worth considering that combining cryptography, caching and distributed consensus, some of the most fundamental and complex computer science problems, into a slow-moving and hard to evolve low-level infrastructure protocol while appropriately balancing security, performance and reliability appears to be beyond our collective ability.
That doesn’t have to be the end of the world: the improvements achieved in the TLS ecosystem over the same time frame provide a positive counter-example. Perhaps DNSSEC is simply focusing our attention at the wrong layer of the stack.
Ideally secure DNS data would be something we could have, but if the complexity of DNSSEC is the price we have to pay to achieve it, I’m out. I would rather remain with the simpler yet insecure DNS protocol and compensate for its shortcomings at higher transport or application layers, where experience shows we are able to more rapidly improve and develop our security capabilities.
Summing up
For the vast majority of domains and use-cases there is simply no net benefit to deploying DNSSEC in 2023. I’d even go so far as to say that if you’ve already signed your zones, you should (carefully) move them back to being unsigned - you’ll reduce the complexity of your operating environment and lower your risk of availability loss triggered by DNS. Your users will thank you.
The threats that DNSSEC defends against are already amply defended by the now mature and still improving TLS ecosystem at the application layer, and investing in further improvements here carries far more return than deployment of DNSSEC.
For TLDs, like .nz whose outage triggered this post, DNSSEC is not going anywhere, and investment in mitigating its complexities and risks is an unfortunate burden that must be shouldered. While the full incident report of what went wrong with .nz is not yet available, the interim report already hints at some useful insights. It is important that InternetNZ publishes a full and comprehensive review, so that .nz and other TLD operators stuck with the unenviable task of safely operating DNSSEC can fully realise the lessons and improvements this incident can provide.
Postscript
After taking a few days to draft and edit this post, I’ve just stumbled across a presentation from the well-respected Geoff Huston at last week’s RIPE86 meeting. I’ve only had time to skim the slides (video here) - they don’t seem to disagree with my thinking regarding the futility of the current state of DNSSEC, but they also contain some interesting ideas for what it might take for DNSSEC to become a compelling proposition.
Probably worth a read/watch!
Drupal Association blog: Statement of the Drupal Association | Pride Month 2023
As we gather in June for DrupalCon North America, we do so during LGBTQ Pride Month in the U.S., a celebration of the 1969 Stonewall Uprising, which was a tipping point for the gay liberation movement and spurred the growth of LGBT support organizations from 50 to 1,500 during the following year. Recognizing this important part of U.S. history, President Obama established the Stonewall National Monument on June 23, 2016. Pride is recognized in June in many countries around the world to continue the fight for LGBTQ+ equity globally.
But we are also gathering at a time in the U.S. in which transgender rights are coming under attack in many states and local communities.
For this reason, the Drupal Association felt it important to restate our values, which are the values that run through open source itself and are values that guide our internal policies and our work with the Drupal Community. At the core of our beliefs lies the principle that every individual, regardless of their sexual orientation, gender identity, or expression, deserves to be treated with dignity and respect.
- We believe that supporting diversity, equity, and inclusion is important because it is the right thing to do and because it is essential to the health and success of the project.
- We seek to support diversity, equity, and inclusion and reduce hatred, oppression, and violence, especially towards our LGBTQ+ community members.
- We will not accept intolerance towards LGBTQ+ community members. Every person is welcome (though every behavior is not).
- We acknowledge the unique experiences of our LGBTQ+ community members and commit to amplifying their voices.
The Drupal Association unequivocally supports legal protections that ensure LGBTQ+ individuals are afforded the same rights and opportunities as their heterosexual and cisgender counterparts, unimpaired by personal prejudices or systems of oppression. This includes marriage equality, right to gender affirming medical care, right to privacy and comprehensive anti-discrimination laws that encompass employment, housing, healthcare, and public accommodations.
FSF Blogs: May GNU Spotlight with Amin Bandali: Nineteen new GNU releases!
Luke Plant: Django and Sass/SCSS without Node.js or a build step
Although they are less necessary than in the past, I like to use a CSS pre-processor when doing web development. I used to use LessCSS, but recently I’ve found that I can use Sass without needing either a separate build step, or a package that requires Node.js and npm to install it. The heart of the functionality is provided by libsass, an implementation of Sass as a C++ library.
On Linux systems, this can be installed as a package libsass or similar, but even better is that you can pip install it as a Python package, libsass.
When it comes to using it from a Django project, the first step is to install django-compressor.
Then, you need to add django-libsass as per its instructions.
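For reference, here is a minimal sketch of the relevant settings, assuming the standard setup described in the django-compressor and django-libsass documentation (check those docs against your versions):

# settings.py (sketch)
INSTALLED_APPS = [
    # ... your other apps ...
    "compressor",
]

STATICFILES_FINDERS = [
    "django.contrib.staticfiles.finders.FileSystemFinder",
    "django.contrib.staticfiles.finders.AppDirectoriesFinder",
    # lets django-compressor find the files it generates
    "compressor.finders.CompressorFinder",
]

COMPRESS_PRECOMPILERS = [
    # route text/x-scss stylesheets through libsass
    ("text/x-scss", "django_libsass.SassCompiler"),
]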
That’s about it. As per the django-libsass instructions, somewhere in your base HTML templates you’ll have something like this:
{% compress css %}
  <link rel="stylesheet" type="text/x-scss" href="{% static "myapp/css/main.scss" %}" />
{% endcompress %}

You write your SCSS in that main.scss file (it doesn’t have to be called that), and it can @import other SCSS files of course.
Then, when you load a page, django-compressor will take care of running the SCSS files through libsass, saving the output CSS to a file and inserting the appropriate HTML that references that CSS file into your template output. It caches things very well so that you don’t incur any penalty if files haven’t changed — and libsass is a very fast implementation for when the processing does need to happen.
What this means is that you have eliminated both the need for Node.js/npm, and the need for a build step/process.
Of course, the SCSS → CSS compilation still has to happen, but it happens on demand in the same process that runs the web app, and it’s both fast enough and reliable enough that you simply never have to think about it again. So this is “build-less” in the same way that “server-less” means you don’t have to think about servers, and the same way that Python “doesn’t have a compilation step”.
Future proofing
On the Sass-lang page about libsass, they say it is “deprecated”, and on the project page it says:
While it will continue to receive maintenance releases indefinitely, there are no plans to add additional features or compatibility with any new CSS or Sass features.
In other words, this is what I prefer to call “mature software” 😉. libsass already has everything I need. If it does eventually fail to be maintained or I need new features, it’s not a problem:
Switch to Dart Sass, which can be installed as a standalone binary.
Set your django-compressor settings like this:
COMPRESS_PRECOMPILERS = [
    ("text/x-scss", "sass {infile} {outfile}"),
]
This covers the basic case. If you want all the features of django-libsass, which includes looking in your other static file folders for SCSS, you’ll probably need to fork the code and make it work by calling Dart Sass using subprocess — a small amount of work, and nothing that will fundamentally break this approach.
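As a rough illustration of the amount of work involved, the heart of such a fork would be little more than a subprocess call along these lines (the function name and paths are hypothetical, and error handling is omitted):

import subprocess

def compile_scss(infile: str, outfile: str) -> None:
    # Shell out to the standalone Dart Sass binary; assumes "sass" is on PATH.
    subprocess.run(["sass", infile, outfile], check=True)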
Gunnar Wolf: Cheatable e-voting booths in Coahuila, Mexico, detected at the last minute
It’s been a very long time since I last blogged about e-voting, although some might remember it’s been a topic I have long worked on; in particular, it was the topic of my 2018 Masters thesis, plus some five articles I wrote in the 2010-2018 period. After the thesis, I have to admit I got weary of the subject, and haven’t pursued it anymore.
So, I was saddened and dismayed to read that –once again, as has already happened– the electoral authorities would set up a pilot e-voting program in the local elections this year, which would probably lead to a wider deployment next year, in the Federal elections.
This year (…this week!), two States will have elections for their Governors and local Legislative branches: Coahuila (North, bordering with Texas) and Mexico (Center, surrounding Mexico City). They are very different states, demographically and in their development level.
Pilot programs with e-voting booths have been seen in four states, to the best of my knowledge, in the last ~15 years: Jalisco (West), Mexico City, State of Mexico and Coahuila. In Coahuila, several universities have teamed up with the Electoral Institute to develop their e-voting booth; a good thing I can say about how this has been done in my country is that, at least, the Electoral Institute is providing its own implementations, instead of sourcing them from e-booth vendors (which have a long, tragic history, mostly in the USA, but also in other places). Not only that: they are subjecting the machines to audit processes. Not open audit processes, as demanded by academics in the field, but nevertheless external, rigorous audit processes.
But still, what I and other colleagues with a computer security background oppose is not a specific e-voting implementation, but the adoption of e-voting in general. If for nothing else, because of the extra complexity it brings, because of the many more checks that have to be put in place, and… because as programmers, we are aware of the ease with which bugs can creep into any given implementation… both honest bugs (mistakes) and, much worse, bugs that are secretly requested and paid for.
Anyway, leave this bit aside for a while. I’m not implying there was any ill intent in the design or implementation of these e-voting booths.
Two days ago, the Electoral Institute announced there was an important bug found in the Coahuila implementation. The bug consists, as far as I can understand from the information reported in newspapers, in:
- Each voter approaches their electoral authorities, who verify their identity and their authorization to vote in that precinct
- The voter is given an activation code, with which they go to the voting booth
- The booth is activated and enables each voter to cast a vote only once
The problem was that the activation codes remained active after voting, so a voter could vote multiple times.
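To make the class of bug concrete, here is a toy sketch in Python (entirely hypothetical, and in no way the booths' actual code) of the difference between merely checking an activation code and consuming it:

# Toy illustration of the reported flaw: codes validated but never invalidated.
valid_codes = {"A1B2", "C3D4"}  # hypothetical activation codes
used_codes = set()

def cast_vote_buggy(code: str) -> bool:
    # Bug: the code stays valid after use, so a voter could vote repeatedly.
    return code in valid_codes

def cast_vote_fixed(code: str) -> bool:
    # Fix: consume the code on first use.
    if code in valid_codes and code not in used_codes:
        used_codes.add(code)
        return True
    return False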
This seems like an easy problem to patch — it most likely is. However, given the inability to patch, properly test, and deploy the fix to all of the booths in a timely manner (even though only 74 e-voting booths were to be deployed for this pilot), the whole pilot for Coahuila was scratched; Mexico State is voting with a different implementation that is not affected by this issue.
This illustrates very well one of the main issues with e-voting technology: It requires a team of domain-specific experts to perform a highly specialized task (code and physical audits). I am happy and proud to say that part of the auditing experts were the professors of the Information Security Masters program of ESIME Culhuacán (the Masters program I was part of).
The reaction by the Electoral Institute was correct. As far as I understand, there is no evidence suggesting this bug was purposefully built, but it cannot be ruled out either.
A traditional, paper-and-ink-based process is not only immune to attacks (or mistakes!) based on code such as this one, but can be audited by anybody. And that is, I believe, a fundamental property of democracy: ensuring the process is done right is not limited to a handful of domain experts. Not only that: In Mexico, I am sure there are hundreds of very proficient developers that could perform a code and equipment audit such as this one, but the audits are open by invitation only, so being an expert is not enough to get clearance to do this.
In a democracy, the whole process should be observable and verifiable by anybody interested in doing so.
Some links about this news:
- INE cancels electronic ballot boxes in Coahuila over a programming error that allowed repeat voting (Milenio)
- Due to failures, INE cancels use of electronic ballot boxes in Coahuila (Eje Central)
- INE cancels Coahuila's electronic ballot boxes (La Capital)
- Electronic ballot boxes will not be installed in Coahuila; voting will be done the traditional way (El Tiempo)
- INE cancels electronic ballot box voting in Coahuila (Zócalo)
- Electronic ballot boxes cancelled in Coahuila because they duplicate votes (Alto Nivel)
FSF Events: Free Software Directory meeting on IRC: Friday, June 30, starting at 12:00 EDT (16:00 UTC)
FSF Events: Free Software Directory meeting on IRC: Friday, June 23, starting at 12:00 EDT (16:00 UTC)
FSF Events: Free Software Directory meeting on IRC: Friday, June 16, starting at 12:00 EDT (16:00 UTC)
FSF Events: Free Software Directory meeting on IRC: Friday, June 09, starting at 12:00 EDT (16:00 UTC)
FSF Events: Free Software Directory meeting on IRC: Friday, June 02, starting at 12:00 EDT (16:00 UTC)
PyCharm: PyCharm 2023.2 EAP 2: Live Templates for Django Forms and Models, Support for Polars DataFrames
The second Early Access Program build brings a bunch of features for both web developers and data scientists. Try new, time-saving live templates for Django forms, models, and views, as well as support for a super-fast Polars DataFrame library and initial GitLab integration.
You can get the latest build from our website, the free Toolbox App, or via snaps for Ubuntu.
If you want to catch up on the updates from the previous EAP build, you can refer to this blog post for more details.
UX
Text search in Search Everywhere
The Search Everywhere (Double ⇧ / Double Shift) functionality, primarily utilized for searching through files, classes, methods, actions, and settings, now includes text search capabilities similar to Find in Files. With this enhancement, text search results are displayed when there are few or no other search results available for a given query. The feature is enabled by default and can be managed in Settings/Preferences | Advanced Settings | Search Everywhere.
Dedicated syntax highlighting for Python local variables
PyCharm 2023.2 will provide a dedicated syntax highlighting option for local variables. To use it, go to Settings | Editor | Color Scheme | Python and choose Local variables from the list of available options.
By default, the highlighting is set to inherit values from the Language Defaults identifiers. By unchecking this checkbox, you can choose the highlighting scheme that works best for you.
Syntax highlighting in inspection descriptions
In Settings / Preferences | Editor | Inspections, we’ve implemented syntax highlighting for code samples, which facilitates comprehension of any given inspection and its purpose.
Support for Polars DataFrames
PyCharm 2023.2 will allow you to work with a new, blazingly fast DataFrame library written in Rust – Polars.
In PyCharm, you can work with interactive Polars tables in Jupyter notebooks. In the Python console, you can inspect Polars DataFrames via the View as DataFrame option in the Special Variables list. Both Python and Jupyter debuggers work with Polars as well.
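If you want something to try it on, a minimal snippet like the following (with made-up column names and values) creates a Polars DataFrame you can then inspect via View as DataFrame or in a Jupyter notebook:

import polars as pl

# Hypothetical sample data just to have something to inspect
df = pl.DataFrame({
    "city": ["Berlin", "Madrid", "Oslo"],
    "temp_c": [18.5, 27.0, 11.2],
})

print(df.sort("temp_c"))  # sorting works on Polars DataFrames as well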
PyCharm will provide information about the type and dimensions of the tables, complete names and types of the columns, and allow you to use sorting for the tables.
Note that Polars DataFrames are not supported in Scientific mode.
Please try Polars support and share your feedback with us in the comments section, on Twitter, or in our issue tracker.
Web development
New live templates for Django forms and models
As part of Django support, PyCharm has traditionally provided a list of live templates for Django template files. PyCharm 2023.2 will extend this functionality to Django forms, models, generic views, and admin. Live templates will let you insert common fields for Django views, forms, and models by typing short abbreviations.
You can find the new templates and settings for them in Settings | Editor | Live Templates | Django. To edit the existing templates or create a new one, refer to the PyCharm help page.
The list of live templates that can be used to quickly create Django tags in the template files has also been enlarged. You can find the updated list via Settings | Editor | Live Templates | Django Templates.
Frontend development
Volar support for Vue
We have some great news for those using Vue in PyCharm! We’ve implemented Volar support for Vue to support the changes in TypeScript 5.0. This should provide more accurate error detection, aligned with the Vue compiler. The new integration is still in early development and we would appreciate it if you could give it a try and provide us with any feedback you have.
To set the Vue service to use Volar integration on all TypeScript versions, go to Settings | Languages & Frameworks | TypeScript | Vue. By default, Volar will be used for TypeScript versions 5.0 and higher, and our own implementation will be used for TypeScript versions lower than 5.0.
In the future, we’ll consider enabling the Volar integration by default instead of our own implementation used for Vue and TypeScript.
CSS: Convert color to LCH and OKLCH
In PyCharm 2022.3, we added support for the new CSS color modification functions. This provided PyCharm users with a number of color conversion actions. For instance, you can change RGB to HSL, and vice versa. We are expanding this support in PyCharm 2023.2 to include conversion of LCH and OKLCH with other color functions.
Next.js custom documentation support
Next.js 13.1 now includes a plugin for the TypeScript Language Service specifically for the new app directory. This plugin offers suggestions for configuring pages and layouts, as well as helpful hints for using both Server and Client Components. It also comes with custom documentation, which adds extra information to the output of the TypeScript Language Service. It’s now possible to view this custom documentation in PyCharm.
VCS: GitLab integration
PyCharm 2023.2 EAP 2 introduces initial integration with GitLab, allowing you to work with the Merge Request functionality right from the IDE, streamlining your development workflow. To add your GitLab account, go to Settings | Version Control | GitLab.
Notable bug fixes
We fixed the issue with debugging multiprocessing code on macOS ARM that was caused by a missing dylib file. [PY-48163]
For PowerShell 7, venv is now activated correctly in the Terminal. [PY-58019]
These are the most notable updates for this week. To see the full list of changes in this EAP build, please refer to the release notes.
If you encounter any bugs while working with this build, please submit a report using our issue tracker. If you have any questions or feedback, let us know in the comments below or get in touch with our team on Twitter.
The Drop Times: Panel to Explore Empowering the Drupal Community | DrupalCon NA
Holger Levsen: 20230601-developers-reference-translations
I've just uploaded developers-reference 12.19, bringing the German translation status back to 100% complete, thanks to Carsten Schoenert. Some other translations however could use some updates:
$ make status
for l in de fr it ja ru; do \
  if [ -d source/locales/$l/LC_MESSAGES ] ; then \
    echo -n "Stats for $l: " ; \
    msgcat --use-first source/locales/$l/LC_MESSAGES/*.po | msgfmt --statistics - 2>&1 ; \
  fi ; \
done
Stats for de: 1374 translated messages.
Stats for fr: 1286 translated messages, 39 fuzzy translations, 49 untranslated messages.
Stats for it: 869 translated messages, 46 fuzzy translations, 459 untranslated messages.
Stats for ja: 891 translated messages, 26 fuzzy translations, 457 untranslated messages.
Stats for ru: 870 translated messages, 44 fuzzy translations, 460 untranslated messages.

Russell Coker: Do Desktop Computers Make Sense?
Currently the smaller and cheaper USB-C docks start at about $25 and Dell has a new Vostro with 8G of RAM and 2*USB-C ports for $788. That gives a bit over $800 for a laptop and dock vs $795 for the cheapest Dell desktop, which also has 8G of RAM. For every way of buying laptops and desktops (e.g. buying from Officeworks, buying on eBay, etc.) the prices for laptops and desktops seem very similar. For all those comparisons the desktop will typically have a faster CPU and more options for PCIe cards, larger storage, etc. But if you don’t want to expand storage beyond the affordable 4TB NVMe/SSD devices, don’t need to add PCIe cards, and don’t need much CPU power, then a laptop will do well. For the vast majority of the computer work I do, my Thinkpad X1 Carbon Gen1 (from 2012) had plenty of CPU power.
If someone who’s not an expert in PC hardware were to buy a computer of a given age, then laptops probably aren’t more expensive than desktops, even disregarding the fact that a laptop works without the need to purchase a monitor, a keyboard, or a mouse. I can get regular desktop PCs for almost nothing and get parts to upgrade them very cheaply, but most people can’t do that. I can also get a decent second-hand laptop and USB-C dock for well under $400.
Servers and Gaming Systems
For people doing serious programming or other compute or IO intensive tasks some variation on the server theme is the best option. That may be something more like the servers used by the r/homelab people than the corporate servers, or it might be something in the cloud, but a server is a server. If you are going to have a home server that’s a tower PC then it makes sense to put a monitor on it and use it as a workstation. If your server makes so much noise that you can’t spend much time in the same room, or if it’s hosted elsewhere, then using a laptop to access it makes sense.
Desktop computers for PC gaming make sense as no-one seems to be making laptops with moderately powerful GPUs. The most powerful GPUs draw 150W which is more than most laptop PSUs can supply and even if a laptop PSU could supply that much there would be the issue of cooling. The Steam Deck [1] and the Nintendo Switch [2] can both work with USB-C docks. The PlayStation 5 [3] has a 350W PSU and doesn’t support video over USB-C. The Steam Deck can do 8K resolution at 60Hz or 4K at 120Hz but presumably the newer Steam games will need a desktop PC with a more powerful GPU to properly use such resolutions.
For people who want the best FPS rates on graphics intensive games it could make sense to have a tower PC. Also a laptop that’s run at high CPU/GPU use for a long time will tend to have its vents clogged by dust and possibly have the cooling fan wear out.
Monitor Resolution
Laptop support for a single 4K monitor became common in 2012 with the release of Intel’s Ivy Bridge mobile CPUs. My own experience of setting up 4K monitors for a Linux desktop in 2019 was that it was unreasonably painful, and the soon-to-be-released Debian/Bookworm will make things work nicely for 4K monitors with KDE on X11. So laptop hardware has handled the case of a single high resolution monitor since before such monitors were cheap or common and before software supported it well. Of course at that time you had to use either a proprietary dock or a mini-DisplayPort to HDMI adaptor to get 4K working. But that was still easier than getting PCIe video cards supporting 4K resolution, which is something that according to spec sheets wasn’t well supported by affordable cards in 2017.
Since USB-C became a standard feature in laptops in about 2017, support for more monitors than most people would want through a USB-C dock became standard. My Thinkpad X1 Carbon Gen5, which was released in 2017, will support 2*FullHD monitors plus a 4K monitor via a USB-C dock; I suspect it would do at least 2*4K monitors but I haven’t had a chance to test. Cheap USB-C docks supporting this sort of thing have only become common in the last year or so.
How Many Computers per Home
Among middle class Australians it’s common to have multiple desktop PCs per household. One for each child who’s over the age of about 13 and one for the parents seems to be reasonably common. Students in the later years of high-school and university students are often compelled to have laptops, so having the number of laptops plus the number of desktops be larger than the population of the house probably isn’t uncommon even among people who aren’t really into computers. As an aside, it’s probably common among people who read my blog to have 2 desktops, a laptop, and a cloud server for their own personal use. But even among people who don’t do that sort of thing, having computers outnumber people in a home is probably common.
A large portion of computer users can do everything they need on a laptop. For gamers the graphics intensive games often run well on a console, and that’s probably the most effective way to play them. Of course the fact that there is “RGB RAM” (RAM with Red, Green, and Blue LEDs to light up) along with a lot of other wild products sold to gamers suggests that gaming PCs are not about what runs the game most effectively and that an art/craft project with the PC is more important than actually playing games.
Instead of having one desktop PC per bedroom and laptops for school/university as well it would make more sense to have a laptop per person and have a USB-C dock and monitor in each bedroom and a USB-C dock connected to a large screen TV in the lounge. This gives plenty of flexibility for moving around to do work and sharing what’s on your computer with other people. It also allows taking a work computer home and having work with your monitor, having a friend bring their laptop to your home to work on something together, etc.
For most people desktop computers don’t make sense. While I think that convergence of phones with laptops and desktops is the way of the future [4] for most people having laptops take over all functions of desktops is the best option today.
- [1] https://en.wikipedia.org/wiki/Steam_Deck
- [2] https://en.wikipedia.org/wiki/Nintendo_Switch
- [3] https://en.wikipedia.org/wiki/PlayStation_5
- [4] https://etbe.coker.com.au/2023/05/29/considering-convergence/
Jamie McClelland: Enough about the AI Apocalypse Already
After watching Democracy Now’s segment on artificial intelligence I started to wonder - am I out of step on this topic?
When people claim artificial intelligence will surpass human intelligence and thus threaten humanity with extinction, they seem to be referring specifically to advances made with large language models.
As I understand them, large language models are probability machines that have ingested massive amounts of text scraped from the Internet. They answer questions based on the probability of one series of words (their answer) following another series of words (the question).
It seems like a stretch to call this intelligence, but if we accept that definition then it follows that this kind of intelligence is nothing remotely like human intelligence, which makes the claim that it will surpass human intelligence confusing. Hasn’t this kind of machine learning surpassed us decades ago?
Or when we say “surpass” does that simply refer to fooling people into thinking an AI machine is a human via conversation? That is an important milestone, but I’m not ready to accept the Turing test as proof of equal intelligence.
Furthermore, large language models “hallucinate” and also reflect the biases of their training data. The word “hallucinate” seems like a euphemism, as if it could be corrected with the right medication when in fact it seems hard to avoid when your strategy is to correlate words based on probability. But even if you could solve the “here is a completely wrong answer presented with sociopathic confidence” problem, reflecting the biases of your data sources seems fairly intractable. In what world would a system with built-in bias be considered on the brink of surpassing human intelligence?
The danger from LLMs seems to be their ability to convince people that their answers are correct, including their patently wrong and/or biased answers.
Why do people think they are giving correct answers? Oh right… terrifying right wing billionaires (with terrifying agendas) have been claiming AI will exceed human intelligence and threaten humanity, and every time they sign a hyperbolic statement they get front page mainstream coverage. And even progressive news outlets are spreading this narrative with minimal space for contrary opinions (thank you Tawana Petty from the Algorithmic Justice League for providing the only glimpse of reason in the segment).
The belief that artificial intelligence is or will soon become omnipotent has real world harms today: specifically it creates the misperception that current LLMs are accurate, which paves the way for greater adoption among police forces, social service agencies, medical facilities and other places where racial and economic biases have life and death consequences.
When the CEO of OpenAI calls the technology dangerous and in need of regulation, he gets both free advertising promoting the power and supposed accuracy of his product and the possibility of freezing further developments in the field that might challenge OpenAI’s current dominance.
The real threat to humanity is not AI, it’s massive inequality and the use of tactics ranging from mundane bureaucracy to deadly force and incarceration to segregate the affluent from the growing number of people unable to make ends meet. We have spent decades training bureaucrats, judges and cops to robotically follow biased laws to maintain this order without compassion or empathy. Replacing them with AI would make things worse and should be stopped. But, let’s be clear, the narrative that AI is poised to surpass human intelligence and make humanity extinct is a dangerous distraction that runs counter to a much more important story about “the very real and very present exploitative practices of the [companies building AI], who are rapidly centralizing power and increasing social inequities.”
Maybe we should talk about that instead?
Stack Abuse: Simple NLP in Python with TextBlob: Lemmatization
TextBlob is a package built on top of two other packages, one of them is called Natural Language Toolkit, known mainly in its abbreviated form as NLTK, and the other is Pattern. NLTK is a traditional package used for text processing or Natural Language Processing (NLP), and Pattern is built mainly for web mining.
TextBlob is designed to be easier to learn and manipulate than NLTK, while maintaining the same important NLP tasks such as lemmatization, sentiment analysis, stemming, POS-tagging, noun phrase extraction, classification, translation, and more. You can see a complete list of tasks on the PyPI's TextBlob page.
If you are looking for a practical overview of many NLP tasks that can be executed with TextBlob, take a look at our "Python for NLP: Introduction to the TextBlob Library" guide.
There are no special technical prerequisites needed for employing TextBlob. For instance, the package is applicable for both Python 2 and 3 (Python >= 2.7 or >= 3.5).
Also, in case you don't have any textual information at hand, TextBlob provides the necessary collections of language data (usually texts), called corpora, from the NLTK database.
Installing TextBlob
Let's start by installing TextBlob. If you are using a terminal, command-line, or command prompt, you can enter:
$ pip install textblob

Otherwise, if you are using a Jupyter Notebook, you can execute the command directly from the notebook by adding an exclamation mark ! at the beginning of the instruction:
!pip install textblob

Note: This process can take some time due to the large number of algorithms and corpora that this library contains.
After installing TextBlob, in order to have text examples, you can download the corpora by executing the python -m textblob.download_corpora command. Once again, you can execute it directly in the command line or in a notebook by preceding it with an exclamation mark.
When running the command, you should see the output below:
$ python -m textblob.download_corpora
[nltk_data] Downloading package brown to /Users/csamp/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to /Users/csamp/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /Users/csamp/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/csamp/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]     date!
[nltk_data] Downloading package conll2000 to /Users/csamp/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to
[nltk_data]     /Users/csamp/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.

We have already installed the TextBlob package and its corpora. Now, let's understand more about lemmatization.
For more TextBlob content, check out our Simple NLP in Python with TextBlob: Tokenization, Simple NLP in Python with TextBlob: N-Grams Detection, and Sentiment Analysis in Python with TextBlob guides.
What is Lemmatization?
Before going deeper into the field of NLP, you should be able to recognize some key terms:
- Corpus (or corpora in the plural) - a specific collection of language data (e.g., texts). Corpora are typically used for training various models of text classification or sentiment analysis, for instance.
- Lemma - the word you would look for in a dictionary. For instance, if you want to look at the definition for the verb "runs", you would search for "run".
- Stem - the part of a word that never changes.
What is lemmatization itself?
Lemmatization is the process of obtaining the lemmas of words from a corpus.
An illustration of this could be the following sentence:
- Input (corpus): Alice thinks she is lost, but then starts to find herself
- Output (lemmas): | Alice | think | she | is | lost | but | then | start | to | find | herself |
Notice that each word in the input sentence is lemmatized according to its context in the original sentence. For instance, "Alice" is a proper noun, so it stays the same, and the verbs "thinks" and "starts" are referenced in their base forms of "think" and "start".
Lemmatization is one of the basic stages of language processing. It brings words to their root forms or lemmas, which we would find if we were looking for them in a dictionary.
In the case of TextBlob, lemmatization is based on a database called WordNet, which is developed and maintained by Princeton University. Behind the scenes, TextBlob uses WordNet's morphy processor to obtain the lemma for a word.
Note: For further reference on how lemmatization works in TextBlob, you can take a peek at the documentation.
You probably won't notice significant changes with lemmatization unless you're working with large amounts of text. In that case, lemmatization helps reduce the size of words we might be searching for while trying to preserve their context in the sentence. It can be applied further in developing models of machine translation, search engine optimization, or various business inquiries.
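Before working through a full corpus, a quick way to see the lemmatizer in isolation is TextBlob's Word class, which exposes the same WordNet-backed .lemmatize() method, optionally with a part-of-speech hint (the outputs in the comments are what WordNet's morphy typically returns):

from textblob import Word

print(Word("corpora").lemmatize())     # expected: corpus (the word is treated as a noun by default)
print(Word("running").lemmatize("v"))  # expected: run ("v" hints that the word is a verb)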
Implementing Lemmatization in Code
First of all, it's necessary to establish a TextBlob object and define a sample corpus that will be lemmatized later. In this initial step, you can either write or define a string of text to use, or use an example from the NLTK corpora we have downloaded. Let's go with the latter.
Choosing a Review from the NLTK Corpus
For example, let's try to obtain the lemmas for a movie review that is in the corpus. To do this, we import both the TextBlob library and the movie_reviews corpus from the nltk.corpus package:
# importing necessary libraries
from textblob import TextBlob
from nltk.corpus import movie_reviews

After importing, we can take a look at the movie review files with the fileids() method. Since this code is running in a Jupyter Notebook, we can directly execute:
movie_reviews.fileids()

This will return a list of 2,000 text file names containing negative and positive reviews:
['neg/cv000_29416.txt',
 'neg/cv001_19502.txt',
 'neg/cv002_17424.txt',
 'neg/cv003_12683.txt',
 'neg/cv004_12641.txt',
 'neg/cv005_29357.txt',
 'neg/cv006_17022.txt',
 'neg/cv007_4992.txt',
 'neg/cv008_29326.txt',
 'neg/cv009_29417.txt',
 ...]

Note: If you are running the code in another way, for instance, in a terminal or IDE, you can print the response by executing print(movie_reviews.fileids()).
By looking at the neg in the name of the file, we can assume that the list starts with the negative reviews and ends with the positive ones. We can look at a positive review by indexing from the end of the list. Here, we are choosing the tenth review from the end:
movie_reviews.fileids()[-10]

This results in:
'pos/cv990_11591.txt'

To examine the review sentences, we can pass the name of the review to the .sents() method, which outputs a list of all review sentences:
movie_reviews.sents('pos/cv990_11591.txt')

[['the', 'relaxed', 'dude', 'rides', 'a', 'roller', 'coaster', 'the', 'big', 'lebowski', 'a', 'film', 'review', 'by', 'michael', 'redman', 'copyright', '1998', 'by', 'michael', 'redman', 'the', 'most', 'surreal', 'situations', 'are', 'ordinary', 'everyday', 'life', 'as', 'viewed', 'by', 'an', 'outsider', '.'],
 ['when', 'those', 'observers', 'are', 'joel', 'and', 'ethan', 'coen', ',', 'the', 'surreal', 'becomes', 'bizarre', '.'],
 ...]

Let's store this list in a variable called pos_review:
pos_review = movie_reviews.sents("pos/cv990_11591.txt")
len(pos_review) # returns 63

Here, we can see that there are 63 sentences. Now, we can select one sentence to lemmatize, for instance, the one at index 16:
sentence = pos_review[16]
type(sentence) # returns list

Creating a TextBlob Object
After selecting the sentence, we need to create a TextBlob object to be able to access the .lemmatize() method. TextBlob objects need to be created from strings. Since we have a list, we can convert it to a string with the str.join() method, joining on blank spaces:
sentence_string = ' '.join(sentence)

Now that we have our sentence string, we can pass it to the TextBlob constructor:
blob_object = TextBlob(sentence_string)

Once we have the TextBlob object, we can perform various operations, such as lemmatization.
Lemmatization of a Sentence
Finally, to get the lemmatized words, we simply retrieve the words attribute of the created blob_object. This gives us a list containing Word objects that behave very similarly to string objects:
# Word tokenization of the sentence corpus
corpus_words = blob_object.words

# To see all tokens
print('sentence:', corpus_words)

# To count the number of tokens
number_of_tokens = len(corpus_words)
print('\nnumber of tokens:', number_of_tokens)

The output commands should give you the following:
sentence: ['the', 'carpet', 'is', 'important', 'to', 'him', 'because', 'it', 'pulls', 'the', 'room', 'together', 'not', 'surprisingly', 'since', 'it', 's', 'virtually', 'the', 'only', 'object', 'there']

number of tokens: 22

To lemmatize the words, we can just use the .lemmatize() method:
corpus_words.lemmatize()

This gives us a lemmatized WordList object:
WordList(['the', 'carpet', 'is', 'important', 'to', 'him', 'because', 'it', 'pull', 'the', 'room', 'together', 'not', 'surprisingly', 'since', 'it', 's', 'virtually', 'the', 'only', 'object', 'there'])

Since this might be a little difficult to read, we can do a loop and print each word before and after lemmatization:
for word in corpus_words:
    print(f'{word} | {word.lemmatize()}')

This results in:
the | the
carpet | carpet
is | is
important | important
to | to
him | him
because | because
it | it
pulls | pull
the | the
room | room
together | together
not | not
surprisingly | surprisingly
since | since
it | it
s | s
virtually | virtually
the | the
only | only
object | object
there | there

Notice how "pulls" changed to "pull"; the other words, besides "it's," were also lemmatized as expected. We can also see that "it's" has been separated due to the apostrophe. This indicates we can further pre-process the sentence so that "it's" is considered a word instead of "it" and an "s".
Difference Between Lemmatization and Stemming
Lemmatization is often confused with another technique called stemming. This confusion occurs because both techniques are usually employed to reduce words. While lemmatization uses dictionaries and focuses on the context of words in a sentence, attempting to preserve it, stemming uses rules to remove word affixes, focusing on obtaining the stem of a word.
Let's quickly modify our for loop to look at these differences:
print('word | lemma | stem\n')

for word in corpus_words:
    print(f'{word} | {word.lemmatize()} | {word.stem()}')

This outputs:
the | the | the
carpet | carpet | carpet
is | is | is
important | important | import
to | to | to
him | him | him
because | because | becaus
it | it | it
pulls | pull | pull
the | the | the
room | room | room
together | together | togeth
not | not | not
surprisingly | surprisingly | surprisingli
since | since | sinc
it | it | it
s | s | s
virtually | virtually | virtual
the | the | the
only | only | onli
object | object | object
there | there | there

When looking at the above output, we can see how stemming can be problematic. It reduces "important" to "import", losing all the meaning of the word, which can even be considered a verb now; "because" becomes "becaus", which is a word that doesn't exist, and the same goes for "togeth", "surprisingli", "sinc", and "onli".
There are clear differences between lemmatization and stemming. Understanding when to utilize each technique is the key. Suppose you are optimizing a word search and the focus is on being able to suggest the maximum number of similar words. Which technique would you use? When word context doesn't matter, and we could retrieve "important" with "import", the clear choice is stemming. On the other hand, if you are working on document text comparison, in which the position of the words in a sentence matters and the sense of "important" needs to be maintained and not confused with the verb "import", the best choice is lemmatization.
In the last scenario, suppose you are working on a word search followed by a retrieved document text comparison. What would you use? Both stemming and lemmatization.
We have understood the differences between stemming and lemmatization; now let's see how we can lemmatize the whole review instead of just a sentence.
Lemmatization of a Review
To lemmatize the entire review, we only need to modify the .join(). Instead of joining words in a sentence, we will join sentences in a review:
# joining each sentence with a new line between them, and a space between each word
pos_rev = '\n'.join(' '.join(sentence) for sentence in pos_review)

After transforming the corpus into a string, we can proceed in the same way as we did for the sentence to lemmatize it:
blob_object = TextBlob(pos_rev)
corpus_words = blob_object.words
corpus_words.lemmatize()

This generates a WordList object with the full review text lemmatized. Here, we are omitting some parts with an ellipsis (...) since the review is large, but you will be able to see it in full. We can spot our sentence in the middle of it:
WordList(['the', 'relaxed', 'dude', 'rides', 'a', 'roller', 'coaster', 'the', 'big', 'lebowski', 'a', 'film', 'review', 'by', 'michael', 'redman', 'copyright', '1998', 'by', 'michael', 'redman', 'the', 'most', 'surreal', 'situations', 'are', 'ordinary', 'everyday', 'life', 'as', 'viewed', 'by', 'an', 'outsider', 'when', 'those', 'observers', 'are', 'joel',
(...)
'the', 'carpet', 'is', 'important', 'to', 'him', 'because', 'it', 'pull', 'the', 'room', 'together', 'not', 'surprisingly', 'since', 'it', 's', 'virtually', 'the', 'only', 'object', 'there'
(...)
'com', 'is', 'the', 'eaddress', 'for', 'estuff'])

Conclusion
After lemmatizing the sentence and the review, we can see that both extract the corpus words first. This means lemmatization occurs at word level, which also implies that it can be applied to a word, a sentence, or a full text. It works for a word or any collection of words.
This also suggests that it might be slower, since it is necessary to break the text into tokens first before applying it. And since lemmatization is context-specific, as we have seen, it is also crucial to have good pre-processing of the text before using it, ensuring the correct breakdown into tokens and the appropriate part-of-speech tagging. Both will enhance results.
If you are not familiar with Part of Speech tagging (POS-tagging), check our Python for NLP: Parts of Speech Tagging and Named Entity Recognition guide.
We have also seen how lemmatization is different from stemming, another technique for reducing words, one that doesn't preserve their context and, for that reason, is usually faster.
There are many ways to perform lemmatization, and TextBlob is a great library for getting started with NLP. It offers a simple API that allows users to quickly begin working on NLP tasks. Leave a comment if you have used lemmatization in a project or plan to use it.
Happy coding!