Feeds
drunomics: Was mir das DrupalCamp Berlin über “Volunteering” beigebracht hat
Drupal Association blog: Governance in the Drupal Ecosystem
The Summary
To ensure Drupal’s stability and independence, the project is managed through a well-established, transparent governance system. Dries Buytaert, the Founder and Project Lead, helped design a model that distributes power and prevents any single person or entity — even himself — from making unilateral decisions that could alter the project unexpectedly. The independent Drupal Association oversees Drupal.org and other key infrastructure, free from commercial pressures. This approach ensures that Drupal.org is reliable and creates a fair playing field for all contributors, embodying true open-source leadership.
Just as the Drupal software has grown and changed significantly over its 23-year history, so has its governance. And, while there’s always room for improvement, it is safe to say that Drupal’s seasoned governance is what allows it to be one of the largest, independent open source projects in the world.
The Detail
Dries Buytaert, as the founder and project lead, ultimately guides the direction of Drupal, and is responsible for shaping the project’s philosophy and core principles.
While Dries started Drupal on his own, he has helped evolve the governance model over the years to be mature and resilient. To help govern the project's technical aspects, Dries established the core committer team and other supporting groups. To oversee non-technical areas, he co-founded the Drupal Association. These initiatives were intentional efforts to scale and strengthen Drupal’s governance.
On the technical side, the governance model for Drupal core is very mature, as described in the Drupal Project Governance. Technical decision-making is distributed among the core committers and other maintainers, promoting a transparent, structured, and collaborative approach to managing Drupal core.
Many other aspects of Drupal governance are managed by the Drupal Association, which is a U.S. 501(c)3 nonprofit organization formed in 2008 to support the Drupal project and the Drupal community. I am currently the Chief Executive Officer of the Association. Our mission is to drive innovation and adoption of Drupal as a high-impact digital public good, hand-in-hand with our open source community. A fundamental obligation of the Drupal Association is to ensure that Drupal is available to anyone, anywhere in the world free of charge. We primarily accomplish this task through Drupal.org.
The Drupal Association is a bona fide non-profit organization (not a pass-through), with assets of just over $3 million and an operating budget of over $4 million. We publish our finances annually (see: Find the reports in the Accountability section of D.org). The Association is not controlled or funded by any single entity nor does it pass revenues onto another entity. The Association’s revenue comes from hundreds of organizations and thousands of individuals. No single financial contributor accounts for more than 10% of our revenue. This diverse support base prevents any one entity from having too much influence.
The Drupal Association employs a full-time team of 19 professionals located throughout the world. These people include engineers, marketers, accountants, communication staff, and program administration team members. I say all this to demonstrate that we have the capacity to legitimately, and independently, carry out our mission.
The Drupal Association owns and controls important components of the Drupal ecosystem that allow Drupal to be one of the largest independent FOSS projects in the world.
The Drupal Association owns and/or controls the infrastructure that powers Drupal.org. The Drupal Association has complete control over who accesses Drupal.org, how they access it, and what they can do when accessing it. These are covered by our Terms of Service.
In administering Drupal.org, the Drupal Association controls a number of services, including:
- The database of Drupal.org users/project contributors
- A self-hosted GitLab instance that includes all of the Drupal code repositories for core and contrib, testing with GitLab CI and documentation through GitLab Pages
- Drupal software packaging (the actual .zip and .tar.gz files containing Drupal code)
- Drupal Updates (the Updates.xml feed, Automatic Updates endpoint, Secure Signing server, and Packages.Drupal.org- the composer endpoint for Drupal projects).
- The Drupal namespace on GitHub
- The Drupal namespace on Packagist
- The Drupal namespace on NPM
- The Drupal Infrastructure namespace on gitlab.com (separate from our self-hosted instance)
- The contribution credit system
- Usage data about Drupal core and extensions
The Drupal Association also owns and controls the primary means by which the community communicates and gathers. We organize DrupalCons and manage Drupal Slack. We issue The Drupal Association Newsletter and TheWeeklyDrop (together with Bob Kepford). We control and manage Mastodon, X/Twitter, YouTube, LinkedIn (Drupal, Drupal Association, Drupal Jobs), Facebook, Instagram, and TikTok.
Drupal has the Maker/Taker Problem that nearly all open source projects face. There are companies that profit off Drupal who don’t give back to help maintain the project. The Drupal Association has chosen to address this issue by restructuring our Drupal Certified Partner program to focus exclusively on those companies that give back to the community. The goal is to incentivize the creation of a culture of contribution within companies that work in Drupal that provide the Drupal Project with sufficient resources to innovate and grow. There is always work to be done in creating a more equitable program, but it is beginning to work as we have more than doubled the number of Drupal Certified Partners in the past 15 months.
The Drupal Association is governed by a 12-person Board of Directors that meets several times a year, including two public meetings at DrupalCons. Nine directors are selected by a Nominating Committee of the board and two directors are elected by members of the Drupal Association. The final seat is the “Founding Director”. This is a voting seat that can only be filled by Dries Buytaert. Like all board seats, this is an unpaid, voluntary role that carries with it a single vote on the board. It has to be approved annually by the Board of Directors. Except for the trademark licensing, the Drupal Association has no contracts or agreements with Dries Buytaert or the Drupal Project, and Dries receives no funding from the Drupal Association or its operation of Drupal.org.
Dries Buytaert owns the trademark “Drupal”. He has transparently communicated the Drupal Trademark and Logo Policy by which these are governed. Under the policy, any changes to the policy go into effect sixty (60) days after publication. Dries Buytaert also owns the domain names “drupal.org”, “drupal.com” and “drupalcon.org”.
Dries has granted the Drupal Association an exclusive license to use “Drupal”, “Drupal.org”, and “DrupalCon” and a non-exclusive license to use Drupal for non-commercial uses. This license allows the Drupal Association to support the Drupal Project by providing the infrastructure to host and maintain the official version of Drupal and to organize its contributors. It also allows the Association to support the Drupal Community in their work with Drupal.
The net effect of this arrangement is that Dries Buytaert retains ultimate control over what software can be named “Drupal” and what website can be named “Drupal.org.” He can thus ensure that any software that calls itself “Drupal” or website that uses “Drupal.org” conforms with his vision. This would likely cause the Drupal Association to fork the software and maintain it under a new name and url. The high cost of such an action to both parties makes this option highly unlikely and unable to execute quickly.
What the trademark does not allow him to do is to block any person or organization from using any component of Drupal core or any modules housed on Drupal.org. Those decisions are the sole discretion of the Drupal Association. To date, we have exercised this authority in a very limited manner to protect and safeguard the website and its content from attacks and misuse.
Twenty-three years ago, Dries chose to release Drupal under an open-source license, inspiring tens of thousands to build careers and champion an Open Web. However, fulfilling this vision required more than just a General Public License. By creating the Drupal Association, setting up Drupal core's governance, and licensing the trademark, Dries ensured Drupal remained open-source without commercial entanglements, securing a strong, independent foundation.
Along with Dries Buytaert and many contributors, the Drupal Association is focused on the future of Drupal (see: Starshot Initiative). How can we support its adoption through marketing and create sustainable revenue streams for Drupal to flourish? These are tough questions that confront many open source projects. Our governance allows us to move forward in this work with great certainty.
Bojan Mihelac: Building docker images with private python packages
Bojan Mihelac: Django app name translation in admin
Bojan Mihelac: Django-sites-ext
Bojan Mihelac: Django-simpleadmindoc updated
Bojan Mihelac: Rename uploaded files to ASCII charset in Django
Real Python: Python Dictionary Comprehensions: How and When to Use Them
Dictionary comprehensions are a concise and quick way to create, transform, and filter dictionaries in Python. They can significantly enhance your code’s conciseness and readability compared to using regular for loops to process your dictionaries.
Understanding dictionary comprehensions is crucial for you as a Python developer because they’re a Pythonic tool for dictionary manipulation and can be a valuable addition to your programming toolkit.
In this tutorial, you’ll learn how to:
- Create dictionaries using dictionary comprehensions
- Transform existing dictionaries with comprehensions
- Filter key-value pairs from dictionaries using conditionals
- Decide when to use dictionary comprehensions
To get the most out of this tutorial, you should be familiar with basic Python concepts, such as for loops, iterables, and dictionaries, as well as list comprehensions.
Get Your Code: Click here to download the free sample code that you’ll use to learn about dictionary comprehensions in Python.
Creating and Transforming Dictionaries in PythonIn Python programming, you’ll often need to create, populate, and transform dictionaries. To do this, you can use dictionary literals, the dict() constructor, and for loops. In the following sections, you’ll take a quick look at how to use these tools. You’ll also learn about dictionary comprehensions, which are a powerful way to manipulate dictionaries in Python.
Creating Dictionaries With Literals and dict()To create new dictionaries, you can use literals. A dictionary literal is a series of key-value pairs enclosed in curly braces. The syntax of a dictionary literal is shown below:
Python Syntax {key_1: value_1, key_2: value_2,..., key_N: value_N} Copied!The keys must be hashable objects and are commonly strings. The values can be any Python object, including other dictionaries. Here’s a quick example of a dictionary:
Python >>> likes = {"color": "blue", "fruit": "apple", "pet": "dog"} >>> likes {'color': 'blue', 'fruit': 'apple', 'pet': 'dog'} >>> likes["hobby"] = "guitar" >>> likes {'color': 'blue', 'fruit': 'apple', 'pet': 'dog', 'hobby': 'guitar'} Copied!In this example, you create dictionary key-value pairs that describe things people often like. The keys and values of your dictionary are string objects. You can add new pairs to the dictionary using the dict[key] = value syntax.
Note: To learn more about dictionaries, check out the Dictionaries in Python tutorial.
You can also create new dictionaries using the dict() constructor:
Python >>> dict(apple=0.40, orange=0.35, banana=0.25) {'apple': 0.4, 'orange': 0.35, 'banana': 0.25} Copied!In this example, you create a new dictionary using dict() with keyword arguments. In this case, the keys are strings and the values are floating-point numbers. It’s important to note that the dict() constructor is only suitable for those cases where the dictionary keys can be strings that are valid Python identifiers.
Using for Loops to Populate DictionariesSometimes, you need to start with an empty dictionary and populate it with key-value pairs dynamically. To do this, you can use a for loop. For example, say that you want to create a dictionary in which keys are integer numbers and values are powers of 2.
Here’s how you can do this with a for loop:
Python >>> powers_of_two = {} >>> for integer in range(1, 10): ... powers_of_two[integer] = 2**integer ... >>> powers_of_two {1: 2, 2: 4, 3: 8, 4: 16, 5: 32, 6: 64, 7: 128, 8: 256, 9: 512} Copied!In this example, you create an empty dictionary using an empty pair of curly braces. Then, you run a loop over a range of integer numbers from 1 to 9. Inside the loop, you populate the dictionary with the integer numbers as keys and powers of two as values.
The loop in this example is readable and clear. However, you can also use dictionary comprehension to create and populate a dictionary like the one shown above.
Introducing Dictionary Comprehensions Read the full article at https://realpython.com/python-dictionary-comprehension/ »[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Mike Driscoll: ANN – The textual-cogs Package – Creating Reusable Dialogs for Textual
Textual-cogs is a collection of Textual dialogs that you can use in your Textual application. You can see a quick demo of the dialogs below:
Dialogs included so far:
- Generic MessageDialog – shows messages to the user
- SaveFileDialog – gives the user a way to select a location to save a file
- SingleChoiceDialog – gives the user a series of choices to pick from
- TextEntryDialog – ask the user a question and get their answer using an Input widget
- and more
You can check out textual-cogs on GitHub.
Installation
You can install textual-cog using pip:
python -m pip install textual-cogYou also need Textual to run these dialogs.
Example Usage
Here is an example of creating a small application that opens the MessageDialog immediately. You would normally open the dialog in response to a message or event that has occurred, such as when the application has an error or you need to tell the user something.
from textual.app import App from textual.app import App, ComposeResult from textual_cogs.dialogs import MessageDialog from textual_cogs import icons class DialogApp(App): def on_mount(self) -> ComposeResult: def my_callback(value: None | bool) -> None: self.exit() self.push_screen( MessageDialog( "What is your favorite language?", icon=icons.ICON_QUESTION, title="Warning", ), my_callback, ) if __name__ == "__main__": app = DialogApp() app.run()When you run this code, you will get something like the following:
Creating a SaveFileDialog
The following code demonstrates how to create a SaveFileDialog:
from textual.app import App from textual.app import App, ComposeResult from textual_cogs.dialogs import SaveFileDialog class DialogApp(App): def on_mount(self) -> ComposeResult: self.push_screen(SaveFileDialog()) if __name__ == "__main__": app = DialogApp() app.run()When you run this code, you will see the following:
Wrapping UpThe textual-cogs package is currently only a collection of reusable dialogs for your Textual application. However, this can help speed up your ability to add code to your TUI applications because the dialogs are taken care of for you.
Check it out on GitHub or the Python Package Index today.
The post ANN – The textual-cogs Package – Creating Reusable Dialogs for Textual appeared first on Mouse Vs Python.
qtatech.com blog: Managing Multilingual Content in Drupal 10 Multisites
In an increasingly globalized world, businesses are turning to multilingual solutions to reach an international audience. Drupal 10 offers a powerful multisite architecture that allows you to manage multiple sites from a single installation, ideal for organizations with a global reach.
Droptica: 5 Problems You May Encounter When Integrating Drupal with Third-Party Software
Integrating Drupal with other systems is a common part of creating or developing a website or web application. Although Drupal offers many tools to facilitate this process, encountering minor or major difficulties is simply inevitable. Based on our knowledge from several hundred projects for clients, we’ve compiled a list of the common problems. It’s worth familiarizing yourself with them to effectively avoid them and speed up the implementation of integration projects.
Real Python: Quiz: Basic Input and Output in Python
In this quiz, you’ll test your understanding of how to use Python’s built-in functions input() and print() for basic input and output operations.
You’ll also revisit how to use readline to improve the user experience when collecting input, and how to format output using the sep and end keyword arguments of print().
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Russell Coker: Modern Sleep
Julius wrote an insightful blog post about the “modern sleep” issue with Windows [1]. Basically Microsoft decided that the right way to run laptops is to never entirely sleep, which uses more battery but gives better options for waking up and doing things. I agree with Microsoft in concept and this is something that is a problem that can be solved. A phone can run for 24+ hours without ever fully sleeping, a laptop has a more power hungry CPU and peripherals but also has a much larger battery so it should be able to do the same. Some of the reviews for Snapdragon Windows laptops claim up to 22 hours of actual work without charging! So having suspend not really stop the system should be fine.
The ability of a phone to never fully sleep is a change in quality of the usage experience, it means that you can access it and immediately have it respond and it means that all manner of services can be checked for new updates which may require a notification to the user. The XMPP protocol (AKA Jabber) was invented in 1999 which was before laptops were common and Instant Message systems were common long before then. But using Jabber or another IM system on a desktop was a very different experience to using it on a laptop and using it on a phone is different again. The “modern sleep” allows laptops to act like phones in regard to such messaging services. Currently I have Matrix IM clients running on my Android phone and Linux laptop, if I get a notification that takes much typing for a response then I get out my laptop to respond. If I had an ARM based laptop that never fully shut down I would have much less need for Matrix on a phone.
Making “modern sleep” popular will lead to more development of OS software to work with it. For Linux this will hopefully mean that regular Linux distributions (as opposed to Android which while running a Linux kernel is very different to Debian etc) get better support for such things and therefore become more usable on phones. Debian on a Librem 5 or PinePhonePro isn’t very usable due to battery life issues.
A laptop with an LTE card can be used for full mobile phone functionality. With “modern sleep” this is a viable option. I am tempted to make a laptop with LTE card and bluetooth headset a replacement for my phone. Some people will say “what if someone tries to call you when it’s not convenient to have your laptop with you”, my response is “what if people learn to not expect me to answer the phone at any time as they managed that in the 90s”. Seriously SMS or Matrix me if you want an instant response and if you want a long chat schedule it via SMS or Matrix.
Dell has some useful advice about how to use their laptops (and probably most laptops from recent times) in this regard [2]. You can’t close the lid before unplugging the power cable you have to unplug first and then close. You shouldn’t put a laptop in a sealed bag for travel either. This is a terrible situation, you can put a tablet in a bag and don’t need to take any special precautions when unplugging and laptops should work the same. The end result of what Microsoft, Dell, Intel, and others are doing will be good but they are making some silly design choices along the way! I blame Intel mostly for selling laptop CPUs with TDPs >40W!
For an amusing take on this Linus Tech Tips has a video about being forced to use MacBooks by Microsoft’s implementation of Modern Sleep [3].
I’ll try out some ARM laptops in the near future and blog about how well they work on Debian.
- [1] https://blog.jeujeus.de/blog/hardware/laptops-will-not-sleep-anymore/
- [2] https://tinyurl.com/2b7mxaj7
- [3] https://www.youtube.com/watch?v=OHKKcd3sx2c
Related posts:
- Modern Laptops Suck One of the reasons why I’m moving from a laptop...
- Cooling a Thinkpad Late last year I wrote about the way that modern...
- Cheap Laptops for Children I was recently browsing an electronics store and noticed some...
Setting C++ Defines with CMake
When building C++ code with CMake, it is very common to want to set some pre-processor defines in the CMake code.
For instance, we might want to set the project’s version number in a single place, in CMake code like this:
project(MyApp VERSION 1.5)This sets the CMake variable PROJECT_VERSION to 1.5, which we can then use to pass -DMYAPP_VERSION_STRING=1.5 to the C++ compiler. The about dialog of the application can then use this to show the application version number, like this:
const QString aboutString = QStringLiteral("My App version: %1").arg(MYAPP_VERSION_STRING); QMessageBox::information(this, "My App", aboutString);Similarly, we might have a boolean CMake option like START_MAXIMIZED, which the user compiling the software can set to ON or OFF:
option(START_MAXIMIZED "Show the mainwindow maximized" OFF)If it’s ON, you would pass -DSTART_MAXIMIZED, otherwise nothing. The C++ code will then use #ifdef. (We’ll see that there’s a better way.)
#ifdef START_MAXIMIZED w.showMaximized(); #else w.show(); #endif The common (but suboptimal) solutionA solution that many people use for this is the CMake function add_definitions. It would look like this:
add_definitions(-DMYAPP_VERSION_STRING="${PROJECT_VERSION}") if (START_MAXIMIZED) add_definitions(-DSTART_MAXIMIZED) endif()Technically, this works but there are a number of issues.
First, add_definitions is deprecated since CMake 3.12 and add_compile_definitions should be used instead, which allows to remove the leading -D.
More importantly, there’s a major downside to this approach: changing the project version or the value of the boolean option will force CMake to rebuild every single .cpp file used in targets defined below these lines (including in subdirectories). This is because add_definitions and add_compile_definitions ask to pass -D to all cpp files, instead of only those that need it. CMake doesn’t know which ones need it, so it has to rebuild everything. On large real-world projects, this could take something like one hour, which is a major waste of time.
A first improvement we can do is to at least set the defines to all files in a single target (executable or library) instead of “all targets defined from now on”. This can be done like this:
target_compile_definitions(myapp PRIVATE MYAPP_VERSION_STRING="${PROJECT_VERSION}") if(START_MAXIMIZED) target_compile_definitions(myapp PRIVATE START_MAXIMIZED) endif()We have narrowed the rebuilding effect a little bit, but are still rebuilding all cpp files in myapp, which could still take a long time.
The recommended solutionThere is a proper way to do this, such that only the files that use these defines will be rebuilt; we simply have to ask CMake to generate a header with #define in it and include that header in the few cpp files that need it. Then, only those will be rebuilt when the generated header changes. This is very easy to do:
configure_file(myapp_config.h.in myapp_config.h)We have to write the input file, myapp_config.h.in, and CMake will generate the output file, myapp_config.h, after expanding the values of CMake variables. Our input file would look like this:
#define MYAPP_VERSION_STRING "${PROJECT_VERSION}" #cmakedefine01 START_MAXIMIZEDA good thing about generated headers is that you can read them if you want to make sure they contain the right settings. For instance, myapp_config.h in your build directory might look like this:
#define MYAPP_VERSION_STRING "1.5" #define START_MAXIMIZED 1For larger use cases, we can even make this more modular by moving the version number to another input file, say myapp_version.h.in, so that upgrading the version doesn’t rebuild the file with the showMaximized() code and changing the boolean option doesn’t rebuild the about dialog.
If you try this and you hit a “file not found” error about the generated header, that’s because the build directory (where headers get generated) is missing in the include path. You can solve this by adding set(CMAKE_INCLUDE_CURRENT_DIR TRUE) near the top of your CMakeLists.txt file. This is part of the CMake settings that I recommend should always be set; you can make it part of your new project template and never have to think about it again.
There’s just one thing left to explain: what’s this #cmakedefine01 thing?
If your C++ code uses #ifdef, you want to use #cmakedefine, which either sets or doesn’t set the define. But there’s a major downside of doing that — if you forget to include myapp_config.h, you won’t get a compile error; it will just always go to the #else code path.
We want a solution that gives an error if the #include is missing. The generated header should set the define to either 0 or 1 (but always set it), and the C++ code should use #if. Then, you get a warning if the define hasn’t been set and, because people tend to ignore warnings, I recommend that you upgrade it to an error by adding the compiler flag -Werror=undef, with gcc or clang. Let me know if you are aware of an equivalent flag for MSVC.
if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang") target_compile_options(myapp PRIVATE -Werror=undef) endif()And these are all the pieces we need. Never use add_definitions or add_compile_definitions again for things that are only used by a handful of files. Use configure_file instead, and include the generated header. You’ll save a lot of time compared to recompiling files unnecessarily.
I hope this tip was useful.
For more content on CMake, we curated a collection of resources about CMake with or without Qt. Check out the videos.
About KDAB
If you like this article and want to read similar material, consider subscribing via our RSS feed.
Subscribe to KDAB TV for similar informative short video content.
KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.
The post Setting C++ Defines with CMake appeared first on KDAB.
eGenix.com: eGenix PyRun - One file Python Runtime 2.6.0 GA
eGenix PyRun™ is our open source, one file, no installation version of Python, making the distribution of a Python interpreter to run Python based scripts and applications to Unix based systems simple and efficient.
eGenix PyRun's executable only needs 4-6MB on disk, but still supports most Python applications and scripts.
Compared to a regular Python installation of typically 100MB on disk, eGenix PyRun is ideal for applications and scripts that need to be distributed to containers, VMs, clusters, client installations, customers or end-users.
It makes "installing" Python on a Unix based system as simple as copying a single file.
eGenix has been using eGenix PyRun as run-time for the Linux version of mxODBC Connect Server product since 2008 with great success and decided to make it available as a stand-alone open-source product.
We provide the source archive to build your own eGenix PyRun on Github, as well as a few binary distributions to get you started on Linux x86_64. In the future, we will set up automated builds for several other platforms.Please see the product page for more details:
>>> eGenix PyRun - One file Python Runtime
NewsThis major release of eGenix PyRun comes with the following enhancements:
Enhancements / Changes- Added support for Python 3.12
- Added support for LTO release builds
- Added dev build targets for development; these don't use PGO and thus build faster
Please visit the eGenix PyRun product page for downloads, instructions on installation and documentation of the product.
Commercial support for this product is available directly from eGenix.com.
Please see the support section of our website for details.
For more information on eGenix PyRun, licensing and download instructions, please write to sales@egenix.com.
Enjoy !
Marc-Andre Lemburg, eGenix.com
Zato Blog: Web scraping as an API service
In systems-to-systems integrations, there comes an inevitable time when we have to employ some kind of a web scraping tool to integrate with a particular application. Despite its not being our first choice, it is good to know what to use at such a time - in this article, I provide a gentle introduction to my favorite tool of this kind, called Playwright, followed by sample Python code that integrates it with an API service.
Naturally, in the context of backend integrations, web scraping should be avoided and, generally, it should be considered the last resort. The basic issue here is that while the UI term contains the "interface" part, it is not really the "Application Programming" Interface that we would like to have.
It is not that the UI cannot be programmed against. After all, a web browser does just that, it takes a web page and renders it as expected. Same goes for desktop or mobile applications. Also, anyone integrating with mainframe computers will recognize that this is basically what 3270 can be used for too.
Rather, the fundamental issue is that web scraping goes against the principles of separation of layers and roles across frontend, middleware and backend, which in turn means that authors of resources (e.g. HTML pages) do not really expect for many people to access them in automated ways.
Perhaps they actually should expect it, and web pages should finally start to resemble genuine knowledge graphs, easy to access by humans, be it manually or through automation tools, but the reality today is that it is not the case and, in comparison with backend systems, the whole of the web scraping space is relatively brittle, which is why we shun this approach in integrations.
Yet, another part of reality, particularly in enterprise integrations, is that people may be sometimes given access to a frontend application on an internal network and that is it. No API, no REST, no JSON, no POST data, no real data formats, and one is simply supposed to fill out forms as part of a business process.
Typically, such a situation will result in an integration gap. There will be fully automated parts in the business process preceding this gap, with multiple systems coordinated towards a specific goal and there will be subsequent steps in the process, also fully automated.
Or you may be given access only to a specific frontend and only through VPN via a single remote Windows desktop. Getting access to a REST API may take months or may be never realized because of some high level licensing issues. This is not uncommon in the real life.
Such a gap can be a jarring and sore point, truly ruining the whole, otherwise fluid, integration process. This creates a tension and to resolve the tension, we can, should all the attempts to find a real API fail, finally resort to web scraping.
It is mostly in this context that I am looking at Playwright below - the tool is good and it has many other uses that go beyond the scope of this text, and it is well worth knowing it, for instance for frontend testing of your backend systems, but, when we deal with API integrations, we should not overdo with web scraping.
Needless to say, if web scraping is what you do primarily, your perspective will be somewhat different - you will not need any explanation of why it is needed or when, and you may be only looking for a way to enclose up your web scraping code in API services. This article will explain that too.
Introducing PlaywrightThe nice part of Playwright is that we can use it to visually prepare a draft of Python code that will scrape a given resource. That is, instead of programming it in Python, we go to an address, fill out a form, click buttons and otherwise use everything as usually and Playwright generates for us code that will be later used in integrations.
That code will require a bit of clean-up work, which I will talk about below, but overall it works very nicely and is certainly useful. The result is not one of these do-not-touch auto-generated pieces of code that are better left to their own.
While there are better ways to integrate with Jira, I chose that application as an example of Playwright's usage simply because I cannot show you any internal application in a public blog post.
Below, there are two windows. One is Playwright's emulating a Blackberry device to open a resource. I was clicking around, I provided an email address and then I clicked the same email field once more. To the right, based on my actions, we can find the generated Python code, which I consider quite good and readable.
The Playwright Inspector, the tool that gave us the code, will keep recording all of our actions until we click the "Record" button which then allows us to click the button next to "Record" which is "Copy code to clipboard". We can then save the code to a separate file and run it on demand, automatically.
But first, we will need to install Playwright.
Installing and starting PlaywrightThe tools is written in TypeScript and can be installed using npx, which in turn is part of NodeJS.
Afterwards, the "playwright install" call is needed as well because that will potentially install runtime dependencies, such as Chrome libraries.
Finally, we install Playwright using pip as well because we want to access with Python. Note that if you are installing Playwright under Zato, the "/path/to/pip" will be typically "/opt/zato/code/bin/pip".
npx -g --yes playwright install playwright install /path/to/pip install playwrightWe can now start it as below. I am using BlackBerry as an example of what Playwright is capable of. Also, it is usually more convenient to use a mobile version of a site when the main window and Inspector are opened side by side, but you may prefer to use Chrome, Firefox or anything else.
playwright codegen https://example.atlassian.net/jira --device "BlackBerry Z30"That is practically everything as using Playwright to generate code in our context goes. Open the tool, fill out forms, copy code to a Python module, done.
What is still needed, though, is cleaning up the resulting code and embedding it in an API integration process.
Code clean-upAfter you keep using Playwright for a while with longer forms and pages, you will note that the generated code tends to accumulate parts that repeat.
For instance, in the module below, which I already cleaned up, the same "[placeholder=\"Enter email\"]" reference to the email field is used twice, even if a programmer developing this could would prefer to introduce a variable for that.
There is not a good answer to the question of what to do about it. On the one hand, obviously, being programmers we would prefer not to repeat that kind of details. On the other hand, if we clean up the code too much, this may result in too much of a maintenance burden because we need to keep it mind that we do not really want to invest to much in web scraping and, should there be a need to repeat the whole process, we do not want to end up with Playwright's code auto-generated from scratch once more, without any of our clean-up.
A good compromise position is to at least extract any kind of credentials from the code to environment variables or a similar place and to remove some of the code comments that Playwright generates. The result as below is what it should like at the end. Not too much effort without leaving the whole code as it was originally either.
Save the code below as "play1.py" as this is what the API service below will use.
# -*- coding: utf-8 -*- # stdlib import os # Playwright from playwright.sync_api import Playwright, sync_playwright class Config: Email = os.environ.get('APP_EMAIL', 'zato@example.com') Password = os.environ.get('APP_PASSWORD', '') Headless = bool(os.environ.get('APP_HEADLESS', False)) def run(playwright: Playwright) -> None: browser = playwright.chromium.launch(headless=Config.Headless) # type: ignore context = browser.new_context() # Open new page page = context.new_page() # Open project boards page.goto("https://example.atlassian.net/jira/software/projects/ABC/boards/1") page.goto("https://id.atlassian.com/login?continue=https%3A%2F%2Fexample.atlassian.net%2Flogin%3FredirectCount%3D1%26dest-url%3D%252Fjira%252Fsoftware%252Fprojects%252FABC%252Fboards%252F1%26application%3Djira&application=jira") # Fill out the email page.locator("[placeholder=\"Enter email\"]").click() page.locator("[placeholder=\"Enter email\"]").fill(Config.Email) # Click #login-submit page.locator("#login-submit").click() with sync_playwright() as playwright: run(playwright) Web scraping as a standalone activityWe have the generated code so the first thing to do with it is to run it from command line. This will result in a new Chrome window's accessing Jira - it is Chrome, not Blackberry, because that is the default for Playwright.
The window will close soon enough but this is fine, that code only demonstrates a principle, it is not a full integration task.
python /path/to/play1.pyIt is also useful that we can run the same Python module from our IDE, giving us the ability to step through the code line by line, observing what changes when and why.
Web scraping as an API service
Finally, we are ready to invoke the standalone module from an API service, as in the following code that we are also going to make available as a REST channel.
A couple of notes about the Python service below:
- We invoke Playwright in a subprocess, as a shell command
- We accept input through data models although we do not provide any output definition because it is not needed here
- When we invoke Playwright, we set the APP_HEADLESS to True which will ensure that it does not attempt to actually display a Chrome window. After all, we intend for this service to run on Linux servers, in backend, and such a thing will be unlikely to work in this kind of an environment.
Other than that, this is a straightforward Zato service - it receives input, carries out its work and a reply is returned to the caller (here, empty).
# -*- coding: utf-8 -*- # stdlib from dataclasses import dataclass # Zato from zato.server.service import Model, Service # ########################################################################### @dataclass(init=False) class WebScrapingDemoRequest(Model): email: str password: str # ########################################################################### class WebScrapingDemo(Service): name = 'demo.web-scraping' class SimpleIO: input = WebScrapingDemoRequest def handle(self): # Path to a Python installation that Playwright was installed under py_path = '/path/to/python' # Path to a Playwright module with code to invoke playwright_path = '/path/to/the-playwright-module.py' # This is a template script that we will invoke in a subprocess command_template = """ APP_EMAIL={app_email} APP_PASSWORD={app_password} APP_HEADLESS=True {py_path} {playwright_path} """ # This is our input data input = self.request.input # type: WebScrapingDemoRequest # Extract credentials from the input .. email = input.email password = input.password # .. build the full command, taking all the config into account .. command = command_template.format( app_email = email, app_password = password, py_path = py_path, playwright_path = playwright_path, ) # .. invoke the command in a subprocess .. result = self.commands.invoke(command) # .. if it was not a success, log the details received .. if not result.is_ok: self.logger.info('Exit code -> %s', result.exit_code) self.logger.info('Stderr -> %s', result.stderr) self.logger.info('Stdout -> %s', result.stdout) # ###########################################################################Now, the REST channel:
The last thing to do is to invoke the service - I am using curl from the command line below but it could very well be Postman or a similar option.
curl localhost:17010/demo/web-scraping -d '{"email":"hello@example.com", "password":"abc"}' ; echoThere will be no Chrome window this time around because we run Playwright in the headless mode. There will be no output from curl either because we do not return anything from the service but in server logs we will find details such as below.
We can learn from the log that the command took close to 4 seconds to complete, that the exit code was 0 (indicating success) and that is no stdout or stderr at all.
INFO - Command ` APP_EMAIL=hello@example.com APP_PASSWORD=abc APP_HEADLESS=True /path/to/python /path/to/the-playwright-module.py ` completed in 0:00:03.844157, exit_code -> 0; len-out=0 (0 Bytes); len-err=0 (0 Bytes); cid -> zcmdc5422816b2c6ff9f10742134We are now ready to continue to work on it - for instance, you will notice that the password is visible in logs and this should not be allowed.
But, all such works are extra in comparison with the main theme - we have Playwright, which is a a tool that allows us to quickly integrate with frontend applications and we can automate it through API services. Just as expected.
More resources➤ Python API integration tutorial
➤ What is an integration platform?
➤ Python Integration platform as a Service (iPaaS)
➤ What is an Enterprise Service Bus (ESB)? What is SOA?
Tag1 Consulting: Migrating Your Data from D7 to D10: Avoiding entity ID conflicts with AUTO_INCREMENT
Previously, we wrapped up migrating configuration to match the content model we specified in our upgrade plan. Ready to start migrating content? Hang in there — it’s coming up next. But first, let's address one of the hardest issues to resolve when they arise - entity ID conflicts in content migrations.
mauricio Tue, 11/12/2024 - 23:18ClearlyDefined v2.0 adds support for LicenseRefs
One of the major focuses of the ClearlyDefined Technical Roadmap is the improvement in the quality of license data. As such, we are excited to announce the release of ClearlyDefined v2.0 which adds over 2,000 new well-known licenses it can identify. You can see the complete list of new non-SPDX licenses in ScanCode LicenseDB.
A little historical background, when Clearly Defined was first created, it was initially decided to limit the reported licenses to only those on the SPDX License List. As teams worked with the Clearly Defined data, it became clear that additional license discovery is important to give users a fuller picture of the projects they depend on. In previous releases of ClearlyDefined, licenses not on the SPDX License List were represented in the definition as NOASSERTION or OTHER. (See the breakdown of licenses in The most popular licenses for each language in 2023.)The v2.0 release of ClearlyDefined includes an update of ScanCode to v32 and the support of LicenseRefs to identify non-SPDX licenses. The license in the definition will now be a LicenseRef with prefix LicenseRef-scancode- if ScanCode identifies a non-SPDX license. This improves the license coverage in the ClearlyDefined definitions and consumers ability to accurately construct license compliance policies.
ClearlyDefined identifies licenses in definitions using SPDX expressions. The SPDX specification has a way to include non-SPDX licenses in license expressions.
A license expression could be a single license identifier found on the SPDX License List; a user defined license reference denoted by the LicenseRef-[idString]; a license identifier combined with an SPDX exception; or some combination of license identifiers, license references and exceptions constructed using a small set of defined operators (e.g., AND, OR, WITH and +)
— excerpt from SPDX Annexes: SPDX license expressions
Example change of a definition:
CoordinatesLicense BEFORELicense AFTERnpm/npmjs/@alexa-games/sfb-story-debugger/2.1.0NOASSERTIONLicenseRef-.amazon.com.-AmznSL-1.0Note: ClearlyDefined v2.0 also includes an update to ScanCode v32.
What does this mean for definitions?This section includes a simplified description of what happens when you request a definition from ClearlyDefined. These examples only refer to the ScanCode tool. Other tools are run as well and are handled in similar ways.
When the definition already existsAny request for a definition through the /definitions API makes a couple of checks before returning the definition:
If the definition exists, it checks whether the definition was created using the latest version of the ClearlyDefined service.
- If yes, it returns the definition as is.
- If not, it recomputes the definition using the existing raw results from the tools run during the previous harvest for the existing definition. In this case, the tool version will be earlier than ScanCode v32.
NOTE: ClearlyDefined does not support LicenseRefs from ScanCode prior to v32. For earlier versions of ScanCode, ClearlyDefined stores any LicenseRefs as NOASSERTION. In some cases, you may see OTHER when the definition was curated.
When the definition does not existIf the definition does not exist:
- It will send a harvest request which will run the latest version of all the tools and produce raw results.
- From these raw results, it will compute a definition which might include a LicenseRef.
If you see NOASSERTION in the license expression, you can check the definition to determine the version of ScanCode in the “described”: “tools” section.
If ScanCode is a version earlier than v32, you can submit a harvest API request. This will run any tools for which ClearlyDefined now supports a later version. Once the tools complete, the definition will be recomputed based on the new results.
In some cases, even when the results are from ScanCode v32, you may still see NOASSERTION. Reharvesting when the ScanCode version is already v32 will not change the definition.
What does this mean for tools?When adding ScanCode licenses to allow/deny lists, note the ScanCode LicenseDB lists licenses without the LicenseRef prefix. All LicenseRefs coming from ScanCode will start with LicenseRef-scancode-.
Tools using an Allow ListA recomputed definition may change the license to include a LicenseRef that you want to allow. All new LicenseRefs that are acceptable will need to be added to your allow list. We are taking the approach of adding them as they appear in flagged package-version licenses. An alternative is to review the ScanCode LicenseDB to proactively add LicenseRefs to your allow list.
Tools using a Deny ListDeny lists need to be exhaustive to prevent a new license from being allowed by default. It is recommended that you review the ScanCode LicenseDB to determine if there are LicenseRefs you want to add to the deny list.
Note: The SPDX License List also changes over time. A periodic review to maintain the Deny list is always a good idea.
Providing FeedbackAs with any major version change, there can be unexpected behavior. You can reach out with questions, feedback, or requests. Find how to get in touch with us in the Get Involved doc.
If you have comments or questions on the actual LicenseRefs, you should reach out to ScanCode License DB maintainers.
AcknowledgementsA huge thank you to the contributing developers and their organizations for supporting the work of ClearlyDefined.
In alphabetical order, contributors were…
- ajhenry (GitHub)
- brifl (Microsoft)
- elrayle (GitHub)
- jeff-luszcz (GitHub)
- ljones140 (GitHub)
- lumaxis (GitHub)
- mpcen (Microsoft)
- nickvidal (Open Source Initiative)
- qtomlinson (SAP)
- RomanIakovlev (GitHub)
- yashkohli88 (SAP)
See something you’d like ClearlyDefined to do or could do better? If you have resources to help out, we have work to be done to further improve data quality, performance, and sustainability. We’d love to hear from you.
References