Feeds
Oliver Davies' daily list: Next week is DrupalCon Barcelona
Next week is DrupalCon in Barcelona.
I'm not speaking this year but if you're there and want to discuss automated testing, test-driven development, static analysis, Sculpin, Tailwind CSS, Nix, or anything else, I'll be there all week and would love to meet up and chat.
Open Source AI Definition – Weekly update September 16
- OSI invites individuals and organizations to endorse the Open Source AI Definition (OSAID). Endorsers will have their name and affiliation listed in the press release for Release Candidate 1 (RC1), which is expected to be finalized by the end of September. Those endorsing version 0.0.9 will be contacted again to confirm their support if there are any changes leading up to RC1.
- @mjbommar encourages reviewing the U.S. Copyright Office’s guidance on text and data mining (TDM) exceptions, which provides clear explanations and limitations, especially focusing on non-commercial, scholarly, and teaching uses. He emphasizes that the TDM guidance operates within narrow parameters that are often misunderstood or overlooked.
- @quaid proposes adding nuance to the Open Source AI (OSAI) Definition by introducing two designations: OSAI D+ (with open data) and OSAI D- (without open data, due to legitimate reasons beyond the creator’s control). He suggests using a dataset certificate of origin (dataset DCO) for self-verification to ensure compliance.
- @kjetilk agrees that verification is key but questions whether data information alone is sufficient for verification. He highlights that verifying rights to the data may not always be possible.
- @stefano appreciates the quadrant system’s clarity and confirms @quaid’s proposal for OSAI D- to be reserved for those with legitimate reasons for not sharing data.
- @thesteve0 expresses skepticism about broadening the “Open Source” label. He argues that without access to both data and code, AI models cannot truly be Open Source and suggests labeling such models as “open weights” instead.
- @shujisado notes the importance of data access in AI, pointing out that OSAID requires detailed information about how data is sourced, including provenance and selection criteria. He also discusses potential legal and ethical reasons for not sharing datasets.
- @Shamar raises concerns about “openwashing” in AI, where developers might distribute a model with a different dataset, undermining trust. He argues that distinguishing between OSAI D+ and D- risks legal complications for derivative works, suggesting that models without open data should not be considered truly open.
- @zack supports the idea of a tiered system (D+ and D-) as an improvement over the current situation, as it incentivizes progress from D- to D+. He is skeptical about verifiability but sees potential in the branding aspect of the proposal.
- @stefano asks @arandal about suggested edits, which include renaming data as “source data,” allowing open-source AI developers to require downstream modifications with open data, and permitting downstream developers to use open data to fine-tune models trained on non-public data. He further asks if @arandal compares training data to model weights as source code is to binary code.
- @shujisado agrees with @stefano and points out that while many interpret OSD-compliant licenses to include CC4 and CC0, OSI has not officially evaluated Creative Commons licenses for compliance. He highlights concerns about CC0’s patent defense, which could be crucial for datasets.
- @mjbommar echoes the concerns about patent defense, noting it as a critical issue in both software and data licensing.
- @Shamar supports the first two suggestions but argues that models trained on non-public data cannot meet an “Open Source AI” definition, as they limit the freedom to study and modify, which are core principles of Open Source.
- @nick shares an article by Nathan Lambert, reviewed by key figures in the Open Source AI space, discussing the challenges of training data and the current Open Source AI definition. The view of @Percy Liang (shared on X) is highlighted: he suggests that releasing an entire dataset is neither sufficient nor necessary for Open Source AI, and emphasizes the need for detailed code of the data processing pipeline for transparency, beyond just releasing the dataset.
- @shujisado discusses the legal nuances of using U.S. government documents in AI training, emphasizing that while they may be used in the U.S., legal complications arise in other jurisdictions.
- @Shamar stresses that Open Source AI should provide all the necessary data and processing information to recreate a system, otherwise, calling it Open Source is “open washing.”
- @Shamar proposes a clearer distinction between “source data” and “processing information” in the Open Source AI definition to ensure transparency and reproducibility. He suggests source data should be publicly available under the same terms that allowed its original use, while the process used to train the system should be shared under an Open Source license. His formulation aims to prevent loopholes that could lead to open-washing and emphasizes the importance of granting all four freedoms (study, modify, distribute, and use) to qualify as Open Source AI.
- @nick disagrees, arguing that @Shamar's proposal misunderstands the difference between the rights to use data for training and the rights to distribute it. He also challenges the claim that exact replication of AI systems can be guaranteed, even with access to the same data.
- The sixteenth edition of our town hall meetings was held on the 13th of September. If you missed it, the recording and slides can be found here.
Nonprofit Drupal posts: September Drupal for Nonprofits Chat
Join us THURSDAY, September 19 at 1pm ET / 10am PT, for our regularly scheduled call to chat about all things Drupal and nonprofits. (Convert to your local time zone.)
We don't have anything specific on the agenda this month, so we'll have plenty of time to discuss anything that's on our minds at the intersection of Drupal and nonprofits. Got something specific you want to talk about? Feel free to share ahead of time in our collaborative Google doc: https://nten.org/drupal/notes!
All nonprofit Drupal devs and users, regardless of experience level, are always welcome on this call.
This free call is sponsored by NTEN.org and open to everyone.
Join the call: https://us02web.zoom.us/j/81817469653
Meeting ID: 818 1746 9653
Passcode: 551681
One tap mobile:
+16699006833,,81817469653# US (San Jose)
+13462487799,,81817469653# US (Houston)
Dial by your location:
+1 669 900 6833 US (San Jose)
+1 346 248 7799 US (Houston)
+1 253 215 8782 US (Tacoma)
+1 929 205 6099 US (New York)
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
Find your local number: https://us02web.zoom.us/u/kpV1o65N
Follow along on Google Docs: https://nten.org/drupal/notes
drunomics: Why we don't use GraphQL
At drunomics we have been building decoupled Drupal sites for more than five years. During this time, GraphQL has always been a popular choice for decoupled Drupal sites in professional and enterprise projects, thanks to the well-maintained GraphQL contrib module. Still, I've advised against using GraphQL for various enterprise projects, even though it was sometimes appealing to customers. In this blog post, I'd like to summarize why we don't use GraphQL:
General complexity
GraphQL is not only a new query language to learn for both frontend and backend developers; the backend also has to support any kind of query the frontend makes. On the frontend side of things, additional libraries and tooling are needed to handle the protocol.
Loose contracts
GraphQL gives a lot of power to frontend developers, but that power comes at a huge price: there is no defined contract, or only a very loosely defined one, i.e. the data model or, more specifically, the GraphQL schema layered on top. Based upon this loose contract, the frontend may compose any kind of query, which the backend has to support. Which leads to the next point:
Complex queries
When the backend exposes the Drupal data schema directly, a lot of things can potentially be leaked unintentionally, and changing things may become hard, because: who knows which data properties the frontend uses and queries for? It's quite hard to optimize for every use case.
However, the backend may compose its own GraphQL schema and provide exactly the data model needed by the client, the frontend. That is indeed a great option to have, but it requires additional work and code to translate between the schema and the real data model behind it. It makes it possible to change the underlying data model and schema mapping while keeping the same or a compatible GraphQL output and schema. But is the code performing the mapping performant enough? Does it work correctly? That's quite hard to tell without knowing exactly which queries to optimize and test for. So things are, or become, complex.
Performance
First of all, GraphQL is bad for caching since it makes use of POST requests. The typical work-around is to use shortened, hashed queries and to access them via GET requests, which can help mitigate the issue. But this comes at the cost of tying the deployed frontend and backend versions together, thus increasing overall system and deployment complexity. That way, the main GraphQL advantage - flexibility at the frontend - gets lost. So it's not an easy or great compromise to make.
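To make the caching trade-off concrete, here is a minimal Python sketch (not from the original post); the endpoint URL and the queryId parameter are hypothetical, since every GraphQL server has its own persisted-query convention:

import hashlib
import json

import requests

GRAPHQL_URL = "https://example.com/graphql"  # hypothetical endpoint

query = "{ article(id: 42) { title author { name } } }"

# Plain GraphQL: a POST request, which HTTP caches and CDNs generally will not cache.
plain = requests.post(GRAPHQL_URL, json={"query": query})

# Persisted/hashed query: the client sends only a hash of a query the server already knows,
# as a GET request that caches well - but frontend and backend must now agree on the stored queries.
query_hash = hashlib.sha256(query.encode()).hexdigest()
persisted = requests.get(GRAPHQL_URL, params={"queryId": query_hash, "variables": json.dumps({})})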
Client driven data fetching
With GraphQL, the web browser (or, more generally, the client) sends a query to the server, specifying the exact data it needs. While this can help reduce payload size, it puts the client in the "driving seat". That often leads to additional round trips: based upon the first response, additional data is often required for rendering, which has to be fetched in one or more further requests to the server, thus increasing latency.
In contrast, when the server is in the "driving seat", it can efficiently run all queries and resolve additional data itself, and then send the resulting data over the slower network once.
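As a small illustration of the round-trip argument, here is a sketch with hypothetical endpoints (not anything from the original post):

import requests

API = "https://example.com/api"  # hypothetical

# Client in the driving seat: the first response only reveals what else is needed,
# so rendering requires a second round trip over the slow network.
article = requests.get(f"{API}/articles/42").json()
author = requests.get(f"{API}/users/{article['author_id']}").json()

# Server in the driving seat: a single request to a purpose-built endpoint; the server
# resolves the related data locally and sends everything over the network once.
page = requests.get(f"{API}/pages/article/42").json()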
Security
GraphQL queries can expose sensitive data if not properly secured. This can be mitigated by implementing proper authentication and authorization mechanisms. However, this can easily become very complex: since the server does not know which queries the client needs, it has to handle every possible combination a client may request. Unfortunately, it is commonly rather easy for attackers to purposely write computationally very expensive (GraphQL) queries and send them to the server, thus opening the door for DDoS or even DoS attacks.
Besides that, due to the complexity of the backend having to cover all possible combinations, the danger of data leaking accidentally becomes rather high.
The conclusion
GraphQL comes with a couple of issues, which are - as usual - solvable. That's a price one might be willing to pay in certain situations, if the benefits are worth it. So, is using GraphQL a good idea? As so often, it depends. But in my experience, it's more often not than it is.
Alternatives are RESTful
The typical alternative to GraphQL is a RESTful API. As usual, with Drupal there are a couple of good options:
- Drupal comes with JSON:API out of the box, which is a great feature to have. While it's a good fit in certain situations, it also faces some of the issues mentioned above, most notably "Client driven data fetching" and "Loose contracts" (see the sketch after this list).
- Developers may use Drupal's API to provide custom-coded RESTful endpoints for the client. That addresses all of the concerns mentioned, but requires backend development time for every feature and, most notably, careful planning. This comes with the downside of frontend developers losing flexibility. (By the way, this is what GraphQL is loved for!)
- Configurable RESTful endpoints. In order to improve the development process and gain flexibility in the frontend, we developed a solution for providing custom RESTful endpoints that are configurable via Drupal, by frontend developers. For that, we improved the Custom Elements module, which is part of Lupus Decoupled Drupal, so that it integrates with Drupal's configuration system and provides a UI for customizing output by entity view mode. That way, in many situations, we can tick all the boxes while enabling frontend developers to work efficiently. I'll share more details about the new Custom Elements UI in a dedicated blog post later this week.
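For reference, the sketch mentioned above: consuming Drupal core's JSON:API from Python. The site URL is hypothetical, but the query parameters (sparse fieldsets, paging, sorting) are standard JSON:API - and they also show where the "loose contract" comes from, since the client decides which fields to request:

import requests

BASE_URL = "https://example.com/jsonapi"  # hypothetical Drupal site

response = requests.get(
    f"{BASE_URL}/node/article",
    params={
        "fields[node--article]": "title,created",  # sparse fieldset: the client picks the fields
        "page[limit]": 10,
        "sort": "-created",
    },
    headers={"Accept": "application/vnd.api+json"},
)
for item in response.json()["data"]:
    print(item["attributes"]["title"])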
FSF Blogs: Free software in the EU needs your help! Join the international effort before September 20
FSF Events: Free Software Directory meeting on IRC: Friday, September 20, starting at 12:00 EDT (16:00 UTC)
Talking Drupal: Talking Drupal #467 - Config Actions System
Today we are talking about The Config Actions System, What it does, and how it helps with Drupal Recipes with guests Alex Pott and Adam Globus-Hoenich. We’ll also cover the Events recipe as our module of the week.
For show notes visit: www.talkingDrupal.com/467
Topics
- Explain Config Actions
- Is this related to the Actions UI
- How are config actions used in Drupal
- How will the average user interact with Config Actions
- What does non-destructive mean
- Where did the Config Action system come from
- Future of the Config Action system
- How can people help out
- How does the Config Action system help with Drupal CMS
Alex Pott - alexpott Adam Globus-Hoenich - phenaproxima
Hosts
Nic Laflin - nLighteneddevelopment.com nicxvan
John Picozzi - epam.com johnpicozzi
Nate Dentzau - dentzau.com nathandentzau
MOTW Correspondent
Martin Anderson-Clutz - mandclu.com mandclu
- Brief description:
- Have you ever wanted to set up and configure a robust events system in your Drupal website, in just a few seconds? There’s a recipe for that.
- Module name/project name:
- Brief history
- How old: originally created in Mar 2013 as a distribution, but reborn as a recipe in July 2024
- Versions available: 1.0.0-alpha3, compatible with Drupal 10.3 and 11
- Maintainership
- Actively maintained
- Security coverage? - no stable release
- Documentation in the works
- Number of open issues: 1 open issue, which is a bug
- Usage stats: not tracked for recipes
- Maintainer(s): mandclu
- Module features and usage
- Listeners probably won’t be surprised to hear that Smart Date is at the heart of what you’ll get when you apply the Events recipe
- You will have an Event content type, and a view to list upcoming and past events
- The recipe will also set up add-to-calendar links on your event page, making it easy for your site visitors to be reminded of when your event will take place
- There are companion recipes to add a calendar view, to be able to associate locations (with maps), and to add event registration
- A modified version of the Events recipe has already been integrated into Drupal CMS, so it will be even easier to apply for a site based on that
- Internally it makes use of the createIfNotExists and setComponents config actions, which is why I thought it would be relevant to today’s discussion
FSF Blogs: Free software in the EU needs your help! Join the ongoing Digital Europe Freedom Programme consultation by September 20
Free software in the EU needs your help! Join the ongoing Digital Europe Freedom Programme consultation by September 20
mark.ie: Live Previews for LocalGov Microsites Design Changes
I had great fun today expanding what was a proof-of-concept module from last year into a very usable live preview module for LGD Microsites.
Plasma 6.2 Beta in KDE neon Testing Edition
Back from the fun of Akademy in Würzburg we can now get to the important task of testing Plasma 6.2 beta. It’s now in KDE neon testing edition which we build from the Git branches which will be used to make the 6.2 final release in 2.5 weeks time. Grab it now or if you don’t have a machine to install it on you can try the Docker images using the simple command `neondocker -p -e testing`.
Real Python: Using Python's pip to Manage Your Projects' Dependencies
The standard package manager for Python is pip. It allows you to install and manage packages that aren’t part of the Python standard library. If you’re looking for an introduction to pip, then you’ve come to the right place!
In this tutorial, you’ll learn how to:
- Set up pip in your working environment
- Fix common errors related to working with pip
- Install and uninstall packages with pip
- Manage projects’ dependencies using requirements files
You can do a lot with pip, but the Python community is very active and has created some neat alternatives to pip. You’ll learn about those later in this tutorial.
Get Your Cheat Sheet: Click here to download a free pip cheat sheet that summarizes the most important pip commands.
Getting Started With pip
So, what exactly does pip do? pip is a package manager for Python. That means it’s a tool that allows you to install and manage libraries and dependencies that aren’t distributed as part of the standard library. The name pip was introduced by Ian Bicking in 2008:
I’ve finished renaming pyinstall to its new name: pip. The name pip is [an] acronym and declaration: pip installs packages. (Source)
Package management is so important that Python’s installers have included pip since versions 3.4 and 2.7.9, for Python 3 and Python 2, respectively. Many Python projects use pip, which makes it an essential tool for every Pythonista.
The concept of a package manager might be familiar to you if you’re coming from another programming language. JavaScript uses npm for package management, Ruby uses gem, and the .NET platform uses NuGet. In Python, pip has become the standard package manager.
Finding pip on Your System
The Python installer gives you the option to install pip when installing Python on your system. In fact, the option to install pip with Python is checked by default, so pip should be ready for you to use after installing Python.
Note: On some Linux (Unix) systems like Ubuntu, pip comes in a separate package called python3-pip, which you need to install with sudo apt install python3-pip. It’s not installed by default with the interpreter.
You can verify that pip is available by looking for the pip3 executable on your system. Select your operating system below and use your platform-specific command accordingly:
Windows PowerShell
PS> where pip3
The where command on Windows will show you where you can find the executable of pip3. If Windows can’t find an executable named pip3, then you can also try looking for pip without the three (3) at the end.
Shell
$ which pip3
The which command on Linux systems and macOS shows you where the pip3 binary file is located.
On Windows and Unix systems, pip3 may be found in more than one location. This can happen when you have multiple Python versions installed. If you can’t find pip in any location on your system, then you may consider reinstalling pip.
Instead of running your system pip directly, you can also run it as a Python module. In the next section, you’ll learn how.
Running pip as a Module
When you run your system pip directly, the command itself doesn’t reveal which Python version pip belongs to. This unfortunately means that you could use pip to install a package into the site-packages of an old Python version without noticing. To prevent this from happening, you should run pip as a Python module:
Shell
$ python -m pip
Notice that you use python -m to run pip. The -m switch tells Python to run a module as an executable of the python interpreter. This way, you can ensure that your system default Python version runs the pip command. If you want to learn more about this way of running pip, then you can read Brett Cannon’s insightful article about the advantages of using python -m pip.
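As a quick check (a small sketch, not part of the tutorial itself), you can confirm which interpreter pip will act on by running it through sys.executable:

import subprocess
import sys

# The interpreter currently running this code ...
print(sys.executable)

# ... and pip invoked through that same interpreter, so packages land in its site-packages.
subprocess.run([sys.executable, "-m", "pip", "--version"], check=True)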
Note: Depending on how you installed Python, your Python executable may have a different name than python. You’ll see python used in this tutorial, but you may have to adapt the commands to use something like py or python3 instead.
Sometimes you may want to be more explicit and limit packages to a specific project. In situations like this, you should run pip inside a virtual environment.
Using pip in a Python Virtual Environment
Read the full article at https://realpython.com/what-is-pip/ »
ThinkDrop Consulting: Reflections on OpenDevShop and the hidden costs of open source maintainership.
#OpenDevShop failed because it tried to solve too many problems at the same time.
This directed the energy away from designing for the future. When the future arrived, it was wholly unprepared.
I saw the potential to make #Aegir an all-in-one management console for all your web tech, so I created server management things, and local CLI things, and other silly, not so useful things.
DevShop became a huge burden. Unmaintainable. Un-upgradable. Working untold unpaid hours, self-funded travel and speaking took a major toll on my life, financially and personally.
Qt Tools for Android Studio 3.0 Released
We are happy to announce the release of Qt Tools for Android Studio 3.0. It can be downloaded from the JetBrains marketplace.
The Drop Times: QED42's Journey in Shaping Digital Experiences: Insights from Piyuesh Kumar
TechBeamers Python: How to Create Dynamic QR Code in Python
This tutorial guides you on how to create dynamic QR codes in Python. It involves a bit more than just generating the QR code itself. Before reading this, you must know how a QR code generator works.
Steps to Create Dynamic QR Codes
Dynamic QR codes require the ability to track and update the information […]
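Since the excerpt above is truncated, here is only a rough sketch of the usual approach (an assumption on my part, not the article's code, using the qrcode library): a "dynamic" QR code typically encodes a short, stable redirect URL that you control, so you can change its target later without regenerating the image. The URL below is hypothetical:

import qrcode

# Encode a stable redirect URL; updating where it points happens server-side, not in the image.
redirect_url = "https://example.com/r/abc123"

img = qrcode.make(redirect_url)
img.save("dynamic_qr.png")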
The post How to Create Dynamic QR Code in Python appeared first on TechBeamers.
PyCharm: 7 Ways To Use Jupyter Notebooks inside PyCharm
Jupyter notebooks allow you to tell stories by creating and sharing data, equations, and visualizations sequentially, with a supporting narrative as you go through the notebook.
Jupyter notebooks in PyCharm Professional provide functionality above and beyond that of browser-based Jupyter notebooks, such as code completion, dynamic plots, and quick statistics, to help you explore and work with your data quickly and effectively.
Let’s take a look at 7 ways you can use Jupyter notebooks in PyCharm to achieve your goals. They are:
- Creating or connecting to an existing notebook
- Importing your data
- Getting acquainted with your data
- Using JetBrains AI Assistant
- Exploring your code with PyCharm
- Getting insights from your code
- Sharing your insights and charts
The Jupyter notebook that we used in this demo is available on GitHub.
1. Creating or connecting to an existing notebook
You can create and work on your Jupyter notebooks locally or connect to one remotely with PyCharm. Let’s take a look at both options so you can decide for yourself.
Creating a new Jupyter notebook
To work with a Jupyter notebook locally, you need to go to the Project tool window inside PyCharm, navigate to the location where you want to add the notebook, and invoke a new file. You can do this by using either your keyboard shortcuts ⌘N (macOS) / Alt+Ins (Windows/Linux) or by right-clicking and selecting New | Jupyter Notebook.
Give your new notebook a name, and PyCharm will open it ready for you to start work. You can also drag local Jupyter notebooks into PyCharm, and the IDE will automatically recognise them for you.
Connecting to a remote Jupyter notebook
Alternatively, you can connect to a remote Jupyter notebook by selecting Tools | Add Jupyter Connection. You can then choose to start a local Jupyter server, connect to an existing running local Jupyter server, or connect to a Jupyter server using a URL – all of these options are supported.
Now you have your Jupyter notebook, you need some data!
2. Importing your data
Data generally comes in two formats, CSV or database. Let’s look at importing data from a CSV file first.
Importing from a CSV file
Polars and pandas are the two most commonly used libraries for importing data into Jupyter notebooks. I’ll give you code for both in this section, and you can check out the documentation for both Polars and pandas and learn how Polars is different to pandas.
You need to ensure your CSV is somewhere in your PyCharm project, perhaps in a folder called `data`. Then, you can import pandas and use it to read the data in:
import pandas as pd

df = pd.read_csv("../data/airlines.csv")
In this example, airlines.csv is the file containing the data we want to manipulate. To run this and any code cell in PyCharm, use ⇧⏎ (macOS) / Shift+Enter (Windows/Linux). You can also use the green run arrows on the toolbar at the top.
If you prefer to use Polars, you can use this code:
import polars as pl

df = pl.read_csv("../data/airlines.csv")
Importing from a database
If your data is in a database, as is often the case for internal projects, importing it into a Jupyter notebook will require just a few more lines of code. First, you need to set up your database connection. In this example, we’re using PostgreSQL.
For pandas, you need to use this code to read the data in:
import pandas as pd
from sqlalchemy import create_engine, text  # provides create_engine and text used below

engine = create_engine("postgresql://jetbrains:jetbrains@localhost/demo")
df = pd.read_sql(sql=text("SELECT * FROM airlines"), con=engine.connect())
And for Polars, it’s this code:
import polars as pl
from sqlalchemy import create_engine  # provides create_engine used below

engine = create_engine("postgresql://jetbrains:jetbrains@localhost/demo")
connection = engine.connect()
query = "SELECT * FROM airlines"
df = pl.read_database(query, connection)
3. Getting acquainted with your data
Now we’ve read our data in, we can take a look at the DataFrame, or `df` as we will refer to it in our code. To print out the DataFrame, you only need a single line of code, regardless of which method you used to read the data in:
df
DataFrames
PyCharm first displays your DataFrame as a table so you can explore it. You can scroll horizontally through the DataFrame and click on any column header to order the data by that column. You can click on the Show Column Statistics icon on the right-hand side and select Compact or Detailed to get some helpful statistics on each column of data.
Dynamic charts
You can use PyCharm to get a dynamic chart of your DataFrame by clicking on the Chart View icon on the left-hand side. We’re using pandas in this example, but Polars DataFrames also have the same option.
Click on the Show Series Settings icon (a cog) on the right-hand side to configure your plot to meet your needs:
In this view, you can hover your mouse over your data to learn more about it and easily spot outliers:
You can do all of this with Polars, too.
4. Using JetBrains AI Assistant
JetBrains AI Assistant has several offerings that can make you more productive when you’re working with Jupyter notebooks inside PyCharm. Let’s take a closer look at how you can use JetBrains AI Assistant to explain a DataFrame, write code, and even explain errors.
Explaining DataFrames
If you’ve got a DataFrame but are unsure where to start, you can click the purple AI icon on the right-hand side of the DataFrame and select Explain DataFrame. JetBrains AI Assistant will use its context to give you an overview of the DataFrame:
You can use the generated explanation to aid your understanding.
Writing Code
You can also get JetBrains AI Assistant to help you write code. Perhaps you know what kind of plot you want, but you’re not 100% sure what the code should look like. Well, now you can use JetBrains AI Assistant to help you. Let’s say you want to use ‘matplotlib’ to create a chart that finds the relationship between ‘TimeMonthName’ and ‘MinutesDelayedWeather’. By specifying the column names, we’re giving more context to the request which improves the reliability of the generated code. Try it with the following prompt:
Give me code using matplotlib to create a chart which finds the relationship between ‘TimeMonthName’ and ‘MinutesDelayedWeather’ for my dataframe df
If you like the resulting code, you can use the Insert Snippet at Caret button to insert the code and then run it:
import matplotlib.pyplot as plt

# Assuming your data is in a DataFrame named 'df'
# Replace 'df' with the actual name of your DataFrame if different

# Plotting
plt.figure(figsize=(10, 6))
plt.bar(df['TimeMonthName'], df['MinutesDelayedWeather'], color='skyblue')
plt.xlabel('Month')
plt.ylabel('Minutes Delayed due to Weather')
plt.title('Relationship between TimeMonthName and MinutesDelayedWeather')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
If you don’t want to open the AI Assistant tool window, you can use the AI cell prompt to ask your questions. For example, we can ask the same question here and get the code we need:
Explaining errors
You can also get JetBrains AI Assistant to explain errors for you. When you get an error, click Explain with AI:
You can use the resulting output to further your understanding of the problem and perhaps even get some code to fix it!
5. Exploring your code
PyCharm can help you get an overview of your Jupyter notebook, complete parts of your code to save your fingers, refactor it as required, debug it, and even add integrations to help you take it to the next level.
Tips for navigating and optimizing your code
Our Jupyter notebooks can grow large quite quickly, but thankfully you can use PyCharm’s Structure view to see all your notebook’s headings by clicking ⌘7 (macOS) / Alt+7 (Windows/Linux).
Code completion
Another helpful feature that you can take advantage of when using Jupyter notebooks inside PyCharm is code completion. You get both basic and type-based code completion out of the box with PyCharm, but you can also enable Full Line Code Completion in PyCharm Professional, which uses a local AI model to provide suggestions. Lastly, JetBrains AI Assistant can also help you write code and discover new libraries and frameworks.
Refactoring
Sometimes you need to refactor your code, and in that case you only need to know one keyboard shortcut, ⌃T (macOS) / Shift+Ctrl+Alt+T (Windows/Linux), and then you can choose the refactoring you want to invoke. Pick from popular options such as Rename, Change Signature, and Introduce Variable, or lesser-known options such as Extract Method, to change your code without changing the semantics:
As your Jupyter notebook grows, it’s likely that your import statements will also grow. Sometimes you might import packages such as polars and numpy, but forget that numpy is a transitive dependency of the Polars library and, as such, doesn’t need to be imported separately.
To catch these cases and keep your code tidy, you can invoke Optimize Imports ⌃⌥O (macOS) / Ctrl+Alt+O (Windows/Linux) and PyCharm will remove the ones you don’t need.
Debugging your code
You might not have used the debugger in PyCharm yet, and that’s okay. Just know that it’s there and ready to support you when you need to better understand some behavior in your Jupyter notebook.
Place a breakpoint on the line you’re interested in by clicking in the gutter or by using ⌘F8 (macOS) / Ctrl+F8 (Windows/Linux), and then run your code with the debugger attached with the debug icon on the top toolbar:
You can also invoke PyCharm’s debugger in your Jupyter notebook with ⌥⇧⏎ (macOS) / Shift+Alt+Enter (Windows/Linux). There are some restrictions when it comes to debugging your code in a Jupyter notebook, but please try this out for yourself and share your feedback with us.
Adding integrations into PyCharm
IDEs wouldn’t be complete without the integrations you need. PyCharm Professional 2024.2 brings two new integrations to your workflow: Databricks and HuggingFace.
You can enable the integrations with both Databricks and HuggingFace by going to your Settings ⌘, (macOS) / Ctrl+Alt+S (Windows/Linux), selecting Plugins, and searching for the plugin with the corresponding name on the Marketplace tab.
6. Getting insights from your code
When analyzing your data, there’s a difference between categorical and continuous variables. Categorical data has a finite number of discrete groups or categories, whereas continuous data is one continuous measurement. Let’s look at how we can extract different insights from both the categorical and continuous variables in our airlines dataset.
Continuous variables
We can get a sense of how continuous data is distributed by looking at measures of the average value in that data and the spread of the data around the average. In normally distributed data, we can use the mean to measure the average and the standard deviation to measure the spread. However, when data is not distributed normally, we can get more accurate information using the median and the interquartile range (this is the difference between the seventy-fifth and twenty-fifth percentiles). Let’s look at one of our continuous variables to understand the difference between these measurements.
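As a minimal sketch (not from the original post), these measures can be computed directly with pandas; `NumDelaysLateAircraft` is the column we explore below, and `df` is the DataFrame read in earlier:

col = df["NumDelaysLateAircraft"]

mean = col.mean()                                # sensitive to outliers
median = col.median()                            # the robust "middle" value
iqr = col.quantile(0.75) - col.quantile(0.25)    # interquartile range: spread of the middle 50%

print(f"mean={mean:.1f}, median={median:.1f}, IQR={iqr:.1f}")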
In our dataset, we have lots of continuous variables, but we’ll work with `NumDelaysLateAircraft` to see what we can learn. Let’s use the following code to get some summary statistics for just that column:
df['NumDelaysLateAircraft'].describe()
Looking at this data, we can see that there is a big difference between the `mean` of ~789 and the `median` (our fiftieth percentile, indicated by “50%” in the table below) of ~618.
This indicates a skew in our variable’s distribution, so let’s use PyCharm to explore it further. Click on the Chart View icon at the top left. Once the chart has been rendered, we’ll change the series settings represented by the cog on the right-hand side of the screen. Change your x-axis to `NumDelaysLateAircraft` and your y-axis to `NumDelaysLateAircraft`.
Now drop down the y-axis using the little arrow and select `count`. The final step is to change the chart type to Histogram using the icons in the top-right corner:
Now that we can see the skew laid out visually, we can see that most of the time, the delays are not too excessive. However, we have a number of more extreme delays – one aircraft is an outlier on the right and it was delayed by 4,509 minutes, which is just over three days!
In statistics, the mean is very sensitive to outliers because it’s an arithmetic average that takes every value into account, unlike the median, which, if you ordered all observations in your variable, would sit exactly in the middle of these values. When the mean is higher than the median, it’s because you have outliers on the right-hand side of the data, the higher side, as we had here. In such cases, the median is a better indicator of the true average delay, as you can see if you look at the histogram.
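A tiny, self-contained illustration of that sensitivity (not from the original post):

import statistics

delays = [10, 12, 15, 20, 4509]   # one extreme delay, like the outlier above
print(statistics.mean(delays))    # 913.2 - the mean is dragged up by the single outlier
print(statistics.median(delays))  # 15 - the median still reflects a typical delay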
Categorical variables
Let’s take a look at how we can use code to get some insights from our categorical variables. In order to get something that’s a little more interesting than just `AirportCode`, we’ll analyze how many aircraft were delayed by weather, `NumDelaysWeather`, in the different months of the year, `TimeMonthName`.
Use this code to group `NumDelaysWeather` with `TimeMonthName`:
result = df[['TimeMonthName', 'NumDelaysWeather']].groupby('TimeMonthName').sum()
result
This gives us the DataFrame again in table format, but click the Chart View icon on the left-hand side of the PyCharm UI to see what we can learn:
This is okay, but it would be helpful to have the months ordered according to the Gregorian calendar. Let’s first create a variable for the months that we expect:
month_order = [
    "January", "February", "March", "April",
    "May", "June", "July", "August",
    "September", "October", "November", "December"
]
Now we can ask PyCharm to use the order that we’ve just defined in `month_order`:
# Convert the 'TimeMonthName' column to a categorical type with the specified order
df["TimeMonthName"] = pd.Categorical(df["TimeMonthName"], categories=month_order, ordered=True)

# Now you can group by 'TimeMonthName' and perform the sum operation, specifying observed=False
result = df[['TimeMonthName', 'NumDelaysWeather']].groupby('TimeMonthName', observed=False).sum()
result
We then click on the Chart View icon once more, but something’s wrong!
Are we really saying that there were no flights delayed in February? That can’t be right. Let’s check our assumption with some more code:
df['TimeMonthName'].value_counts()
Aha! Now we can see that `Febuary` has been misspelt in our data set, so the correct spelling in our variable name does not match. Let’s update the spelling in our dataset with this code:
df["TimeMonthName"] = df["TimeMonthName"].replace("Febuary", "February")
df['TimeMonthName'].value_counts()
Great, that looks right. Now we should be able to re-run our earlier code and get a chart view that we can interpret:
From this view, we can see that there is a higher number of delays during the months of December, January, and February, and then again in June, July, and August. However, we have not standardized this data against the total number of flights, so there may just be more flights in those months, which would cause these results along with an increased number of delays in those summer and winter months.
7. Sharing your insights and charts
When your masterpiece is complete, you’ll probably want to export data, and you can do that in various ways with Jupyter notebooks in PyCharm.
Exporting a DataFrame
You can export a DataFrame by clicking on the down arrow on the right-hand side:
You have lots of helpful formats to choose from, including SQL, CSV, and JSON:
Exporting charts
If you prefer to export the interactive plot, you can do that too by clicking on the Export to PNG icon on the right-hand side:
Viewing your notebook in a browser
You can view your whole Jupyter notebook at any time in a browser by clicking the icon in the top-right corner of your notebook:
Finally, if you want to export your Jupyter notebook to a Python file, 2024.2 lets you do that too! Right-click on your Jupyter notebook in the Project tool window and select Convert to Python File. Follow the instructions, and you’re done!
Summary
Using Jupyter notebooks inside PyCharm Professional provides extensive functionality, enabling you to create code faster, explore data easily, and export your projects in the formats that matter to you.
Download PyCharm Professional to try it out for yourself! Get an extended trial today and experience the difference PyCharm Professional can make in your data science endeavors.
Use the promo code “PyCharmNotebooks” at checkout to activate your free 60-day subscription to PyCharm Professional. The free subscription is available for individual users only.
Activate your 60-day trial
Zato Blog: Smart IoT integrations with Akenza and Python
The Akenza IoT platform, on its own, excels in collecting and managing data from a myriad of IoT devices. However, it is integrations with other systems, such as enterprise resource planning (ERP), customer relationship management (CRM) platforms, workflow management or environmental monitoring tools that enable a complete view of the entire organizational landscape.
Complementing Akenza's capabilities, and enabling the smooth integrations, is the versatility of Python programming. Given how flexible Python is, the language is a natural choice when looking for a bridge between Akenza and the unique requirements of an organization looking to connect its intelligent infrastructure.
This article is about combining the two, Akenza and Python. At the end of it, you will have:
- A bi-directional connection to Akenza using Python and WebSockets
- A Python service subscribed to and receiving events from IoT devices through Akenza
- A Python service that will be sending data to IoT devices through Akenza
Since WebSocket connections are persistent, their usage enhances the responsiveness of IoT applications, which in turn helps ensure that data exchange occurs in real time, thus fostering a dynamic and agile integrated ecosystem.
Python and Akenza WebSocket connections
First, let's have a look at the full Python code - to be discussed later.
# -*- coding: utf-8 -*-

# Zato
from zato.server.service import WSXAdapter

# ###############################################################################################

if 0:
    from zato.server.generic.api.outconn.wsx.common import OnClosed, \
        OnConnected, OnMessageReceived

# ###############################################################################################

class DemoAkenza(WSXAdapter):

    # Our name
    name = 'demo.akenza'

    def on_connected(self, ctx:'OnConnected') -> 'None':
        self.logger.info('Akenza OnConnected -> %s', ctx)

# ###############################################################################################

    def on_message_received(self, ctx:'OnMessageReceived') -> 'None':

        # Confirm what we received
        self.logger.info('Akenza OnMessageReceived -> %s', ctx.data)

        # This is an indication that we are connected ..
        if ctx.data['type'] == 'connected':

            # .. for testing purposes, use a fixed asset ID ..
            asset_id:'str' = 'abc123'

            # .. build our subscription message ..
            data = {'type': 'subscribe', 'subscriptions': [{'assetId': asset_id, 'topic': '*'}]}

            ctx.conn.send(data)

        else:
            # .. if we are here, it means that we received a message other than type "connected".
            self.logger.info('Akenza message (other than "connected") -> %s', ctx.data)

# ###############################################################################################

    def on_closed(self, ctx:'OnClosed') -> 'None':
        self.logger.info('Akenza OnClosed -> %s', ctx)

# ###############################################################################################

Now, deploy the code to Zato and create a new outgoing WebSocket connection. Replace the API key with your own and make sure to set the data format to JSON.
Receiving messages from WebSockets
The WebSocket Python services that you author have three methods of interest, each reacting to specific events:
- on_connected - Invoked as soon as a WebSocket connection has been opened. Note that this is a low-level event and, in the case of Akenza, it does not yet mean that you are able to send or receive messages from it.
- on_message_received - The main method that you will be spending most time with. Invoked each time a remote WebSocket sends, or pushes, an event to your service. With Akenza, this method will be invoked each time Akenza has something to inform you about, e.g. that you have subscribed to messages, that new data has arrived from an asset, and so on.
- on_closed - Invoked when a WebSocket has been closed. It is no longer possible to use a WebSocket once it has been closed.
Let's focus on on_message_received, which is where the majority of the action takes place. It receives a single parameter of type OnMessageReceived, which describes the context of the received message. That is, it is in the "ctx" that you will find both the current request and a handle to the WebSocket connection through which you can reply to the message.
The two important attributes of the context object are:
- ctx.data - A dictionary of data that Akenza sent to you
- ctx.conn - The underlying WebSocket connection through which the data was sent and through which you can send a response
Now, the logic from lines 30-40 is clear:
- First, we check if Akenza confirmed that we are connected (type=='connected'). You need to check the type of a message each time Akenza sends something to you and react to it accordingly.
- Next, because we know that we are already connected (e.g. our API key was valid), we can subscribe to events from a given IoT asset. For testing purposes, the asset ID is given directly in the source code but, in practice, this information would be read from a configuration file or database.
- Finally, for messages of any other type we simply log their details. Naturally, a full integration would handle them according to what is required in the given circumstances, e.g. by transforming and pushing them to other applications or management systems.
A sample message from Akenza will look like this:
INFO - WebSocketClient - Akenza message (other than "connected") -> {'type': 'subscribed', 'replyTo': None, 'timeStamp': '2023-11-20T13:32:50.028Z', 'subscriptions': [{'assetId': 'abc123', 'topic': '*', 'tagId': None, 'valid': True}], 'message': None}
How to send messages to WebSockets
An aspect not to be overlooked is communication in the other direction, that is, the sending of messages to WebSockets. For instance, you may have services invoked through REST APIs, or perhaps from a scheduler, and their job will be to transform such calls into configuration commands for IoT devices.
Here is the core part of such a service, reusing the same Akenza WebSocket connection:
# -*- coding: utf-8 -*-

# Zato
from zato.server.service import Service

# ###############################################################################################

class DemoAkenzaSend(Service):

    # Our name
    name = 'demo.akenza.send'

    def handle(self) -> 'None':

        # The connection to use
        conn_name = 'Akenza'

        # Get a connection ..
        with self.out.wsx[conn_name].conn.client() as client:

            # .. and send data through it.
            client.send('Hello')

# ###############################################################################################

Note that responses to the messages sent to Akenza will be received using your first service's on_message_received method - WebSockets-based messaging is inherently asynchronous and the channels are independent.
Now, we have a complete picture of real-time, IoT connectivity with Akenza and WebSockets. We are able to establish persistent, responsive connections to assets, we can subscribe to and send messages to devices, and that lets us build intelligent automation and integration architectures that make use of powerful, emerging technologies.
More resources
➤ Python API integration tutorial
➤ What is an integration platform?
➤ Python Integration platform as a Service (iPaaS)
➤ What is an Enterprise Service Bus (ESB)? What is SOA?
Django Weblog: Nominate a Djangonaut for the 2024 Malcolm Tredinnick Memorial Prize
Hello Everyone 👋 It is that time of year again when we recognize someone from our community in memory of our friend Malcolm.
Malcolm was an early core contributor to Django and had both a huge influence and impact on Django as we know it today. Besides being knowledgeable he was also especially friendly to new users and contributors. He exemplified what it means to be an amazing Open Source contributor. We still miss him to this day.
The prize
The Django Software Foundation Prizes page summarizes it nicely:
The Malcolm Tredinnick Memorial Prize is a monetary prize, awarded annually, to the person who best exemplifies the spirit of Malcolm’s work - someone who welcomes, supports, and nurtures newcomers; freely gives feedback and assistance to others, and helps to grow the community. The hope is that the recipient of the award will use the award stipend as a contribution to travel to a community event -- a DjangoCon, a PyCon, a sprint -- and continue in Malcolm’s footsteps.
Please make your nominations using our form: 2024 Malcolm Tredinnick Memorial Prize.
We will take nominations until Monday, September 30th, 2024, Anywhere on Earth, and will announce the winner(s) soon after the next DSF Board meeting in October. If you have any questions please reach out to the DSF Board at foundation@djangoproject.com.