FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #95: What Is a JIT and How Can Pyjion Speed Up Your Python?

Planet Python - Fri, 2022-01-28 07:00

How can you speed up Python? Have you thought of using a JIT (Just-In-Time Compiler)? This week on the show, we have Real Python author and previous guest Anthony Shaw to talk about his project Pyjion, a drop-in JIT compiler for CPython 3.10.



Stack Abuse: Split Train, Test and Validation Sets with Tensorflow Datasets - tfds

Planet Python - Fri, 2022-01-28 06:30
Introduction

TensorFlow Datasets, also known as tfds, is a library that serves as a wrapper around a wide selection of datasets, with dedicated functions to load, split and prepare datasets for Machine and Deep Learning, primarily with TensorFlow.

Note: While the TensorFlow Datasets library is used to get data, it's not used to preprocess data. That job is delegated to the TensorFlow Data (tf.data) library.

All of the datasets acquired through TensorFlow Datasets are wrapped into tf.data.Dataset objects - so you can programmatically obtain and prepare a wide variety of datasets easily! One of the first steps you'll take after loading and getting to know a dataset is a train/test/validation split.
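
Since every loaded split is a plain tf.data.Dataset, the usual input pipeline methods chain straight onto it. A minimal sketch - the normalization function, shuffle buffer and batch size here are illustrative choices, not library requirements:

import tensorflow as tf
import tensorflow_datasets as tfds

train_set = tfds.load("cifar10", split="train", as_supervised=True)

# Illustrative preprocessing - scale pixel values into [0, 1]
def normalize(image, label):
    return tf.cast(image, tf.float32) / 255.0, label

train_set = (train_set
             .map(normalize, num_parallel_calls=tf.data.AUTOTUNE)
             .shuffle(buffer_size=1024)
             .batch(64)
             .prefetch(tf.data.AUTOTUNE))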

In this guide, we'll take a look at what training, testing and validation sets are before learning how to load in and perform a train/test/validation split with TensorFlow Datasets.

Training and Testing Sets

When working on supervised learning tasks - you'll want to obtain a set of features and a set of labels for those features, either as separate entities or within a single Dataset. Just training the network on all of the data is fine and dandy - but you can't test its accuracy on that same data, since evaluating the model like that would reward memorization instead of generalization.

Instead - we train the models on one part of the data, holding a part of it out to test the model once it's done training. The ratio between these two is commonly 80/20, and that's a fairly sensible default. Depending on the size of the dataset, you might opt for different ratios, such as 60/40 or even 90/10 - the larger the dataset, the smaller the percentage you need to dedicate to testing. For instance, if 1% of the dataset already represents 1,000,000 samples - you probably don't need more than that for testing!
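
To put numbers on that, a quick back-of-the-envelope sketch (the dataset size here is hypothetical):

n_samples = 100_000_000  # hypothetical, very large dataset

# A classic 80/20 split dedicates 20M samples to testing - likely overkill
print(int(n_samples * 0.20)) # 20000000

# A 99/1 split still yields a million test samples
print(int(n_samples * 0.01)) # 1000000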

For some models and architectures - you won't have any test set at all! For instance, when training Generative Adversarial Networks (GANs) that generate images - testing the model isn't as easy as comparing the real and predicted labels! In most generative models (music, text, video), at least as of now, a human is typically required to judge the outputs, in which case a test set is totally redundant.

The test set should be held out from the model until the testing stage, and it should only ever be used for inference - not training. It's common practice to define a test set and "forget it" until the end stages where you validate the model's accuracy.

Validation Sets

A validation set is an extremely important, and sometimes overlooked, set. Validation sets are oftentimes described as being taken "out of" test sets, since that's convenient to imagine - but really, they're separate sets. There's no set rule for split ratios, but it's common to have a validation set of similar size to the test set, or slightly smaller - anything along the lines of 75/15/10, 70/15/15 or 70/20/10.

A validation set is used during training to approximately validate the model on each epoch. This helps to steer the training process by giving "hints" as to whether the model is performing well or not, and it means you don't have to wait for an entire run of epochs to finish to get an accurate glimpse of the model's actual performance.

Note: The validation set isn't used for training, and the model doesn't train on the validation set at any given point. It's used to validate the performance in a given epoch. Since it does affect the training process, the model indirectly trains on the validation set and thus, it can't be fully trusted for testing, but is a good approximation/proxy for updating beliefs during training.

This is analogous to knowing when you're wrong, but not knowing what the right answer is. Eventually, by updating your beliefs after realizing you're not right, you'll get closer to the truth without explicitly being told what it is. A validation set indirectly trains your knowledge.

Using a validation set - you can easily tell, in real time, when a model has begun to overfit significantly, and based on the disparity between the validation and training accuracies, you can trigger responses - such as automatically stopping training, updating the learning rate, etc.
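
In Keras, for instance, this boils down to passing the validation set to fit() and attaching a callback. A minimal sketch - it assumes you've already built and compiled a model and prepared batched train_set and valid_set pipelines:

import tensorflow as tf

# Assumed to exist already: `model` (compiled), `train_set`, `valid_set`
# Stop once the validation loss hasn't improved for 5 epochs,
# rolling back to the best weights seen so far
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True
)

history = model.fit(
    train_set,
    validation_data=valid_set, # evaluated at the end of each epoch
    epochs=100,
    callbacks=[early_stopping]
)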

Split Train, Test and Validation Sets using TensorFlow Datasets

The load() function of the tfds module loads in a dataset, given its name. If it's not already downloaded on the local machine - it'll automatically download the dataset with a progress bar:

import tensorflow_datasets as tfds

# Load dataset
dataset, info = tfds.load("cifar10", as_supervised=True, with_info=True)

# Extract informative features
class_names = info.features["label"].names
n_classes = info.features["label"].num_classes

print(class_names) # ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
print(n_classes) # 10

One of the optional arguments you can pass into the load() function is the split argument.

The new Split API allows you to define which splits - and which slices of those splits - you want to load. For CIFAR-10, only a 'train' and a 'test' split are defined - these are the dataset's "official" splits. There's no 'valid' split!

These correspond to the tfds.Split.TRAIN and tfds.Split.TEST enums, which used to be exposed directly through the API in an earlier version. Curiously, tfds.Split.VALIDATION does exist as well, but it doesn't have a string alias in the new API.

It's worth noting that these split names don't constrain how you use the data - a slice of the 'train' split can perfectly well serve as your test set - as long as you achieve the right proportions.
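
You can always check which "official" splits a dataset ships with, and how large they are, through its info object - a quick sketch:

import tensorflow_datasets as tfds

_, info = tfds.load("cifar10", with_info=True)

print(list(info.splits.keys()))          # ['test', 'train'] (order may vary)
print(info.splits["train"].num_examples) # 50000
print(info.splits["test"].num_examples)  # 10000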

You can really slice a Dataset into any arbitrary number of sets, though we typically use three - train_set, test_set and valid_set:

test_set, valid_set, train_set = tfds.load("cifar10",
                                           split=["train[:10%]", "train[10%:25%]", "train[25%:]"],
                                           as_supervised=True)

print("Train set size: ", len(train_set)) # Train set size: 37500
print("Test set size: ", len(test_set))   # Test set size: 5000
print("Valid set size: ", len(valid_set)) # Valid set size: 7500

We've taken the first 10% of the dataset and extracted it into the test_set. The slice between 10% and 25% is assigned to the valid_set, and everything beyond 25% is the train_set. The sizes of the resulting sets confirm the split.

Note: We've used the train split here, even though we split the dataset into three sets. Again, the only two split names CIFAR-10 defines are train and test - they don't mean anything beyond telling you which official split a slice comes from.

Instead of percentages, you can use absolute values or a mix of percentage and absolute values:

# Absolute value split
test_set, valid_set, train_set = tfds.load("cifar10",
                                           split=["train[:2500]", "train[2500:5000]", "train[5000:]"],
                                           as_supervised=True)

print("Train set size: ", len(train_set)) # Train set size: 45000
print("Test set size: ", len(test_set))   # Test set size: 2500
print("Valid set size: ", len(valid_set)) # Valid set size: 2500

# Mixed notation split
# Samples 5000 through 25000 (the 50% mark) are left unassigned
test_set, valid_set, train_set = tfds.load("cifar10",
                                           split=["train[:2500]",     # First 2500 samples are assigned to `test_set`
                                                  "train[2500:5000]", # Samples 2500-5000 are assigned to `valid_set`
                                                  "train[50%:]"],     # 50%-100% (25000 samples) assigned to `train_set`
                                           as_supervised=True)

You can additionally take the union of splits, though this is less commonly used, since the examples are then interleaved:

train_and_test, half_of_train_and_test = tfds.load("cifar10",
                                                   split=['train+test', 'train[:50%]+test'],
                                                   as_supervised=True)

print("Train+test: ", len(train_and_test))               # Train+test: 60000
print("Train[:50%]+test: ", len(half_of_train_and_test)) # Train[:50%]+test: 35000

These two sets are now heavily interleaved.

Even Splits for N Sets

Again, you can create any arbitrary number of splits, just by adding more splits to the split list:

split=["train[:10%]", "train[10%:20%]", "train[20%:30%]", "train[30%:40%]", ...]

However, if you're creating many splits, especially even ones - the strings you'll be passing in are very predictable. This can be automated by generating a list of strings with a given equal interval (such as 10%) instead. For exactly this purpose, the tfds.even_splits() function generates a list of strings, given a prefix string and the desired number of splits:

import tensorflow_datasets as tfds

s1, s2, s3, s4, s5 = tfds.even_splits('train', n=5)

# Each of these elements is just a string
split_list = [s1, s2, s3, s4, s5]
print(f"Type: {type(s1)}, contents: '{s1}'")
# Type: <class 'str'>, contents: 'train[0%:20%]'

for split in split_list:
    test_set = tfds.load("cifar10", split=split, as_supervised=True)
    print(f"Test set length for Split {split}: ", len(test_set))

This results in:

Test set length for Split train[0%:20%]:   10000
Test set length for Split train[20%:40%]:  10000
Test set length for Split train[40%:60%]:  10000
Test set length for Split train[60%:80%]:  10000
Test set length for Split train[80%:100%]: 10000

Alternatively, you can pass in the entire split_list as the split argument itself, to construct several split datasets outside of a loop:

ts1, ts2, ts3, ts4, ts5 = tfds.load("cifar10", split=split_list, as_supervised=True)

Conclusion

In this guide, we've taken a look at what training and testing sets are, as well as the importance of validation sets. Finally, we've explored the new Split API of the TensorFlow Datasets library and performed a train/test/validation split.


OpenSense Labs: The ideal Drupal 9 developer

Planet Drupal - Fri, 2022-01-28 06:06
By Maitreayee Bora - Fri, 01/28/2022 - 16:36

Drupal is emerging as one of the leading CMSs for organizations today, and organizations are now migrating to Drupal 9, its latest released version. Are you planning to do the same? For that, you'll need the help of Drupal 9 experts. But do you know exactly what expertise a Drupal 9 developer should have before you trust them to assist you with the migration? This article gives you the information you need to choose the right Drupal 9 developer.

Who can be your ideal Drupal 9 developer? 

This is where the real challenge begins. You'll have to choose the best Drupal 9 developer for the migration process - the need of the hour. So, how will you make the right choice? Without taking much of your time, let me walk you through the expertise a Drupal 9 developer should possess.

Knowing the basics is a must 

If you want a successful migration, you must find a Drupal 9 developer who is well versed in the Drupal basics. To be more precise, they should be fully aware of recent Drupal developments and of the latest and upcoming releases - Drupal 9.1, 9.2 and 9.3, which shipped recently, and Drupal 9.4 and Drupal 10, which are due in 2022. Keeping the new Olivero frontend theme, added in Drupal 9.1.0, in mind can be of great benefit as well. Improvements in installer and frontend performance, enhanced security provisions, various additions to the Claro administration theme, support for the WebP image format, and support for CKEditor 5 are some of the recent developments in Drupal 9's minor releases that every Drupal 9 developer should know about when building and maintaining a Drupal 9 site.

Moving on to the upgrade tools: it is essential for a Drupal 9 developer to be well equipped with knowledge of the tools required in the Drupal 9 upgrade process. You can read more about Drupal 9 upgrade tools here.

And finally, how can we forget the significant new Drupal 9 features? The capabilities added in Drupal 9 - such as replacing Symfony 3 with Symfony 4, removing support for code deprecated in Drupal 8, and replacing Panelizer with the Layout Builder - should be part of a Drupal 9 developer's know-how at the time of migration.

Well-certified professional

It’s quite obvious that Acquia-certified professionals are mostly preferred. So, you’ll have to prioritize Drupal 9 developers who are awarded Acquia’s Drupal development, site building, front-end development, and backend development certifications for their great work efficiency.

Let me also tell you that we at OpenSense Labs have our own proficient Acquia-certified Drupal developers. And most proudly, we have an Acquia Certified Drupal 9 developer, Pritish Kumar, who is a Technical Lead at OpenSense Labs. 


Good knowledge of PHP and PHP frameworks

Drupal 7 was built using plain PHP, while Drupal 8 was built on Symfony (a PHP framework) - so it's important for a Drupal 9 developer to be well versed in PHP and PHP frameworks. Also, setting up a Drupal database requires a proper understanding of MySQL. These are skills a Drupal 9 developer cannot afford to miss. Check out this guide on updating PHP to learn about keeping PHP and its frameworks up to date and why it matters.

Hands-on experience in installing and creating Drupal modules

Drupal modules allow you to integrate many third-party services and tools with your website. That way, your website can become a hub for most of your routinely used tools, as well as a much more highly customized tool for your visitors. There isn't a definitive list of modules your Drupal 9 developer should include in your website, but they should at least be familiar with installing the Views, Panels and cTools modules. Additionally, your Drupal 9 developer needs to know how to develop modules themselves. Learn more about the key modules to start your Drupal 9 site here.

Proper understanding of Drupal theme development

Your Drupal 9 developer should have a proper understanding of Drupal theme development. Do you want to know why? It's because your website's theme plays a significant role in its appearance (the user interface) and the user experience you offer - so a good understanding of UI/UX design is highly recommended. The Drupal community provides thousands of themes to choose from (over 2,000 at the moment); most will meet your design requirements, but not all of them will. So, if you want a unique theme for your website, your Drupal 9 developer should be able to build a custom theme matching your expectations.

Good hold on version control with Git

You probably know that developers use Git and GitHub to get version control over projects of all sizes and types. Using Git and GitHub has become standard practice, and it should be a relief to know that your Drupal 9 developer uses these tools as well. Git on its own helps developers work more efficiently by making it simpler to manage their project files. Furthermore, by storing code in a GitHub repository, your developer keeps the code for every version of your Drupal project they build. That way, even if something goes wrong with the code in the future, your Drupal 9 developer will be able to revert to a prior version.

Proficient in debugging and updating a Drupal site

Along with using Git for version control, another significant skill Drupal 9 developers must have is the ability to recognize bugs in their code while writing it. No Drupal 9 developer can identify every potential error or issue during the coding process, but catching at least some of them early saves a lot of time and prevents further issues down the road. And although debugging during development is beneficial, addressing bugs that arise after the site goes live is even more essential. Not all Drupal 9 developers provide post-launch maintenance - but those who do can ensure that your website stays bug-free, up to date, and offers the best possible experience for your users.

A strong grip on the frontend languages

Are you aware that a Drupal 9 developer must be well versed in JavaScript? In particular, they need to know jQuery - a JavaScript library that helps carry out standard DOM (Document Object Model) manipulation and AJAX calls. Moreover, working experience with Angular, React or Vue is an added advantage, as one of these JavaScript frameworks can be used as the frontend of a headless Drupal solution.

An active participant in the Drupal community 

Last but not least, a Drupal 9 developer needs to actively contribute to the Drupal community, so that whenever an issue is identified, they can fix it and share the patch, or report the issue to the community where other people can help resolve it.


Conclusion

Well, you will find many Drupal 9 developers, but the biggest challenge is finding the right one - someone who can offer you the best quality of service. I hope that, with the above-mentioned skills in mind, you'll be able to choose the perfect Drupal 9 developer, or team of developers, to make your Drupal project a huge success.

I think you can finish your search here with us, since at OSL you will find Drupal 9 experts who can successfully build your Drupal projects in accordance with your expectations and within the stipulated time period. So, what are you waiting for? Let's talk.


Web Review, Week 2022-04

Planet KDE - Fri, 2022-01-28 05:43

Let’s go for my web review for the week 2022-04.

Nvidia (NVDA) Quietly Prepares to Abandon $40 Billion Arm Bid - Bloomberg

Tags: tech, business, arm, nvidia

This is likely for the best.

https://www.bloomberg.com/news/articles/2022-01-25/nvidia-is-said-to-quietly-prepare-to-abandon-takeover-of-arm


How E Ink Developed Full-Color e-Paper

Tags: tech, display, e-ink

Very interesting story (even though it feels a bit like an advertisement at times) about the quest to get color into e-ink displays. It goes on to explain a bit of the technology behind them.

https://spectrum.ieee.org/how-e-ink-developed-full-color-epaper


LogJ4 Security Inquiry – Response Required | daniel.haxx.se

Tags: tech, security, foss

This is… whoa… such a level of incompetence is incredible.

https://daniel.haxx.se/blog/2022/01/24/logj4-security-inquiry-response-required/


Over 90 WordPress themes, plugins backdoored in supply chain attack

Tags: tech, security, supply-chain, wordpress

Now this is a really bad supply-chain attack for the WordPress ecosystem.

https://www.bleepingcomputer.com/news/security/over-90-wordpress-themes-plugins-backdoored-in-supply-chain-attack/


Google Details Two Zero-Day Bugs Reported in Zoom Clients and MMR Servers

Tags: tech, zoom, security

Also a good reminder of why being proprietary makes things harder security-wise.

https://thehackernews.com/2022/01/google-details-two-zero-day-bugs.html


Block Protocol - an open standard for data-driven blocks

Tags: tech, interoperability, api

That looks like an interesting way to share data between applications. It reminds me a bit of the semantic web movement back in the early 2000s (talking entities and aggregates), though maybe less heavy on the schema side. I'd need to look at the specification more.

https://blockprotocol.org/


Scrum Rant

Tags: tech, scrum, xp, craftsmanship

Or why I'm actually glad I'm not certified, even though I could be. This is a good way to stay balanced about all this. At least I'm trying to do my part by helping people in the technical areas, which are mostly ignored by the "Scrum Industrial Complex" (as Ron Jeffries puts it). Clearly the scrum organizations are not interested in taking up that mantle, so it falls onto us.

https://ronjeffries.com/articles/-z022/01121/scrum-rant/


Pass-by-value, reference, and assignment | Pydon’t 🐍 | Mathspp

Tags: tech, programming, python

Good explanation of the Python object model and how parameters are passed to functions. This can look rather surprising or confusing if you come from another language. Always keep in mind: mutability, and the fact that variable names are just labels, play a big role in this. That means you might need to copy explicitly in the case of mutable objects - which makes the performance burden explicit as well (and means you need to pick between shallow and deep copying).

PS: I really mean "label" here (like in the post); it's a bit different from "pointer" since you don't get pointer arithmetic (you can only reassign). From the C++ perspective, I'd say it behaves as if all variables were "reassignable non-const references", something like that.

https://mathspp.com/blog/pydonts/pass-by-value-reference-and-assignment
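
To make the "names are just labels" point concrete, a tiny sketch:

import copy

a = [1, 2, 3]
b = a                 # `b` is just another label for the same list object
b.append(4)
print(a)              # [1, 2, 3, 4] - the mutation is visible through both labels

c = copy.copy(a)      # shallow copy - new list, same element references
d = copy.deepcopy(a)  # deep copy - recursively copies nested objects too
c.append(5)
print(a)              # [1, 2, 3, 4] - the original is unaffected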


Don’t Wrap Instance Methods with ‘functools.lru_cache’ Decorator in Python · Redowan’s Reflections

Tags: tech, programming, python

Interesting caveat on how lru_cache is used in Python.

https://rednafi.github.io/reflections/dont-wrap-instance-methods-with-functoolslru_cache-decorator-in-python.html
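
The gist of it, as a small sketch (the class and file name are made up): lru_cache on a method includes self in the cache key, so the cache keeps every instance alive:

from functools import lru_cache

class Reader:
    def __init__(self, path):
        self.path = path

    @lru_cache(maxsize=32)  # the cache key includes `self`!
    def read(self):
        return f"contents of {self.path}"

r = Reader("/tmp/data.txt")
r.read()
# The cache lives on the class-level function object and now holds a
# reference to `r`, so `r` won't be garbage collected until its entry
# is evicted - a subtle memory leak.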


systemd by example - the systemd playground

Tags: tech, linux, systemd

Nice way to learn systemd uses.

https://systemd-by-example.com/


Five Easy to Miss PostgreSQL Query Performance Bottlenecks

Tags: tech, databases, performance, postgresql

Interesting tips for potential bottlenecks in your queries.

https://pawelurbanek.com/postgresql-query-bottleneck


Improving end to end tests reliability / frantic.im

Tags: tech, tests

Good guidelines for improving end-to-end tests. I especially relate to the first one: the test API is very important for those, otherwise they become a chore to maintain and understand.

https://frantic.im/e2e-tests/


API development with type-safety across the entire stack

Tags: tech, services, api

A bit on the fence with this still… but it sounds like an interesting path to explore for dealing with service APIs. A DSL with a code generator allows you to neatly separate concerns if done properly. I wonder where the catches are (apart from the obvious strong coupling to Golang in this particular case).

https://blog.lawrencejones.dev/goa/


Type-Driven Development - IMAP parsing

Tags: tech, type-systems, rust, imap

Interesting example of using a strong type system to avoid mistakes in code using a parsing and serialization library. The fact that it’s about IMAP and I’m still traumatized by it has nothing to do with my interest in that article, really.

https://duesee.dev/p/type-driven-development/


Static Typing Python Decorators · Redowan’s Reflections

Tags: tech, programming, python, type-systems, mypy, pyright

Type annotations quickly become complex in Python. This is in part because they're rolled out incrementally on top of existing practices. Here they collide a bit with decorators. Nice to see this hole getting plugged though. Also nice to discover an alternative to mypy which seems a bit more precise (at least for the time being).

https://rednafi.github.io/reflections/static-typing-python-decorators.html
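
For context, the classic pain point is typing a decorator so that it preserves the wrapped function's signature - since Python 3.10, typing.ParamSpec covers this. A minimal sketch (the timing decorator is just an example):

import functools
import time
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")

def timed(func: Callable[P, R]) -> Callable[P, R]:
    # Thanks to ParamSpec, type checkers see the wrapped signature unchanged
    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

@timed
def add(x: int, y: int) -> int:
    return x + y

print(add(2, 3)) # type checkers still see add as (int, int) -> int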


Why Static Languages Suffer From Complexity

Tags: tech, programming, type-systems

Lots of food for thought in this article. It shows very well some of the complexity trade-offs languages have to deal with when they bring a strong type system to the table - hence some of the limitations in place… and that's why things can quickly get into nasty metaprogramming territory. It shows a couple of interesting examples of how this can be mitigated, although we don't have a perfect solution yet.

https://hirrolot.github.io/posts/why-static-languages-suffer-from-complexity


Stop paying tech debts, start maintaining code

Tags: tech, technical-debt, maintenance, complexity

Good reminder of why "tech debt" is not such a bright metaphor. I find it useful sometimes, but it has clearly become overused in the industry (often a sign of something losing whatever meaning it had). That's why lately I've been talking about complexity instead - some of it is legitimate, so you need to keep the illegitimate kind at bay. I like the focus on maintenance in this post. There are also a couple of good ideas on how to schedule maintenance tasks in your project.

https://blog.testdouble.com/posts/2022-01-20-stop-paying-debts-start-maintaining-code/


Bye for now!

