Planet Python

Planet Python - http://planetpython.org/

BreadcrumbsCollector: Meet python-mockito and leave built-in mock & patch behind

Sun, 2021-04-18 13:40
Batteries included can give you a headache

unittest.mock.[Magic]Mock and unittest.mock.patch are powerful utilities in the standard library that can help us write tests. Although it is easy to start using them, there are several pitfalls waiting for unaware beginners. For example, forgetting about the optional spec or spec_set argument can give us green tests for code that will fail in prod immediately. You can find several other examples and solutions in the second half of my other post – How to mock in Python? Almost definitive guide.
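To illustrate that spec_set pitfall, here is a minimal sketch (PaymentsGateway is a hypothetical class introduced only for this example):

from unittest.mock import Mock

class PaymentsGateway:
    def charge(self, amount: int) -> None:
        ...

loose = Mock()
loose.chrage(100)  # note the typo - yet a test using this mock stays green

strict = Mock(spec_set=PaymentsGateway)
strict.charge(100)    # fine
# strict.chrage(100)  # would raise AttributeError and catch the typo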

Last but not least – the vocabulary used in the standard library is at odds with the general testing nomenclature. This has a negative effect on learning effective testing techniques. Whenever a Pythonista needs to replace a dependency in tests, they use a mock. Generally, this type of replacement object is called a Test Double. Mock is merely one specialized type of Test Double. What is more, there are only limited situations when it’s the right Test Double. You can find more details in Robert Martin’s post The Little Mocker. Or just stay with me for the rest of this article – I’ll guide you through. To summarise, if the philosopher Ludwig Wittgenstein was right when he said…

The limits of my language mean the limits of my world

…then Pythonistas are missing A LOT by sticking to “mocking”.

python-mockito – a modern replacement for Python mock & patch

It is said that experience is the best teacher. However, the experience does not have to be our own – if we can learn from others’ mistakes, even better. Developers of other programming languages face the same testing challenges. The library I want to introduce to you – python-mockito – is a port of Java’s testing framework of the same name. Unlike mock from the standard library, it’s safe by default. python-mockito has a nice, easy-to-use API. It also helps you with the maintenance of your tests by being very strict about unexpected behaviours. Plus, it has a pytest integration – pytest-mockito – for seamless use and automatic cleanup.

Introduction to test double types

I must admit that the literature is not 100% consistent on the taxonomy of test doubles, but the generally accepted definitions are:

  • Dummy – an object required to be passed around (e.g. to __init__) but often is not used at all during test execution
  • Stub – an object returning hardcoded data which was set in advance before test execution
  • Spy – an object recording interactions and exposing API to query it for (e.g. which methods were called and with what arguments)
  • Mock – an object with call expectations set in advance before test execution
  • Fake – an object that behaves just like a production counterpart, but has a simpler implementation that makes it unusable outside tests (e.g. in-memory storage).

If this is the first time you have seen test double types and you find them a bit imprecise or overlapping, that’s fine. Their implementations can be similar at times. What makes a great difference is how they are used during the Assert phase of a test. (Quick reminder – a typical test consists of Arrange – Act – Assert phases; a minimal sketch follows below.)
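Here is a minimal, self-contained sketch of the three phases (StubExchangeRates and PriceCalculator are hypothetical classes used only to illustrate the structure):

class StubExchangeRates:
    """A stub - returns hardcoded data set up during the Arrange phase."""
    def get_rate(self, currency: str) -> float:
        return 4.0

class PriceCalculator:
    """The object under test."""
    def __init__(self, rates) -> None:
        self._rates = rates

    def in_local_currency(self, usd_amount: float) -> float:
        return usd_amount * self._rates.get_rate("USD")

def test_converts_price_to_local_currency():
    calculator = PriceCalculator(StubExchangeRates())  # Arrange
    result = calculator.in_local_currency(10)          # Act
    assert result == 40                                # Assert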

A great rule of thumb I found recently gives the following hints on when to use which type:

  • use Dummy when a dependency is expected to remain unused
  • use Stub for read-only dependency
  • use Spy for write-only dependency
  • use Mock for write-only dependency used across a few tests (DRY expectation)
  • use Fake for dependency that’s used for both reading and writing.

plus mix features when needed or intentionally break the rules when you have a good reason to do so.

This comes from The Test Double Rule of Thumb article by Matt Parker, linked at the end of this post.

Of course, we use test doubles only when we have to. Please don’t write only unit tests separately for each class/function.

python-mockito versus built-in mock and patch

Installation

I’m using Python 3.9 for the following code examples. unittest.mock is included in the standard library. To get python-mockito, run

pip install mockito pytest-mockito

pytest-mockito will come in handy a bit later.

Implementing Dummy

Sometimes a Dummy doesn’t even require any test double library. When a dependency doesn’t really have any effect on the test and/or is not used during execution, we can sometimes just pass None. If mypy (or another type checker) complains and the dependency is simple to create (e.g. it is an int), we create and pass it.

def test_sends_request_to_3rd_party():
    # setting up spy (omitted)
    interfacer = ThirdPartyInterfacer(max_returned_results=0)  # "0" is a dummy
    interfacer.create_payment(...)
    # spy assertions (omitted)

If a dependency is an instance of a more complex class, then we can use unittest.mock.Mock + seal or mockito.mock. In the following example, we’ll be testing the is_healthy method of some Facade. Facades by design can lack cohesion a bit and use some dependencies only in a few methods. A Dummy is an ideal choice then:

from logging import Logger
from unittest.mock import Mock, seal
from mockito import mock

class PaymentsFacade:
    def __init__(self, logger: Logger) -> None:
        self._logger = logger

    def is_healthy(self) -> bool:
        # uncomment this line if you want to see error messages
        # self._logger.info("Checking if is healthy!")
        return True

def test_returns_true_for_healthcheck_stdlib():
    logger = Mock(spec_set=Logger)
    seal(logger)
    facade = PaymentsFacade(logger)
    assert facade.is_healthy() is True

def test_returns_true_for_healthcheck_mockito():
    logger = mock(Logger)
    facade = PaymentsFacade(logger)
    assert facade.is_healthy() is True

python-mockito requires less writing, and the error message is also much better (at least in Python 3.9). unittest.mock (part of a HUGE stack trace):

    if self._mock_sealed:
        attribute = "." + kw["name"] if "name" in kw else "()"
        mock_name = self._extract_mock_name() + attribute
>       raise AttributeError(mock_name)
E       AttributeError: mock.info  # WTH?

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/mock.py:1017: AttributeError

python-mockito:

self = <dummy.PaymentsFacade object at 0x7fb3880cba00>

    def is_healthy(self) -> bool:
>       self._logger.info("Checking if is healthy!")
E       AttributeError: 'Dummy' has no attribute 'info' configured  # CLEAR

dummy.py:12: AttributeError

Dummies are useful when we know they will not (or should not) be used during test execution. As a side note, dependencies like a logger are rarely problematic in tests, and we could also write the same test scenario without using a test double at all.

Implementing Stub

With stubs we are only interested in ensuring they will return some pre-programmed data. WE DO NOT EXPLICITLY VERIFY IF THEY WERE CALLED DURING THE ASSERT PHASE. Ideally, we should be able to tell whether they were used purely by looking at the test itself.

In the following example, our PaymentsFacade has a dependency on PaymentsProvider, which is an interfacer to some external API. Obviously, we cannot use the real implementation in the test. For this particular case, we have a read-only collaboration. The facade asks for the payment status and interprets it to tell whether the payment is complete.

from enum import Enum
from unittest.mock import Mock, seal
from mockito import mock

class PaymentStatus(Enum):
    AUTHORIZED = 'AUTHORIZED'
    CAPTURED = 'CAPTURED'
    RELEASED = 'RELEASED'

class PaymentsProvider:
    def __init__(self, username: str, password: str) -> None:
        self._auth = (username, password)

    def get_payment_status(self, payment_id: int) -> PaymentStatus:
        # make some requests using auth info
        raise NotImplementedError

class PaymentsFacade:
    def __init__(self, provider: PaymentsProvider) -> None:
        self._provider = provider

    def is_paid(self, payment_id: int) -> bool:
        status = self._provider.get_payment_status(payment_id)
        is_paid = status == PaymentStatus.CAPTURED
        return is_paid

def test_returns_true_for_status_captured_stdlib():
    provider = Mock(spec_set=PaymentsProvider)
    provider.get_payment_status = Mock(return_value=PaymentStatus.CAPTURED)
    seal(provider)
    facade = PaymentsFacade(provider)
    assert facade.is_paid(1) is True

def test_returns_true_for_status_captured_mockito(when):
    provider = mock(PaymentsProvider)
    when(provider).get_payment_status(2).thenReturn(PaymentStatus.CAPTURED)
    facade = PaymentsFacade(provider)
    assert facade.is_paid(2) is True

python-mockito gives us a test-specific API. when (here coming from the pytest-mockito fixture) is called on the mock, specifying the expected argument. Next, thenReturn defines what will be returned. Analogously, there is a thenRaise method for raising an exception (a short sketch follows a bit further below). Notice a difference (besides length) – if we call the mock with an unexpected argument, mockito raises an exception:

def test_returns_true_for_status_captured_mockito(when):
    provider = mock(PaymentsProvider)
    when(provider).get_payment_status(2).thenReturn(PaymentStatus.CAPTURED)
    facade = PaymentsFacade(provider)
    assert facade.is_paid(3) is True  # stub is configured with 2, not 3

# stacktrace
    def is_paid(self, payment_id: int) -> bool:
>       status = self._provider.get_payment_status(payment_id)
E       mockito.invocation.InvocationError:
E       Called but not expected:
E
E           get_payment_status(3)
E
E       Stubbed invocations are:
E
E           get_payment_status(2)

stub.py:28: InvocationError

If we don’t want this behaviour, we can always use an ellipsis:

def test_returns_true_for_status_captured_mockito(when):
    provider = mock(PaymentsProvider)
    when(provider).get_payment_status(...).thenReturn(PaymentStatus.CAPTURED)
    facade = PaymentsFacade(provider)
    assert facade.is_paid(3) is True
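As mentioned earlier, thenRaise works analogously to thenReturn. A minimal sketch, reusing the classes from the stub example above and assuming pytest is imported:

import pytest

def test_raises_when_provider_is_down_mockito(when):
    provider = mock(PaymentsProvider)
    when(provider).get_payment_status(2).thenRaise(ConnectionError("provider down"))
    facade = PaymentsFacade(provider)
    with pytest.raises(ConnectionError):
        facade.is_paid(2)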

If we want to remain safe in every case, we should also use a type checker (e.g. mypy).

Digression – patching

when can also be used for patching. Let’s assume PaymentsFacade for some reason creates an instance of PaymentsProvider itself, so we cannot explicitly pass a mock into __init__:

class PaymentsFacade:
    def __init__(self) -> None:
        self._provider = PaymentsProvider(
            os.environ["PAYMENTS_USERNAME"],
            os.environ["PAYMENTS_PASSWORD"],
        )

    def is_paid(self, payment_id: int) -> bool:
        status = self._provider.get_payment_status(payment_id)
        is_paid = status == PaymentStatus.CAPTURED
        return is_paid

Then, monkey patching is the usual way to go for Pythonistas:

def test_returns_true_for_status_captured_stdlib_patching():
    with patch.object(PaymentsProvider, "get_payment_status", return_value=PaymentStatus.CAPTURED) as mock:
        seal(mock)
        facade = PaymentsFacade()
        assert facade.is_paid(1) is True

def test_returns_true_for_status_captured_mockito_patching(when):
    when(PaymentsProvider).get_payment_status(...).thenReturn(
        PaymentStatus.CAPTURED
    )
    facade = PaymentsFacade()
    assert facade.is_paid(3) is True

The python-mockito implementation is even shorter with patching than without it. But do not treat this as an invitation for patching. An important note – the with patch.object context manager makes sure there is a cleanup. For pytest, I strongly recommend using the fixtures provided by pytest-mockito. They will do the cleanup automatically for you. Otherwise, one would have to call mockito.unstub manually. More details can be found in the documentation of pytest-mockito and python-mockito. The documentation of python-mockito states there is also a way to use it with context managers, but personally I’ve never done so.

Monkey patching is a dubious practice at best – especially if done on unstable interfaces. It should be avoided because it tightly couples tests with the implementation. It can be your last resort, though. The frequent need for patching in tests is a strong indicator of an untestable design, poor tests, or both.

Digression – pytest integration

For daily use with the standard library mocks, there is a lib called pytest-mock. It provides a mocker fixture for easy patching and automatic cleanup. The outcome is similar to pytest-mockito.
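A rough sketch of how the earlier stdlib patching test could look with the mocker fixture (reusing the PaymentsProvider, PaymentStatus and PaymentsFacade classes from the patching example above):

def test_returns_true_for_status_captured_pytest_mock(mocker):
    mocker.patch.object(
        PaymentsProvider, "get_payment_status", return_value=PaymentStatus.CAPTURED
    )
    facade = PaymentsFacade()
    assert facade.is_paid(1) is True

The patch is undone automatically when the test finishes, just like with pytest-mockito’s fixtures.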

Implementing Spy

Now, let’s consider a scenario of starting a new payment. PaymentsFacade calls PaymentsProvider after validating the input and converting the money amount to conform to the API’s expectations.

from dataclasses import dataclass
from decimal import Decimal
from unittest.mock import Mock, seal
from mockito import mock, verify

@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str

    def __post_init__(self) -> None:
        if self.amount < 0:
            raise ValueError("Money amount cannot be negative!")

class PaymentsProvider:
    def __init__(self, username: str, password: str) -> None:
        self._auth = (username, password)

    def start_new_payment(self, card_token: str, amount: int) -> None:
        raise NotImplementedError

class PaymentsFacade:
    def __init__(self, provider: PaymentsProvider) -> None:
        self._provider = provider

    def init_new_payment(self, card_token: str, money: Money) -> None:
        assert money.currency == "USD", "Only USD are currently supported"
        amount_in_smallest_units = int(money.amount * 100)
        self._provider.start_new_payment(card_token, amount_in_smallest_units)

def test_calls_provider_with_799_cents_stdlib():
    provider = Mock(spec_set=PaymentsProvider)
    provider.start_new_payment = Mock(return_value=None)
    seal(provider)
    facade = PaymentsFacade(provider)
    facade.init_new_payment("nonsense", Money(Decimal(7.99), "USD"))
    provider.start_new_payment.assert_called_once_with("nonsense", 799)

def test_calls_provider_with_1099_cents_mockito(when):
    provider = mock(PaymentsProvider)
    when(provider).start_new_payment(...).thenReturn(None)
    facade = PaymentsFacade(provider)
    facade.init_new_payment("nonsense", Money(Decimal(10.99), "USD"))
    verify(provider).start_new_payment("nonsense", 1099)

Here, a major difference between unittest.mock and mockito is that the latter:

  • lets us specify input arguments (not shown here, but present in previous examples)
  • (provided input arguments were specified) fails if there are any additional, unexpected interactions.

The second behaviour is added by pytest-mockito, which, apart from calling unstub automatically, also calls verifyNoUnwantedInteractions.

Implementing Mock

Let’s consider an identical test scenario as for Spy – but this time assume we have some duplication in the verification and want to refactor the Spy into a Mock. Now, the funniest part – it turns out that the standard library, which only has classes called “Mock”, does not really make it any easier to create mocks as understood by the literature. On the other hand, it’s such a simple thing that we can do it by hand without any harm. To make this duel even, I’ll use pytest fixtures for both:

@pytest.fixture()
def stdlib_provider():
    provider = Mock(spec_set=PaymentsProvider)
    provider.start_new_payment = Mock(return_value=None)
    seal(provider)
    yield provider
    provider.start_new_payment.assert_called_once_with("nonsense", 799)

def test_returns_none_for_new_payment_stdlib(stdlib_provider):
    facade = PaymentsFacade(stdlib_provider)
    result = facade.init_new_payment("nonsense", Money(Decimal(7.99), "USD"))
    assert result is None

@pytest.fixture()
def mockito_provider(expect):
    provider = mock(PaymentsProvider)
    expect(provider).start_new_payment("nonsense", 1099)
    return provider

def test_returns_none_for_new_payment_mockito(mockito_provider):
    facade = PaymentsFacade(mockito_provider)
    result = facade.init_new_payment("nonsense", Money(Decimal(10.99), "USD"))
    assert result is None

expect will also call verifyNoUnwantedInteractions to make sure there are no unexpected calls.

Implementing Fake

For Fakes, there are no shortcuts or libraries. We are better off writing them manually. You can find an example here – InMemoryAuctionsRepository. It is meant to be a test double for a real implementation that uses a relational database.
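A minimal sketch of such a hand-written Fake in the same spirit (Payment and InMemoryPaymentsRepository are hypothetical names used only for illustration – a dict plays the role of the database):

from dataclasses import dataclass

@dataclass
class Payment:
    id: int
    status: str

class InMemoryPaymentsRepository:
    """Behaves like a real, database-backed repository, but keeps rows in a dict."""

    def __init__(self) -> None:
        self._rows: dict[int, Payment] = {}

    def save(self, payment: Payment) -> None:
        self._rows[payment.id] = payment

    def get(self, payment_id: int) -> Payment:
        return self._rows[payment_id]  # KeyError plays the role of "row not found"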

Summary

Initially this blog post was meant to be only about the tool, but I couldn’t resist squeezing in some general advice about testing techniques.

While python-mockito does not solve the issue of calling every test double a mock, it definitely deserves attention. Test doubles created with it require less code and are by default much more strict and safe than those created with unittest.mock. Regarding cons, the camelCasing can be a little distracting at first, but this is not a huge issue. The most important thing is that the safety we get out of the box with python-mockito has only gradually been added to the Python standard library over several versions, and it is still not as convenient.

I strongly recommend reading python-mockito’s documentation and trying it out!

Further reading

The post Meet python-mockito and leave built-in mock & patch behind appeared first on Breadcrumbs Collector.

Categories: FLOSS Project Planets

Talk Python to Me: #312 Python Apps that Scale to Billions of Users

Sun, 2021-04-18 04:00
How do you build Python applications that can handle literally billions of requests? It has certainly been done to great success at places like YouTube (handling 1M requests/sec) and Instagram, as well as for internal pricing APIs at places like PayPal and other banks.

While Python can be fast at some operations and slow at others, it's generally not so much about raw language performance as it is about building an architecture for this scale. That's why it's great to have Julien Danjou on the show today. We'll dive into his book "The Hacker's Guide to Scaling Python" as well as some of the performance work he's doing over at Datadog.

Links from the show:

  • Julien on Twitter: https://twitter.com/juldanjou
  • Scaling Python Book: https://scaling-python.com/
  • DD Trace production profiling code: https://github.com/DataDog/dd-trace-py
  • Futurist package: https://pypi.org/project/futurist/
  • Tenacity package: https://tenacity.readthedocs.io/en/latest/
  • Cotyledon package: https://cotyledon.readthedocs.io/en/latest/
  • Locust.io Load Testing: https://locust.io/
  • Datadog: talkpython.fm/datadog
  • daiquiri package: https://daiquiri.readthedocs.io/en/latest
  • YouTube Live Stream Video: https://www.youtube.com/watch?v=MEyxf7fOoxg

Sponsors:

  • 45Drives: https://talkpython.fm/45drives
  • Talk Python Training: https://talkpython.fm/training
Categories: FLOSS Project Planets

Fabio Zadrozny: PyDev 8.3.0 (Java 11, Flake 8 , Code-completion LRU, issue on Eclipse 4.19)

Sun, 2021-04-18 00:53

PyDev 8.3.0 is now available!

Let me start with some warnings here:

First, PyDev now requires Java 11. I believe that Java 11 is pretty standard nowadays and the latest Eclipse also requires Java 11 (if you absolutely need Java 8, please keep using PyDev 8.2.0 -- or earlier -- indefinitely, otherwise, if you are still using Java 8, please upgrade to Java 11 -- or higher).

Second, Eclipse 2021-03 (4.19) is broken and cannot be used with any version of PyDev due to https://bugs.eclipse.org/bugs/show_bug.cgi?id=571990. So, if you use PyDev, please stick to Eclipse 4.18 (or get a newer release when available). The latest version of PyDev warns about this; older versions will not complain, but some features will not function properly, so please skip Eclipse 4.19 if you use PyDev.

Now, on to the goodies ;)

On the linters front, the configurations for the linters can now be saved to the project or user settings, and flake8 has a UI for configuration which is much more flexible, allowing you to change the severity of any error.

A new option was added which allows all comments to be added at a single indent (and this is now the default).

The code completion and quick fixes which rely on automatically adding an import will now cache the selection, so that if a given token is imported, that choice is saved and reused the next time it is needed (so, for instance, if you just resolved Optional to be typing.Optional, that'll be the first choice the next time around).

Environment variables are now properly supported in .pydevproject. The expected format is: ${env_var:VAR_NAME}.

Acknowledgements

Thanks to Luis Cabral, who is now helping in the project, for doing many of those improvements (and to the Patrons at https://www.patreon.com/fabioz who enabled it to happen).

 Enjoy!

Categories: FLOSS Project Planets

Janusworx: I Can’t Do This Yet … Updated

Sat, 2021-04-17 05:03

Updated version of the post. I seem to have somehow, mangled the old one.
I’ll just blame it on the gremlins in the cloud.

Read more… (4 min remaining to read)

Categories: FLOSS Project Planets

Montreal Python User Group: Montréal-Python 85 – Polite Koala

Sat, 2021-04-17 00:00

Join us for the April monthly meeting for the Pythonistas of Montréal on Monday the 19th at 6pm.

This month, we have for you:

  • Machine learning in robotics by Nicholas Nadeau;
  • Module of the month: random by Kouame Kofi;
  • Overview of the PyCon 2021 programme by the chairman of the talk selection committee, Philippe Gagnon.

All the details are on the event's Meetup page.

Categories: FLOSS Project Planets

PyCharm: PyCharm 2021.1.1 RC Is Out: A Better Experience From PyCharm 2021.1

Fri, 2021-04-16 13:00

Thank you for your feedback on PyCharm 2021.1! We looked carefully into it and have managed to make some quick and important bug fixes, so that you can enjoy working with PyCharm.

DOWNLOAD PyCharm 2021.1.1 RC

Here is the list of major improvements:

  • Find in Files: works well again. IDEA-266391
  • Python Console: we disabled the auto-import feature for the Python Console. The code completion should work smoothly again in the Python Console. PY-47905
  • Jupyter Notebooks: autoscroll from and to source works in a synchronized manner for the preview and editor panes. PY-47976
  • Jupyter Notebooks: preview pane stays active and updated. PY-45112
  • PyCharm no longer scans the home directory for virtualenvs if the custom virtualenv directory was deleted. PyCharm switches back to keeping virtualenvs in the project/venv path. PY-47913

The whole list of improved functionality is available in the release notes.

Do not forget to submit your bug reports and feature requests to our tracker.

Categories: FLOSS Project Planets

Python Pool: What is the Use of Semicolon in Python? [Explained]

Fri, 2021-04-16 10:33
Introduction

We have discussed many Python concepts already. In this tutorial, we will be discussing the role of the semicolon in Python. In most other languages, the basic meaning of the semicolon (;) is to end or terminate the current statement.

In programming languages like C, C++, and Java, we use a semicolon to terminate a line of code, and it is required. But in Python, it is not compulsory to use a semicolon.

What is a semicolon in Python?

The semicolon in Python denotes separation rather than termination. It is used to write multiple statements on the same line. It is also legal to put a semicolon at the end of a single statement; in that case, it is treated as two statements, where the second statement is empty.
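For example, a quick illustration:

# two statements on one line, separated by a semicolon
x = 1; y = 2
print(x + y)  # 3

# a trailing semicolon is legal too - the "second statement" is simply empty
print("done");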

Role of the semicolon in python
  • In Python, we usually don’t use semicolons, but they are not forbidden.
  • Python does not use a semicolon to denote the end of the line.
  • Python is a simple and easy coding language because we don’t need to use a semicolon, and even if we forget to place one, it doesn’t throw an error.
  • The semicolon in Python is used as a separator to place multiple statements on a single line.
Examples to Use a Semicolon in Python

Here, we will be discussing how we can use the semicolon in python by different methods or ways:

1. Printing semicolon in python

In this example, we will simply print a semicolon using the print statement. Let us look at the example to understand the concept in detail.

# print semicolon
print(";")

Output:

;

Explanation:

  • In this example, we have directly applied the print function.
  • Inside the print function, we have written a semicolon inside quotation marks so that the semicolon gets treated as a string.
  • Hence, you can see the semicolon in the output.
2. Splitting statements with the help of semicolons

In this example, we will first print multiple statements on multiple lines, each with its own print statement. After that, we will print all the statements on a single line, separated by semicolons, and see what happens. Let us look at the example to understand the concept in detail.

print("Hy") print("Python") print("Pool") print("\n") print("Hy"); print("Python"); print("Pool")

Output:

Hy
Python
Pool


Hy
Python
Pool

Explanation:

  • Firstly, we have printed three statements with different print functions and in different lines.
  • After that, we have applied a new line so that we understand the output more clearly.
  • Then, we have applied all three print statements in a single line, separating them by the semicolon.
  • At last, we have seen the output.
  • Hence, you can see how you can write multiple print statements on one line using semicolons, although this is not considered good style.
3. Using semicolons with loops

In this example, we will be using a semicolon with a for loop. Let us look at the example to understand the concept in detail.

# using for loop
for i in range(4):
    print('Hi'); print('Python Pool')

Output:

Hi
Python Pool
Hi
Python Pool
Hi
Python Pool
Hi
Python Pool

Explanation:

  • Firstly, we have applied a for loop from i=0 to i=3 using range(4).
  • Then, we have printed 'Hi' with the help of the print function, followed by a semicolon.
  • After that, on the same line, we have printed 'Python Pool'.
  • Hence, we have seen that we can write the body of a for loop in a single line, and it worked properly.
How to print the semicolon in python

Here, we will be printing the semicolon inside the print statement as part of a string. Let us look at the example to understand the concept in detail.

print("Python pool is the best website ; ")

Output:

Python pool is the best website ;

Is using a semicolon necessary in Python?

Using a semicolon is not necessary, but it is also not forbidden. It is only used as a separator to place multiple statements on a single line.

Conclusion

In this tutorial, we have learned about the concept of semicolons in Python and seen their role in the language. After that, we have explained different examples of using semicolons, each in detail. You can use any of these approaches according to your choice and the requirements of your program.

However, if you have any doubts or questions, do let me know in the comment section below. I will try to help you as soon as possible.

The post What is the Use of Semicolon in Python? [Explained] appeared first on Python Pool.

Categories: FLOSS Project Planets

Quansight Labs Blog: Spot the differences: what is new in Spyder 5?

Fri, 2021-04-16 10:00

In case you missed it, Spyder 5 was released at the beginning of April! This blog post is a conversation attempting to document the long and complex process of improving Spyder's UI with this release. Portions led by Juanita Gomez are marked as Juanita, and those led by Isabela Presedo-Floyd are marked as Isabela.

What did we do?

[Juanita] Spyder was created more than 10 years ago, and it has had contributions from a great number of developers who have written code, proposed ideas, opened issues, and tested PRs in order to build a piece of Spyder on their own. We (the Spyder team) have been lucky to have such a great community of people contributing throughout the years, but this is the first time that we decided to ask for help from a UX/UI expert! Why, you might wonder? The contributions of this great number of people have resulted in inconsistencies around Spyder’s interface which we didn’t stop to analyze until now.

When Isabela joined Quansight, we realized that we had an opportunity to improve Spyder’s interface with her help. We thought her skill set was everything we needed to make Spyder’s UI better. So we started by reviewing the results of a community survey from a few months ago and realized that some of the most common feedback from users is related to the interface (very crowded, not consistent, too many colors). This is why we decided to start a joint project with Isabela (whom we now consider part of the Spyder team) called Spyder 5!

Read more… (8 min remaining to read)

Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #56: OrderedDict vs dict and Object Oriented Programming in Python vs Java

Fri, 2021-04-16 08:00

Are you looking for a bit of order when working with dictionaries in Python? Are you aware that the Python dict has changed over the last several versions and now keeps items in order? Could you learn more about object-oriented programming in Python by comparing it to another language? This week on the show, David Amos is back, and he's brought another batch of PyCoder's Weekly articles and projects.


Categories: FLOSS Project Planets

Codementor: Ruby vs Python: which one is the best?

Fri, 2021-04-16 07:08
It should be understood that when talking about Python or Ruby, developers rarely talk about either language in its purest form, because they resort to Python or Ruby in particular situations that specific projects require.
Categories: FLOSS Project Planets

Programiz: Python Program to Append to a File

Fri, 2021-04-16 06:17
In this example, you will learn to append to a file.
Categories: FLOSS Project Planets

Programiz: Python Program to Count the Occurrence of an Item in a List

Fri, 2021-04-16 05:23
In this example, you will learn to count the occurrence of an item in a list.
Categories: FLOSS Project Planets

EuroPython: EuroPython 2021: Launching the conference website

Fri, 2021-04-16 05:12
https://ep2021.europython.eu/

During the last few weeks the team has been hard at work making final changes to the website, and we are excited to announce the launch of the conference website for EuroPython 2021 today!

We have also migrated the accounts from last year's website to the new one, so you should be able to log in right away. That said, we still recommend changing your password as a best practice. If you don't have an account yet, you can easily create one to be ready for ticket sales.

Quick Summary

EuroPython 2021 will be run online from July 26 - August 1:

  • Two workshop/training days (July 26 - 27)
  • Three conference days (July 28 - 30)
  • Two sprint days (July 31 - August 1)

The sessions will be scheduled to ensure they are also accessible for those in the Asian and Americas time zones.

More updates
  • Ticket sales will start on Monday, April 19. We have refined the ticket structure for EuroPython 2021 to make it easier to understand and added a table outlining the differences between the various ticket types.
  • Financial aid will once again be available, since we've grown our team. We'd like to enable more people from lower income countries to attend. Applications can be filed starting Wednesday, April 21.
  • The Call for Papers (CFP) will be opened on Monday, April 26. If you want to prepare, you can already have a look at the CFP page on the website. The CFP will stay open for two weeks. A mentorship program for first time speakers is planned as well.
  • Sponsorship packages are already available for review. If you decide to sponsor until May 7, you can get a 10% Early Bird discount on your packages.

We will send out more detailed posts on the above items in due course. Please subscribe to our newsletter if you want to make sure to get all information.

Enjoy,
EuroPython 2021 Team
EuroPython Society

Categories: FLOSS Project Planets

Programiz: Python Program to Check If a String Is a Number (Float)

Fri, 2021-04-16 04:51
In this example, you will learn to check if a string is a number (float).
Categories: FLOSS Project Planets

John Ludhi/nbshare.io: How To Code RNN and LSTM Neural Networks in Python

Thu, 2021-04-15 21:41
How To Code RNN And LSTM Neural Networks In Python

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

tf.__version__

Out[1]:
'2.3.1'

Check out following links if you want to learn more about Pandas and Numpy.

Pandas

Numpy Basics

What's so special about text?

Text is categorized as Sequential data: a document is a sequence of sentences, each sentence is a sequence of words, and each word is a sequence of characters. What is so special about text is that the next word in a sentence depends on:

  1. Context: which can extend long distances before and after the word, aka long-term dependency.
  2. Intent: different words can fit in the same contexts depending on the author's intent.
What do we need?

We need a neural network that models sequences. Specifically, given a sequence of words, we want to model the next word, then the next word, then the next word, ... and so on. That could be on a sentence, word, or character level. Our goal can be to just make a model to predict/generate the next word, like in unsupervised word embeddings. Alternatively, we could just map patterns in the text to associated labels, like in text classification. In this notebook, we will be focusing on the latter. However, the networks used for either are pretty similar. The most important role of the network is processing the textual input, and extracting and modelling the linguistic features. What we then do with these features is another story.

Recurrent Neural Networks (RNNs)

A Recurrent Neural Network (RNN) has a temporal dimension. In other words, the prediction of the first run of the network is fed as an input to the network in the next run. This beautifully reflects the nature of textual sequences: starting with the word "I", the network would expect to see "am", or "went", "go", etc. But then when we observe the next word, which, let us say, is "am", the network tries to predict what comes after "I am", and so on. So yeah, it is a generative model!
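To make the recurrence concrete, here is a minimal NumPy sketch of a single forward pass (the weights are random and hypothetical; this is not the Keras model built later in the notebook):

import numpy as np

num_steps, input_dim, hidden_dim = 12, 7, 4
rng = np.random.default_rng(0)
W_x = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

x = rng.normal(size=(num_steps, input_dim))      # one toy sequence of 12 input vectors
h = np.zeros(hidden_dim)                         # initial hidden state
for t in range(num_steps):
    h = np.tanh(W_x @ x[t] + W_h @ h + b)        # the previous state feeds the next step
print(h)                                         # final features, usable for classification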

Reber Grammar Classification

Let's start with a simple grammar classification. We assume there is a linguistic rule according to which characters are generated. This is a simple simulation of grammar in our natural language: you can say "I am" but not "I are". More on Reber Grammar here.

Defining the grammar

Consider the following Reber Grammar:

[Figure: the Reber grammar state diagram]

Let's represent it first in Python:

In [1]:
default_reber_grammar = [
    [("B", 1)],            # (state 0) =B=> (state 1)
    [("T", 2), ("P", 3)],  # (state 1) =T=> (state 2) or =P=> (state 3)
    [("X", 5), ("S", 2)],  # (state 2) =X=> (state 5) or =S=> (state 2)
    [("T", 3), ("V", 4)],  # (state 3) =T=> (state 3) or =V=> (state 4)
    [("V", 6), ("P", 5)],  # (state 4) =V=> (state 6) or =P=> (state 5)
    [("X", 3), ("S", 6)],  # (state 5) =X=> (state 3) or =S=> (state 6)
    [("E", None)],         # (state 6) =E=> <EOS>
]

Let's take this a step further, and use Embedded Reber Grammar, which simulates slightly more complicated linguistic rules, such as phrases!

In [2]:
embedded_reber_grammar = [
    [("B", 1)],                    # (state 0) =B=> (state 1)
    [("T", 2), ("P", 3)],          # (state 1) =T=> (state 2) or =P=> (state 3)
    [(default_reber_grammar, 4)],  # (state 2) =REBER=> (state 4)
    [(default_reber_grammar, 5)],  # (state 3) =REBER=> (state 5)
    [("P", 6)],                    # (state 4) =P=> (state 6)
    [("T", 6)],                    # (state 5) =T=> (state 6)
    [("E", None)],                 # (state 6) =E=> <EOS>
]

Now let's generate some data using these grammars:

Generating data

In [3]:
def generate_valid_string(grammar):
    state = 0
    output = []
    while state is not None:
        char, state = grammar[state][np.random.randint(len(grammar[state]))]
        if isinstance(char, list):  # embedded reber
            char = generate_valid_string(char)
        output.append(char)
    return "".join(output)

In [4]:
def generate_corrupted_string(grammar, chars='BTSXPVE'):
    '''Substitute one character to violate the grammar'''
    good_string = generate_valid_string(grammar)
    idx = np.random.randint(len(good_string))
    good_char = good_string[idx]
    bad_char = np.random.choice(sorted(set(chars) - set(good_char)))
    return good_string[:idx] + bad_char + good_string[idx + 1:]

Let's define all the possible characters used in the grammar.

In [5]:
chars = 'BTSXPVE'
chars_dict = {a: i for i, a in enumerate(chars)}
chars_dict

Out[5]:
{'B': 0, 'T': 1, 'S': 2, 'X': 3, 'P': 4, 'V': 5, 'E': 6}

One hot encoding is used to represent each character with a vector so that all vectors are equally far away from each other. For example,

In [6]:
def str2onehot(string, num_steps=12, chars_dict=chars_dict):
    res = np.zeros((num_steps, len(chars_dict)))
    for i in range(min(len(string), num_steps)):
        c = string[i]
        res[i][chars_dict[c]] = 1
    return res

Now let's generate a dataset of valid and corrupted strings

In [7]:
def generate_data(data_size=10000, grammar=embedded_reber_grammar, num_steps=None):
    good = [generate_valid_string(grammar) for _ in range(data_size // 2)]
    bad = [generate_corrupted_string(grammar) for _ in range(data_size // 2)]
    all_strings = good + bad
    if num_steps is None:
        num_steps = max([len(s) for s in all_strings])
    X = np.array([str2onehot(s) for s in all_strings])
    l = np.array([len(s) for s in all_strings])
    y = np.concatenate((np.ones(len(good)), np.zeros((len(bad))))).reshape(-1, 1)
    idx = np.random.permutation(data_size)
    return X[idx], l[idx], y[idx]

In [9]:
np.random.seed(42)
X_train, seq_lens_train, y_train = generate_data(10000)
X_val, seq_lens_val, y_val = generate_data(5000)
X_train.shape, X_val.shape

Out[9]:
((10000, 12, 7), (5000, 12, 7))

We have 10,000 strings, each padded to 12 characters, with a maximum of 7 unique letters (i.e. BTSXPVE).

Building the model


In [18]:
x = layers.Input(shape=(12, 7))  # we define our input's shape
# first we define our RNN cells to use in the RNN model
# let's keep the model simple ...
cell = layers.SimpleRNNCell(4, activation='tanh')  # ... by just using 4 units (like 4 units in hidden layers)
rnn = layers.RNN(cell)
rnn_output = rnn(x)

We use the tanh activation function to keep the prediction between -1 and 1. The resulting activation between -1 and 1 is then weighted to finally give us the features to use in making our predictions.

We finally add a fully connected layer to map our RNN outputs to the 0-1 classification output. We use a sigmoid function to map the prediction to probabilities between 0 and 1.

In [19]:
output = layers.Dense(units=1, activation='sigmoid')(rnn_output)

In [20]:
# let's compile the model
model = keras.Model(inputs=x, outputs=output)
# loss is binary cross entropy since this is a binary classification task,
# and accuracy is the evaluation metric
model.compile(loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

Model: "functional_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 12, 7)]           0
_________________________________________________________________
rnn_1 (RNN)                  (None, 4)                 48
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 5
=================================================================
Total params: 53
Trainable params: 53
Non-trainable params: 0
_________________________________________________________________

Each input has 12 characters encoded as 7-dimensional one-hot vectors, and the RNN cell has 4 units, so the cell has 7x4 input weights + 4x4 recurrent weights + 4 biases = 48 parameters to learn, plus 5 more parameters from the fully connected (FC) layer.
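As a quick sanity check of that arithmetic (plain Python, not from the original notebook):

# parameter count of a SimpleRNN cell: units * (input_dim + units + 1)
input_dim, units = 7, 4
rnn_params = units * (input_dim + units + 1)   # input weights + recurrent weights + biases
fc_params = units * 1 + 1                      # Dense layer: weights + bias
print(rnn_params, fc_params)                   # 48 5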

In [21]:
# we train the model for 100 epochs
# verbose level 2 displays more info while training
H = model.fit(X_train, y_train, epochs=100, verbose=2, validation_data=(X_val, y_val))

In [20]:
def plot_results(H):
    results = pd.DataFrame({"Train Loss": H.history['loss'],
                            "Validation Loss": H.history['val_loss'],
                            "Train Accuracy": H.history['accuracy'],
                            "Validation Accuracy": H.history['val_accuracy']})
    fig, ax = plt.subplots(nrows=2, figsize=(16, 9))
    results[["Train Loss", "Validation Loss"]].plot(ax=ax[0])
    results[["Train Accuracy", "Validation Accuracy"]].plot(ax=ax[1])
    ax[0].set_xlabel("Epoch")
    ax[1].set_xlabel("Epoch")
    plt.show()

In [38]:
plot_results(H)

LSTM

Long short-term memory employs logic gates to control multiple RNNs, each trained for a specific task. LSTMs allow the model to memorize long-term dependencies and forget less likely predictions. For example, if the training data had "John saw Sarah" and "Sarah saw John", when the model is given "John saw", the word "saw" can predict "Sarah" and "John", as both have been seen just after "saw". LSTM allows the model to recognize that "John saw" undermines the possibility of "John", so we won't get "John saw John". Also, we won't get "John saw John saw John saw John ...", as the model can predict that what comes after the word after "saw" is the end of the sentence.
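For intuition, here is a rough NumPy sketch of a single LSTM step with its four gates (hypothetical, randomly initialized weights -- a simplification, not Keras' actual implementation):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b stack the parameters of the four gates:
    # input (i), forget (f), candidate (g) and output (o)
    W_i, W_f, W_g, W_o = W
    U_i, U_f, U_g, U_o = U
    b_i, b_f, b_g, b_o = b
    i = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)   # how much new info to let in
    f = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)   # how much of the old cell state to keep
    g = np.tanh(W_g @ x_t + U_g @ h_prev + b_g)   # candidate values
    o = sigmoid(W_o @ x_t + U_o @ h_prev + b_o)   # how much of the cell state to expose
    c_t = f * c_prev + i * g                      # new cell state (long-term memory)
    h_t = o * np.tanh(c_t)                        # new hidden state (output)
    return h_t, c_t

rng = np.random.default_rng(0)
dim_h, dim_x = 4, 7
W = rng.normal(size=(4, dim_h, dim_x))
U = rng.normal(size=(4, dim_h, dim_h))
b = np.zeros((4, dim_h))
h, c = np.zeros(dim_h), np.zeros(dim_h)
for x_t in rng.normal(size=(12, dim_x)):          # a toy sequence of 12 input vectors
    h, c = lstm_step(x_t, h, c, W, U, b)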


Now we will apply bidirectional LSTM (that looks both backward and forward in the sentence) for text classification.

Sentiment Analysis: IMDB reviews


NEVER train two models on the same kernel session. We already trained the reber grammar one, so we need to restart the kernel first.

Loading the data

In [2]:
!pip install -q tensorflow_datasets

In [3]:
import tensorflow_datasets as tfds

In [4]:
dataset, info = tfds.load('imdb_reviews', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

Processing the data

Now that we have downloaded the data, we now can go ahead and:

  1. (optional) take a small sample of the data, since this is just a demo!
  2. Align the reviews with their labels
  3. Shuffle the data
In [5]:
train = train_dataset.take(4000)
test = test_dataset.take(1000)

In [6]:
# to shuffle the data ...
BUFFER_SIZE = 4000  # we will put all the data into this big buffer, and sample randomly from the buffer
BATCH_SIZE = 128    # we will read 128 reviews at a time

train = train.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
test = test.batch(BATCH_SIZE)

prefetch: to allow the later elements to be prepared while the current elements are being processed.

In [7]:
train = train.prefetch(BUFFER_SIZE)
test = test.prefetch(BUFFER_SIZE)

Text Encoding

Each word in the sentence is going to be replaced with its corresponding index in the vocabulary.

In [8]:
VOCAB_SIZE = 1000  # assuming our vocabulary is just 1000 words
encoder = layers.experimental.preprocessing.TextVectorization(max_tokens=VOCAB_SIZE)
encoder.adapt(train.map(lambda text, label: text))  # we just encode the text, not the labels

In [9]:
# here are the first 20 words in our 1000-word vocabulary
vocab = np.array(encoder.get_vocabulary())
vocab[:20]

Out[9]:
array(['', '[UNK]', 'the', 'and', 'a', 'of', 'to', 'is', 'in', 'i', 'it',
       'this', 'that', 'br', 'was', 'as', 'with', 'for', 'but', 'movie'],
      dtype='<U14')

In [10]:
example, label = list(train.take(1))[0]  # that's one batch
len(example)

Out[10]:
128

In [11]:
example[0].numpy()

Out[11]:
b'There have been so many many films based on the same theme. single cute girl needs handsome boy to impress ex, pays him and then (guess what?) she falls in love with him, there\'s a bit of fumbling followed by a row before everyone makes up before the happy ending......this has been done many times.<br /><br />The thing is I knew this before starting to watch. But, despite this, I was still looking forward to it. In the right hands, with a good cast and a bright script it can still be a pleasant way to pass a couple of hours.<br /><br />this was none of these.<br /><br />this was dire.<br /><br />A female lead lacking in charm or wit who totally failed to light even the slightest spark in me. I truly did not care if she "got her man" or remained single and unhappy.<br /><br />A male lead who, after a few of his endless words of wisdom, i wanted to kill. Just to remove that smug look. i had no idea that leading a life of a male whore was the path to all-seeing all-knowing enlightenment.<br /><br />A totally unrealistic film filled with unrealistic characters. none of them seemed to have jobs, all of them had more money than sense, a bridegroom who still goes ahead with his wedding after learning that his bride slept with his best friend....plus "i would miss you even if we had never met"!!!!! i could go on but i have just realised that i am wasting even more time on this dross.....I could rant about introducing a character just to have a very cheap laugh at the name "woody" but in truth that was the only remotely humorous thing that happened in the film.'

In [12]:
encoded_example = encoder(example[:1]).numpy()
encoded_example

Out[12]:
array([[ 49, 26, 78, 36, 107, 107, 92, 417, 21, 2, 165, 810, 593, 988, 241, 795,
         1, 429, 6, 1, 1, 1, 90, 3, 91, 495, 48, 56, 646, 8, 113, 16,
         90, 222, 4, 197, 5, 1, 1, 33, 4, 1, 157, 336, 151, 57, 157, 2,
         659, 1, 46, 78, 218, 107, 1, 13, 2, 144, 7, 9, 782, 11, 157, 1,
         6, 104, 18, 475, 11, 9, 14, 122, 289, 971, 6, 10, 8, 2, 212, 946,
         16, 4, 50, 185, 3, 4, 1, 227, 10, 69, 122, 28, 4, 1, 97, 6,
         1, 4, 367, 5, 1, 13, 11, 14, 683, 5, 1, 13, 11, 14, 1, 13,
         4, 634, 480, 1, 8, 1, 42, 1, 37, 432, 901, 6, 752, 55, 2, 1,
         1, 8, 70, 9, 347, 118, 22, 425, 43, 56, 175, 40, 121, 42, 1, 593,
         3, 1, 13, 4, 1, 480, 37, 101, 4, 178, 5, 23, 1, 609, 5, 1,
         9, 449, 6, 485, 41, 6, 1, 12, 1, 158, 9, 63, 58, 326, 12, 813,
         4, 115, 5, 4, 1, 1, 14, 2, 1, 6, 1, 1, 1, 13, 4, 432,
         1, 20, 1, 16, 1, 103, 683, 5, 95, 463, 6, 26, 1, 32, 5, 95,
         63, 51, 270, 71, 275, 4, 1, 37, 122, 278, 1, 16, 23, 1, 101, 1,
         12, 23, 1, 1, 16, 23, 108, 1, 9, 60, 731, 25, 55, 43, 73, 63,
         114, 1, 9, 96, 131, 21, 18, 9, 26, 41, 1, 12, 9, 214, 1, 55,
         51, 59, 21, 11, 1, 96, 1, 45, 1, 4, 109, 41, 6, 26, 4, 52,
         831, 500, 31, 2, 391, 1, 18, 8, 883, 12, 14, 2, 64, 1, 1, 144,
         12, 571, 8, 2, 20]])

Creating the model

In [13]:
model = tf.keras.Sequential([
    encoder,  # the encoder
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(layers.LSTM(64)),  # making LSTM bidirectional
    tf.keras.layers.Dense(32, activation='relu'),    # FC layer for the classification part
    tf.keras.layers.Dense(1)                         # final FC layer
])

Let's try it out!

In [14]:
sample_text = ('The movie was cool. The animation and the graphics '
               'were out of this world. I would recommend this movie.')
predictions = model.predict(np.array([sample_text]))
print(predictions[0])

[-0.00052149]

yeah yeah, we haven't trained the model yet.

Compiling & training the model

In [15]:
# we will use binary cross entropy again because this is a binary classification task (positive or negative)
# we also did not apply a sigmoid activation function at the last FC layer, so we specify that we
# are calculating the cross entropy from logits
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    # adam optimizer is more efficient (not always the most accurate though)
    optimizer=tf.keras.optimizers.Adam(1e-4),
    metrics=['accuracy']
)

In [16]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
text_vectorization (TextVect (None, None)              0
_________________________________________________________________
embedding (Embedding)        (None, None, 64)          64000
_________________________________________________________________
bidirectional (Bidirectional (None, 128)               66048
_________________________________________________________________
dense (Dense)                (None, 32)                4128
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33
=================================================================
Total params: 134,209
Trainable params: 134,209
Non-trainable params: 0
_________________________________________________________________

Wow that's a lot of parameters!

In [17]:
H2 = model.fit(train, epochs=25, validation_data=test)

In [21]:
plot_results(H2)

It works! We stopped after only 25 epochs, but obviously there is still plenty of room for improvement with more epochs.

Summary & Comments
  1. Text is simply sequential data.
  2. RNN-like models feed the prediction of the current run as input to the next run.
  3. LSTM uses 4 RNNs to handle more complex features of text (e.g. long-term dependency).
  4. Bidirectional models can remarkably outperform unidirectional models.
  5. You can stack as many LSTM layers as you want. It is just a new LEGO piece to use when building your NN :)
Categories: FLOSS Project Planets

PyCharm: PyCharm Supports Django – you can too!

Thu, 2021-04-15 12:03
For those developers whose daily work benefits from the Django framework.

As you probably know, the Django framework is built and maintained by community members. This might give you the impression that Django is effortlessly built by itself, but the truth is that the organization needs strong management and efficient communication. Making major Django releases every 8 months and bug fixes every month is no small task! That’s why the Django Fellowship program exists and why Django Software Foundation needs support from the community – both technical and financial.

Moreover, the Django Software Foundation supports community development and outreach projects such as Django Girls.

You can read more about the DSF activities here.

Support Django!

JetBrains is participating in a Django fundraising campaign – this year is the fifth iteration of this initiative. Over the past four years, JetBrains PyCharm has raised more than $140,000 for the Django Software Foundation.

How does it work?

During this campaign, you can buy a new individual license for PyCharm Professional for 30% off, and the full purchase price will go to the DSF’s general fundraising efforts and the Django Fellowship program.

This campaign will help the DSF maintain a healthy Django project and continue contributing to their various outreach programs.

Only with our support can Django make sure that the Web framework you base your work on can grow to be even better in the coming years. Don’t hesitate to share the link to the fundraising campaign with your fellow developers, who benefit from Django. Together we can support Django a lot!

Support Django!

Stay at the edge of your efficiency with Django and PyCharm

PyCharm always tries to support the latest features in Django and other technologies and make working with them easier, so that you can spend more time and energy solving the really interesting problems in your projects.

To help you get even more out of PyCharm and Django, we often prepare tutorials and arrange webinars on Django-related topics. Here are some useful resources we have for learning about working with Django projects:

Categories: FLOSS Project Planets

Python for Beginners: Iterating over dictionary in Python

Thu, 2021-04-15 10:40

Dictionaries are one of the most frequently used data structures in python. It contains data in the form of key value pairs. While processing the data with dictionaries, we may need to iterate over the items in the dictionary to change the values or read the values present in the dictionary. In this article, we will see various ways for iterating over dictionary in python.

Iterating over a dictionary using for loop

Just as we iterate through lists or tuples using for loops in Python, we can also iterate through a Python dictionary using a for loop.

When we try to iterate over a dictionary using a for loop, it implicitly calls the __iter__() method. The __iter__() method returns an iterator with the help of which we can iterate over the entire dictionary. As dictionaries in Python are indexed using keys, the iterator returned by the __iter__() method iterates over the keys of the dictionary.
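For example, we can see this by calling iter() on a dictionary directly:

myDict = {"name": "PythonForBeginners", "acronym": "PFB"}
iterator = iter(myDict)   # this is what the for loop calls via __iter__()
print(next(iterator))     # name  -> the iterator yields keys
print(next(iterator))     # acronym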

So, with a for loop, we can iterate over and access all the keys of a dictionary as follows.

myDict = {"name": "PythonForBeginners", "acronym": "PFB", "about": "Python Tutorials Website"}
print("The dictionary is:")
print(myDict)
print("The keys in the dictionary are:")
for x in myDict:
    print(x)

Output:

The dictionary is:
{'name': 'PythonForBeginners', 'acronym': 'PFB', 'about': 'Python Tutorials Website'}
The keys in the dictionary are:
name
acronym
about

In the output, we can see that all the keys have been printed. In the for loop, the iterator x iterates over all the keys in the dictionary, which are then printed.

Having obtained the keys of the dictionary using a for loop, we can also iterate over the values in the dictionary as follows.

myDict = {"name": "PythonForBeginners", "acronym": "PFB", "about": "Python Tutorials Website"}
print("The dictionary is:")
print(myDict)
print("The values in the dictionary are:")
for x in myDict:
    print(myDict[x])

Output:

The dictionary is:
{'name': 'PythonForBeginners', 'acronym': 'PFB', 'about': 'Python Tutorials Website'}
The values in the dictionary are:
PythonForBeginners
PFB
Python Tutorials Website

In the code above, we have simply obtained an iterator which iterates over the keys in the dictionary, and then we have accessed the values associated with the keys using the syntax dict_name[key_name] and printed them.

Iterate over keys of a  dictionary

We can use the keys() method to iterate over keys of a dictionary. The keys() method returns a list of keys in the dictionary when invoked on a dictionary and then we can iterate over the list to access the keys in the dictionary as follows.

myDict = {"name": "PythonForBeginners", "acronym": "PFB", "about": "Python Tutorials Website"}
print("The dictionary is:")
print(myDict)
print("The keys in the dictionary are:")
keyList = myDict.keys()
for x in keyList:
    print(x)

Output:

The dictionary is:
{'name': 'PythonForBeginners', 'acronym': 'PFB', 'about': 'Python Tutorials Website'}
The keys in the dictionary are:
name
acronym
about

We can also access values associated with the keys of the dictionary once we have the keys in the list as follows.

myDict = {"name": "PythonForBeginners", "acronym": "PFB", "about": "Python Tutorials Website"}
print("The dictionary is:")
print(myDict)
print("The values in the dictionary are:")
keyList = myDict.keys()
for x in keyList:
    print(myDict[x])

Output:

The dictionary is:
{'name': 'PythonForBeginners', 'acronym': 'PFB', 'about': 'Python Tutorials Website'}
The values in the dictionary are:
PythonForBeginners
PFB
Python Tutorials Website

Iterate over values of a dictionary in python

If we only want to access the values in the dictionary, we can do so with the help of the values() method. The values() method, when invoked on a dictionary, returns a list of all the values present in the dictionary. We can access the values in the dictionary using the values() method and a for loop as follows.

myDict = {"name": "PythonForBeginners", "acronym": "PFB", "about": "Python Tutorials Website"}
print("The dictionary is:")
print(myDict)
print("The values in the dictionary are:")
valueList = myDict.values()
for x in valueList:
    print(x)

Output:

The dictionary is:
{'name': 'PythonForBeginners', 'acronym': 'PFB', 'about': 'Python Tutorials Website'}
The values in the dictionary are:
PythonForBeginners
PFB
Python Tutorials Website

In the output, we can see that all the values present in the dictionary are printed one by one.

Iterating over items in a dictionary in python

We can iterate over and access the key value pairs using the items() method. The items() method when invoked on a dictionary, returns a list of tuples which have keys and values as pairs. Each tuple has a key on its 0th index and the value associated with the key is present on the 1st index of the tuple. We can access the key value pairs using items() method as shown below.

myDict = {"name": "PythonForBeginners", "acronym": "PFB", "about": "Python Tutorials Website"}
print("The dictionary is:")
print(myDict)
print("The items in the dictionary are:")
itemsList = myDict.items()
for x in itemsList:
    print(x)

Output:

The dictionary is:
{'name': 'PythonForBeginners', 'acronym': 'PFB', 'about': 'Python Tutorials Website'}
The items in the dictionary are:
('name', 'PythonForBeginners')
('acronym', 'PFB')
('about', 'Python Tutorials Website')

Besides iterating over the items with the items() method, we can also unpack each key-value pair into two loop variables in the for loop, as follows.

myDict={"name":"PythonForBeginners","acronym":"PFB","about":"Python Tutorials Website"} print("The dictionary is:") print(myDict) itemList=myDict.items() print("The key value pairs in the dictionary are:") for x,y in itemList: print(x,end=":") print(y)

Output:

The dictionary is:
{'name': 'PythonForBeginners', 'acronym': 'PFB', 'about': 'Python Tutorials Website'}
The key value pairs in the dictionary are:
name:PythonForBeginners
acronym:PFB
about:Python Tutorials Website

In the above program, the first loop variable receives the key and the second receives the corresponding value, because each element returned by items() is a (key, value) tuple that gets unpacked on every iteration.
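
If you prefer not to unpack the pairs, the same information can be read by indexing into each tuple. A minimal sketch of that alternative (my own addition, using the same myDict as above):

myDict = {"name": "PythonForBeginners", "acronym": "PFB", "about": "Python Tutorials Website"}
for item in myDict.items():
    # item is a (key, value) tuple: the key sits at index 0, the value at index 1
    print(item[0], end=":")
    print(item[1])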

Conclusion

In this article, we have seen various ways to iterate over the data in dictionaries. We have seen how to access the keys and values of a dictionary using a for loop and built-in methods like keys(), values() and items(). We can also write the programs used in this article with exception handling, using Python's try-except, to make them more robust and handle errors in a systematic way. Stay tuned for more informative articles.
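
As a rough illustration of that last remark, here is one possible sketch of combining dictionary access with try-except; the key "country" is a made-up example of a key that is not in the dictionary:

myDict = {"name": "PythonForBeginners", "acronym": "PFB", "about": "Python Tutorials Website"}
keysToRead = ["name", "country", "about"]  # "country" is a hypothetical key missing from myDict
for key in keysToRead:
    try:
        print(key, ":", myDict[key])
    except KeyError:
        # Looking up a key that does not exist raises KeyError
        print(key, ": not found in the dictionary")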

The post Iterating over dictionary in Python appeared first on PythonForBeginners.com.

Categories: FLOSS Project Planets

Stack Abuse: Radix Sort in Python

Thu, 2021-04-15 08:30
Introduction to Radix Sort

The radix (or base) is the number of digits used to represent numbers in a positional numeral system. For the binary system, the radix is 2 (it uses only two digits - 0 and 1). For the decimal system, the radix is 10 (it uses ten digits to represent all numbers - from 0 to 9).

A positional numeral system is, in simple terms, a number-writing system where the weight (or value) of a digit is determined by its position. For example, in the number 123, the 1 is worth more than the 3 because it sits in the hundreds position, while the 2 sits in the tens position.
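
As a small illustration (an addition of mine, not part of the original article), those place values can be pulled out with integer division and modulo - the same trick Radix Sort relies on later:

number = 123
hundreds = (number // 100) % 10  # 1
tens = (number // 10) % 10       # 2
ones = (number // 1) % 10        # 3
# The digits weighted by their positions reconstruct the number
print(hundreds * 100 + tens * 10 + ones * 1)  # 123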

Radix Sort can be used to lexicographically sort many types of data - integers, words, emails, but is mainly used to sort collections of integers and strings (that are mapped to appropriate integer keys).

It's a non-comparative sorting algorithm, meaning that it doesn't sort a collection by comparing its individual elements; instead, it exploits the inherent nature of the data it's sorting to sort faster - it sorts data based on their radix.

Comparison-based sorting algorithms can't do better than O(nlogn) in the general case, which is worse than the linear execution time (O(n+k)) of non-comparative algorithms.

For example, let n be the number of elements to be sorted, and let k be the range of allowed element values.

Counting Sort (a popular non-comparative algorithm) has a complexity of O(n+k) when k is in the range 1..n. But if the elements range from 1..n², the complexity rises to O(n²), which is worse than any comparative sorting algorithm.

Counting Sort has the potential to be significantly faster than the popular comparative algorithms, though only if a certain condition is fulfilled: the range of element values must not drastically exceed the number of elements.

The idea of the Radix Sort is to upgrade Counting Sort so that it maintains the linear time complexity even if the range of elements' values drastically exceeds the number of elements.

In fact, Radix Sort inherently uses Counting Sort as the main subroutine, with a few tweaks to overcome the issues that arise with an increased range of elements' values.

Counting Sort Algorithm

In order to get a grasp of Radix Sort, we'll have to delve into Counting Sort first, implement it, and observe its downfall with an increased range of element values.

Why Use Counting Sort in the Radix Sort?

Counting Sort is a stable, non-comparative sorting algorithm, and it is mainly used to sort integer arrays. All of these characteristics are important for its use in Radix Sort. You can use other algorithms as the subroutine, as long as they share these characteristics, though Counting Sort is the most natural match.

Radix Sort needs to maintain the relative order of elements with equal key values in the input array while sorting the digits at a given place value, so by definition the main subroutine needs to be a stable sorting algorithm.

Non-comparative sorting algorithms generally have linear complexity, so they will have less impact on the complexity of the Radix Sort.
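
To make the stability requirement concrete, here is a rough sketch (not taken from the article) that sorts two-digit numbers by the ones digit and then by the tens digit using Python's built-in sorted(), which is stable - ties keep their previous order, so two passes are enough:

numbers = [21, 13, 23, 11]
# Pass 1: sort by the ones digit (stable, so equal digits keep their relative order)
by_ones = sorted(numbers, key=lambda x: x % 10)          # [21, 11, 13, 23]
# Pass 2: sort by the tens digit; stability preserves the ones-digit order within ties
by_tens = sorted(by_ones, key=lambda x: (x // 10) % 10)  # [11, 13, 21, 23]
print(by_tens)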

How Does the Counting Sort Work?

Let's take a look at an unsorted integer array, which we'll sort using Counting Sort:

I = [2, 2, 0, 6, 1, 9, 9, 7]

Counting Sort works by counting the number of elements that have each distinct key value, and then uses those counts to calculate the position of each key.

First of all, we'll find the maximum element in the input array - max = 9.

Then, we'll create an auxiliary array with max+1 elements. This is the count array (C), which will be used to store the number of occurrences of each element in the input array.

Initially, all counts are initialized to 0:

C = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # Count array
# indices: 0  1  2  3  4  5  6  7  8  9

Now, we need to go through the following steps:

1. Traverse the input array and increase the corresponding count for every element by 1

For example, if we come across an element with the value of 2 in the input array (I), we add 1 to the element with the index 2 in the count array:

I = [2, 2, 0, 6, 1, 9, 9, 7]  # The first element is 2
#    ^
C = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # We increase the count at index 2 by 1
# indices: 0  1  2  3  4  5  6  7  8  9

After this step, the count array will store the number of occurrences of each element in the input array:

C = [1, 1, 2, 0, 0, 0, 1, 1, 0, 2]
# indices: 0  1  2  3  4  5  6  7  8  9
# Element 0 has 1 occurrence
# Element 1 has 1 occurrence
# Element 2 has 2 occurrences
# Element 3 has no occurrences...

2. For each element in the count array, sum up its value with the value of all its previous elements, and then store that value as the value of the current element:

C = [1, 2, 4, 4, 4, 4, 5, 6, 6, 8]
# indices: 0  1  2  3  4  5  6  7  8  9
# Element 0 = 1
# Element 1 = 1 + 1
# Element 2 = 1 + 1 + 2
# Element 3 = 1 + 1 + 2 + 0
# ...

This way, we store the cumulative sum of the count array's elements at each step.
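
As an aside (my own shortcut, not something the article uses), this prefix-sum step can also be written with itertools.accumulate:

from itertools import accumulate

C = [1, 1, 2, 0, 0, 0, 1, 1, 0, 2]
prefix_sums = list(accumulate(C))  # running totals of the counts
print(prefix_sums)  # [1, 2, 4, 4, 4, 4, 5, 6, 6, 8]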

3. Calculate element position based on the count array values

To store this sorted sequence, we'll need to create a new array. Let's call it the output array (O) and initialize it with zeros, one for each element of the input array:

O = [0, 0, 0, 0, 0, 0, 0, 0]  # Initialized output array
# indices: 0  1  2  3  4  5  6  7

For each element I[i] (starting from the end) in the input array:

  1. Find the index in the count array that is equal to the value of the current element I[i]
    • That's the element C[j], where j = I[i]
  2. Subtract 1 from the value of C[j]
    • Now we have newValue = C[j] - 1
  3. Store I[i] in O[newValue]
  4. Update C[j] with the newValue

In the end, the output array contains the sorted elements of the input array!

Implementing Counting Sort in Python

Now, with all that out of the way, let's go ahead and implement Counting Sort in Python:

def countingSort(inputArray):
    # Find the maximum element in the inputArray
    maxEl = max(inputArray)
    countArrayLength = maxEl + 1

    # Initialize the countArray with (max+1) zeros
    countArray = [0] * countArrayLength

    # Step 1 -> Traverse the inputArray and increase
    # the corresponding count for every element by 1
    for el in inputArray:
        countArray[el] += 1

    # Step 2 -> For each element in the countArray,
    # sum up its value with the value of the previous
    # element, and then store that value
    # as the value of the current element
    for i in range(1, countArrayLength):
        countArray[i] += countArray[i-1]

    # Step 3 -> Calculate element position
    # based on the countArray values
    outputArray = [0] * len(inputArray)
    i = len(inputArray) - 1
    while i >= 0:
        currentEl = inputArray[i]
        countArray[currentEl] -= 1
        newPosition = countArray[currentEl]
        outputArray[newPosition] = currentEl
        i -= 1

    return outputArray

inputArray = [2, 2, 0, 6, 1, 9, 9, 7]
print("Input array = ", inputArray)
sortedArray = countingSort(inputArray)
print("Counting sort result = ", sortedArray)

Running the code above will yield us the following output:

Input array = [2, 2, 0, 6, 1, 9, 9, 7]
Counting sort result = [0, 1, 2, 2, 6, 7, 9, 9]

Counting Sort Complexity

The time complexity of the counting sort is O(n+k), where n is the number of elements in the input array, and k is the value of the max element in the array.

The problem occurs when the value of the largest element drastically exceeds the number of elements in the array. As the k approaches n², the time complexity gets closer to O(n²), which is a horrible time complexity for a sorting algorithm.
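
To see why this hurts in practice (an illustrative sketch, not a benchmark from the article), just two elements with a huge maximum value already force Counting Sort to allocate an enormous count array:

inputArray = [5, 1_000_000]  # only two elements, but a very large maximum value
countArray = [0] * (max(inputArray) + 1)
print(len(inputArray), len(countArray))  # 2 1000001 -> the count array dwarfs the input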

This is where Radix Sort kicks in.

Radix Sort Algorithm

Instead of counting elements by their distinct key values, Radix Sort groups the elements by the digits at each positional value and performs Counting Sort on each group. The starting position can vary - LSD (Least Significant Digit) or MSD (Most Significant Digit) are the two common choices, and accordingly these variations are called LSD Radix Sort and MSD Radix Sort.

Let I = [2, 20, 61, 997, 1, 619] be the input array that we want to sort.

We'll focus on LSD Radix Sort.

The steps taken by Radix Sort are fairly straightforward:

  1. Find the maximum element in the input array - max = 997
  2. Find the number of digits in the max element - D = 3
  3. Initialize the place value to the least significant place - placeVal = 1
  4. For D times do:
    1. Perform the counting sort by the current place value
    2. Move to the next place value by multiplying placeVal by 10

Implementing Radix Sort in Python

And finally, with that out of the way, let's implement Radix Sort in Python:

def countingSortForRadix(inputArray, placeValue):
    # Each digit at the placeValue position is between 0 and 9,
    # so 10 count buckets are enough
    countArray = [0] * 10
    inputSize = len(inputArray)

    # placeElement is the value of the current place value
    # of the current element, e.g. if the current element is
    # 123, and the place value is 10, the placeElement is
    # equal to 2
    for i in range(inputSize):
        placeElement = (inputArray[i] // placeValue) % 10
        countArray[placeElement] += 1

    for i in range(1, 10):
        countArray[i] += countArray[i-1]

    # Reconstructing the output array
    outputArray = [0] * inputSize
    i = inputSize - 1
    while i >= 0:
        currentEl = inputArray[i]
        placeElement = (inputArray[i] // placeValue) % 10
        countArray[placeElement] -= 1
        newPosition = countArray[placeElement]
        outputArray[newPosition] = currentEl
        i -= 1

    return outputArray

def radixSort(inputArray):
    # Step 1 -> Find the maximum element in the input array
    maxEl = max(inputArray)

    # Step 2 -> Find the number of digits in the `max` element
    # (integer division so the loop terminates and D equals the digit count)
    D = 0
    while maxEl > 0:
        maxEl //= 10
        D += 1

    # Step 3 -> Initialize the place value to the least significant place
    placeVal = 1

    # Step 4 -> Run Counting Sort once per digit, moving to the next
    # place value after each pass
    outputArray = inputArray
    while D > 0:
        outputArray = countingSortForRadix(outputArray, placeVal)
        placeVal *= 10
        D -= 1

    return outputArray

inputArray = [2, 20, 61, 997, 1, 619]
print(inputArray)
sortedArray = radixSort(inputArray)
print(sortedArray)

Running the code above will yield us the following output:

[2, 20, 61, 997, 1, 619]
[1, 2, 20, 61, 619, 997]

Radix Sort Complexity

As we stated before, Radix Sort has linear time complexity. If we use Counting Sort as the main subroutine, the complexity of Radix Sort is O(d(n+k)), where d is the number of digits in the largest element. That is because we execute Counting Sort d times, and the complexity of Counting Sort itself is O(n+k).

Conclusion

Radix sort is a great sorting algorithm to use in some specific cases. Some benchmarks have even shown that radix sort can execute up to 3 times faster than other, more general-purpose sorting algorithms.

It shines when the input array has shorter keys or the range of element values is smaller. But it has poor space complexity in the opposite case, when the range of element values is large and the elements have many digits in their representation.

That is the main reason why the radix sort is not as widely used as some other types of sorting algorithms, even if it has linear time complexity.

Categories: FLOSS Project Planets

Programiz: Python Program to Randomly Select an Element From the List

Thu, 2021-04-15 07:31
In this example, you will learn to select a random element from the list.
Categories: FLOSS Project Planets
