Planet Python
Stack Abuse: Common Docstring Formats in Python
When you're knee-deep in Python code, comprehensive documentation can be a lifesaver (but, admittedly, the last thing you want to write). It's an important part of programming, and Python, being a highly readable and straightforward language, places great emphasis on it.
One key component of Python documentation is the docstring, a unique feature that sets Python apart from many other languages. In this article, we'll delve into what a docstring is and explore some of the most common docstring formats used in Python.
What is a Docstring?

A docstring, short for documentation string, is a string literal placed as the first statement of a function, method, class, or module. It captures the essence of what that object does, providing an easy reference for the programmer. In Python, a docstring is a first-class citizen, meaning it can be accessed programmatically using the __doc__ attribute.
Here's a simple Python function with a docstring:
    def add_numbers(a, b):
        """
        Adds two numbers together.

        Args:
            a (int): The first number.
            b (int): The second number.

        Returns:
            int: The sum of the two numbers.
        """
        return a + b

And here's how you can access the docstring:
    print(add_numbers.__doc__)

The output will be:
    Adds two numbers together.

    Args:
        a (int): The first number.
        b (int): The second number.

    Returns:
        int: The sum of the two numbers.

Note: Python's built-in help() function can also be used to access the docstring of a function, method, class, or module. For instance, help(add_numbers) will print the docstring along with some additional information.
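One detail worth seeing in code: __doc__ returns the string exactly as written, including the leading newline and the indentation of the function body. The standard library's inspect.getdoc() returns a cleaned-up version. The sketch below reuses the add_numbers function from above; the variable names raw and cleaned are only for illustration.

```python
import inspect


def add_numbers(a, b):
    """
    Adds two numbers together.

    Args:
        a (int): The first number.
        b (int): The second number.

    Returns:
        int: The sum of the two numbers.
    """
    return a + b


# The raw attribute keeps the leading newline and the body indentation.
raw = add_numbers.__doc__

# inspect.getdoc() strips blank lines at both ends and the common indentation.
cleaned = inspect.getdoc(add_numbers)

print(cleaned)
```

If you only need the text for display or for your own tooling, the cleaned form is usually easier to work with.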
There's really no strict rule on how to write a docstring, although several widely accepted formats make the docstrings more structured and useful. These formats not only help in understanding the code, but they also allow tools like Sphinx, PyDoc, and Doxygen to automatically generate well-formatted documentation.
We'll look at these formats in the following sections.
Common Python Docstring Formats

Docstrings in Python are a powerful tool for documenting your code. They're string literals written in a specific format, which allows them to be parsed by documentation generation tools. There are several common formats for writing docstrings, and they each have their own strengths and weaknesses. The most commonly used formats are reStructuredText (reST), Google, NumPy/SciPy, and Epytext.
Note: It's important to keep in mind that the best docstring format for you depends on your specific use case. You should consider factors like the complexity of your code, the tools you're using to generate documentation, and your personal preference.
reStructuredText (reST) Docstring Format

reStructuredText, often abbreviated as reST, is a plaintext markup language used primarily in the Python community for technical documentation. It's the default markup language used by Sphinx, a Python documentation generator.
In a reST docstring, you would typically start with a brief description of the function's purpose. You would then include sections for :param: to describe input parameters, :returns: to describe the return value, and :raises: to describe any exceptions that the function may raise.
Here's an example of what a reST docstring might look like:
    def add_numbers(x, y):
        """
        Adds two numbers together.

        :param x: The first number to add
        :type x: int or float
        :param y: The second number to add
        :type y: int or float
        :returns: The sum of x and y
        :rtype: int or float
        :raises ValueError: If either x or y is not an int or float
        """
        if not isinstance(x, (int, float)) or not isinstance(y, (int, float)):
            raise ValueError('Both x and y must be ints or floats')
        return x + y

In this example, the docstring starts with a brief description of the function. It then uses :param: to describe the input parameters x and y, :type: to specify their types, :returns: to describe the return value, and :raises: to describe the ValueError exception that the function may raise.
Note: With reST, you can also add further detail, such as usage examples, cross-references to related functions (the .. seealso:: directive), and additional notes (the .. note:: directive). This makes it a very flexible and comprehensive documentation tool.
Google Docstring Format

The Google Docstring format is a popular choice among Python developers due to its readability and simplicity. This format is characterized by a clear separation of sections, which are indicated by section headers. Section headers include Args, Returns, Raises, Yields, and Attributes, among others.
Here's an example of a function documented using the Google Docstring format:
    def add_numbers(a, b):
        """
        Adds two numbers together.

        Args:
            a (int): The first number.
            b (int): The second number.

        Returns:
            int: The sum of a and b.
        """
        return a + b

Here, the Args section describes the arguments the function expects, including their type and purpose. The Returns section, on the other hand, describes the result that the function returns, along with its type.
NumPy/SciPy Docstring Format

The NumPy/SciPy Docstring format is another popular format, especially among scientific computing communities. It provides a structured way to document Python code and is characterized by its extensive use of sections and sub-sections, which makes it suitable for documenting complex code.
Here's an example of a function documented using the NumPy/SciPy Docstring format:
    def add_numbers(a, b):
        """
        Adds two numbers together.

        Parameters
        ----------
        a : int
            The first number.
        b : int
            The second number.

        Returns
        -------
        int
            The sum of a and b.
        """
        return a + b

In this example, the Parameters section describes the function's parameters, including their type and purpose. The Returns section describes the result that the function returns, along with its type. The use of dashes (------) to underline section headers is a distinctive feature of this format.
Note: Both the Google and NumPy/SciPy docstring formats are supported by documentation tools. Sphinx handles them through its bundled sphinx.ext.napoleon extension, while pydoc simply displays the docstring text as written. With Sphinx, this means that you can automatically generate HTML, PDF, or other formats of documentation from your Python docstrings.
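As a minimal sketch of what enabling that support looks like, a Sphinx project's conf.py lists the extensions to load. The fragment below assumes an otherwise default Sphinx project; the rest of the configuration is omitted.

```python
# conf.py (Sphinx configuration file), minimal fragment.
# Assumption: the rest of the project configuration is left at its defaults.
extensions = [
    "sphinx.ext.autodoc",   # pulls docstrings out of your modules
    "sphinx.ext.napoleon",  # parses Google and NumPy style sections
]
```

With both extensions listed, Google and NumPy style sections are converted to reST fields before Sphinx renders them.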
Epytext Docstring Format

Epytext is another popular docstring format used in Python. It's a plain text format for Python docstrings that was developed as part of the Epydoc project. The Epytext markup language is designed to be easy to read and write in its raw form, yet easy to render in a variety of output formats.
Here's an example of how to use the Epytext docstring format:

    def add_numbers(a, b):
        """
        This function adds two numbers.

        @param a: The first number.
        @type a: C{int}
        @param b: The second number.
        @type b: C{int}
        @return: The sum of the two numbers.
        @rtype: C{int}
        """
        return a + b

In the above example, you can see that the Epytext format uses @-style tags to denote different sections of the docstring. The @param and @type tags are used to document the function parameters, while the @return and @rtype tags are used to document the return value of the function.
Choosing the Right Docstring Format

Choosing the right docstring format is largely a matter of personal preference and the specific needs of your project. However, there are a few things to consider when making your decision.
Firstly, consider the complexity of your project. If your project is large and complex, a more structured docstring format like reST or NumPy/SciPy might be beneficial. These formats allow for more detailed documentation, which can be especially helpful in large codebases.
Secondly, consider the tools you're using. Some documentation generation tools, like Sphinx, have better support for certain docstring formats. For example, Sphinx has built-in support for the reST docstring format.
Thirdly, consider the readability of the docstring format. Some developers find certain formats easier to read and write than others. For example, some people find the Google docstring format to be more readable than the reST format.
Here's a quick comparison of the four docstring formats we've discussed:
- reST: Highly structured, great for complex projects, excellent support in Sphinx.
- Google: Less structured, easy to read and write, good support in various tools.
- NumPy/SciPy: Highly structured, great for scientific projects, excellent support in Sphinx.
- Epytext: Less structured, easy to read and write, good support in Epydoc.
Remember, the most important thing is that you're documenting your code. The specific format you choose is less important than the act of documentation itself.
Conclusion

In this article, we've taken a deep dive into the world of Python docstrings and explored some of the most common formats that developers use to document their code. We've looked at the reStructuredText (reST), Google, NumPy/SciPy, and Epytext docstring formats, each with their own unique styles and conventions.
Choosing the right docstring format largely depends on your specific project needs and personal preference. Whether you prefer the simplicity of Google's style, the detailed structure of reST, or the mathematical focus of NumPy/SciPy, remember that the key to good documentation is consistency and clarity. As long as your docstrings are clear, concise, and consistent, they will serve as a useful guide for both you and other developers who interact with your code.
Shannon -jj Behrens: Python: My Favorite Python Tricks for LeetCode Questions
I've been spending a lot of time practicing on LeetCode recently, so I thought I'd share some of my favorite intermediate-level Python tricks. I'll also cover some newer features of Python you may not have started using yet. I'll start with basic tips and then move to more advanced ones.
Get help()

Python's documentation is pretty great, and some of these examples are taken from there.
For instance, if you just google "heapq", you'll see the official docs for heapq, which are often enough.
However, it's also helpful to sometimes just quickly use help() in the shell. Here, I can't remember that push() is actually called append().
    >>> help([])
    >>> dir([])
    >>> help([].append)

enumerate()

If you need to loop over a list, you can use enumerate() to get both the item as well as the index. As a mnemonic, I like to think for (i, x) in enumerate(...):
    for (i, x) in enumerate(some_list):
        ...

items()

Similarly, you can get both the key and the value at the same time when looping over a dict using items():
    for (k, v) in some_dict.items():
        ...

[] vs. get()

Remember, when you use [] with a dict, if the key doesn't exist, you'll get a KeyError. Rather than see if an item is in the dict and then look up its value, you can use get():
    val = some_dict.get(key)  # It defaults to None.
    if val is None:
        ...

Similarly, .setdefault() is sometimes helpful.
Some people prefer to just use [] and handle the KeyError since exceptions aren't as expensive in Python as they are in other languages.
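Here's a small sketch of the three styles side by side. The dict contents and variable names (inventory, groups, and so on) are made up for illustration.

```python
inventory = {"apples": 3}

# get() returns a default instead of raising KeyError.
pears = inventory.get("pears", 0)

# setdefault() inserts the default if the key is missing,
# then returns whatever is stored under that key.
groups = {}
for word in ["ant", "bee", "crab"]:
    groups.setdefault(len(word), []).append(word)

# The "just use [] and handle the KeyError" style.
try:
    plums = inventory["plums"]
except KeyError:
    plums = 0

print(pears, groups, plums)
```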
range() is smarter than you think

    for item in range(items):
        ...

    for index in range(len(items)):
        ...

    # Count by 2s.
    for i in range(0, 100, 2):
        ...

    # Count backward from 100 to 0 inclusive.
    for i in range(100, -1, -1):
        ...

    # Okay, Mr. Smarty Pants, I'm sure you knew all that, but did you know
    # that you can pass a range object around, and it knows how to reverse
    # itself via slice notation? :-P
    r = range(100)
    r = r[::-1]  # range(99, -1, -1)

print(f'') debugging

Have you switched to Python's f-strings yet? They're more convenient and safer (from injection vulnerabilities) than % and .format(). They even have a syntax for outputting the expression as well as its value:
    # Got 2+2=4
    print(f'Got {2+2=}')

for else

Python has a feature that I haven't seen in other programming languages. Both for and while can be followed by an else clause, which is useful when you're searching for something.
    for item in some_list:
        if is_what_im_looking_for(item):
            print(f"Yay! It's {item}.")
            break
    else:
        print("I couldn't find what I was looking for.")

Use a list as a stack

The cost of using a list as a stack is (amortized) O(1):
    elements = []
    elements.append(element)  # Not push
    element = elements.pop()

Note that inserting something at the beginning of the list or in the middle is more expensive because it has to shift everything to the right--see deque below.
sort() vs. sorted()

    # sort() sorts a list in place.
    my_list.sort()

    # Whereas sorted() returns a sorted *copy* of an iterable:
    my_sorted_list = sorted(some_iterable)

And, both of these can take a key function if you need to sort objects.
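A short sketch of the key function in action; the people list is invented for illustration. Python's sort is stable, so items with equal keys keep their original relative order.

```python
from operator import itemgetter

people = [("Ann", 31), ("Bob", 25), ("Cy", 31)]

# Sort by age. Ann stays ahead of Cy because the sort is stable.
by_age = sorted(people, key=itemgetter(1))

# Oldest first, ties broken by name: build a tuple key.
oldest_first = sorted(people, key=lambda p: (-p[1], p[0]))

# sort() takes the same key (and reverse) arguments, but works in place.
people.sort(key=itemgetter(0), reverse=True)

print(by_age)
print(oldest_first)
print(people)
```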
set and frozenset

Sets are so useful for so many problems! Just in case you didn't know some of these tricks:
    # There is now syntax for creating sets.
    s = {'Von'}

    # There are set "comprehensions" which are like list comprehensions, but for sets.
    s2 = {f'{name} the III' for name in s}
    # {'Von the III'}

    # If you can't remember how to use union, intersection, difference, etc.
    help(set())

    # If you need an immutable set, for instance, to use as a dict key, use frozenset.
    frozenset((1, 2, 3))

deque

If you find yourself needing a queue or a list that you can push and pop from either side, use a deque:
    >>> from collections import deque
    >>>
    >>> d = deque()
    >>> d.append(3)
    >>> d.append(4)
    >>> d.appendleft(2)
    >>> d.appendleft(1)
    >>> d
    deque([1, 2, 3, 4])
    >>> d.popleft()
    1
    >>> d.pop()
    4

Using a stack instead of recursion

Instead of using recursion (which by default is limited to a depth of about 1000 frames), you can use a while loop and manually manage a stack yourself. Here's a slightly contrived example:
    work = [create_initial_work()]
    while work:
        work_item = work.pop()
        result = process(work_item)
        if is_done(result):
            return result
        work.append(result.pieces[0])  # Lists have append(), not push().
        work.append(result.pieces[1])

Using yield from

If you don't know about yield, you can go spend some time learning about that. It's awesome.
Sometimes, when you're in one generator, you need to call another generator. Python now has yield from for that:
    def my_generator():
        yield 1
        yield from some_other_generator()
        yield 6

So, here's an example of backtracking:
    from typing import List

    class Solution:
        def problem(self, digits: str) -> List[str]:
            def generate_possibilities(work_so_far, remaining_work):
                if not remaining_work:
                    if work_so_far:
                        yield work_so_far
                    return
                first_part, remaining_part = remaining_work[0], remaining_work[1:]
                for i in things_to_try:
                    yield from generate_possibilities(work_so_far + i, remaining_part)

            output = list(generate_possibilities(no_work_so_far, its_all_remaining_work))
            return output

This is appropriate if you have fewer than 1000 "levels" but a ton of possibilities for each of those levels. This won't work if you're going to need more than 1000 layers of recursion. In that case, switch to "Using a stack instead of recursion".
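To make the skeleton concrete, here's one way it might be filled in for a phone-keypad style problem. The KEYPAD mapping and the letter_combinations name are assumptions made for this illustration; they aren't taken from any particular LeetCode problem statement.

```python
from typing import Iterator, List

# Hypothetical digit-to-letters mapping, trimmed to two keys for brevity.
KEYPAD = {"2": "abc", "3": "def"}


def letter_combinations(digits: str) -> List[str]:
    def generate(work_so_far: str, remaining: str) -> Iterator[str]:
        if not remaining:
            # Nothing left to process: emit the finished candidate, if any.
            if work_so_far:
                yield work_so_far
            return
        first, rest = remaining[0], remaining[1:]
        for letter in KEYPAD[first]:
            # Delegate to the recursive generator for the remaining digits.
            yield from generate(work_so_far + letter, rest)

    return list(generate("", digits))


print(letter_combinations("23"))
```

The recursion depth equals the number of digits, so this stays well within the recursion limit even though the number of results grows quickly.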
Pre-initialize your list

If you know how long your list is going to be ahead of time, you can avoid needing to resize it multiple times by just pre-initializing it:
    dp = [None] * len(items)

collections.Counter()

How many times have you used a dict to count up something? It's built into Python:
    >>> from collections import Counter
    >>> c = Counter('abcabcabcaaa')
    >>> c
    Counter({'a': 6, 'b': 3, 'c': 3})

defaultdict

Similarly, there's defaultdict:
    >>> from collections import defaultdict
    >>> d = defaultdict(list)
    >>> d['girls'].append('Jocylenn')
    >>> d['boys'].append('Greggory')
    >>> d
    defaultdict(<class 'list'>, {'girls': ['Jocylenn'], 'boys': ['Greggory']})

Notice that I didn't need to set d['girls'] to an empty list before I started appending to it.
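A couple more things these two can do, as a quick sketch (the sample strings are made up): Counter has most_common() and returns 0 for missing keys, and defaultdict(int) works as a lightweight counter of its own.

```python
from collections import Counter, defaultdict

c = Counter('abcabcabcaaa')
top = c.most_common(1)   # the single most frequent item and its count
c.update('bb')           # add more observations to the same counter
missing = c['z']         # missing keys count as 0, no KeyError

# defaultdict(int) starts every missing key at 0.
lengths = defaultdict(int)
for word in ['ant', 'bee', 'crab']:
    lengths[len(word)] += 1

print(top, c['b'], missing, dict(lengths))
```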
heapq

I had heard of heaps in school, but I didn't really know what they were. Well, it turns out they're pretty helpful for several of the problems, and Python has a list-based heap implementation built-in.
If you don't know what a heap is, I recommend this video and this video. They'll explain what a heap is and how to implement one using a list.
The heapq module is a built-in module for managing a heap. It builds on top of an existing list:
    import heapq

    some_list = ...
    heapq.heapify(some_list)

    # The head of the heap is some_list[0].
    # The len of the heap is still len(some_list).

    heapq.heappush(some_list, item)
    head_item = heapq.heappop(some_list)

The heapq module also has nlargest and nsmallest built-in so you don't have to implement those things yourself.
Keep in mind that heapq is a minheap. Let's say that what you really want is a maxheap, and you're not working with ints, you're working with objects. Here's how to tweak your data to get it to fit heapq's way of thinking:
    heap = []
    heapq.heappush(heap, (-obj.value, obj))
    (ignored, first_obj) = heapq.heappop(heap)

Here, I'm using - to make it a maxheap. I'm wrapping things in a tuple so that it's sorted by the obj.value, and I'm including the obj as the second value so that I can get it.
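One caveat with the tuple trick: if two entries have the same value, Python falls back to comparing the objects themselves, which raises TypeError if they don't define an ordering. A common workaround, sketched below, is to add a counter as a tie-breaker. The Task class and its fields are hypothetical.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Task:  # hypothetical object type; it defines no ordering
    name: str
    value: int


tie_breaker = count()  # monotonically increasing sequence number
heap = []
for task in [Task("low", 1), Task("high", 9), Task("also high", 9)]:
    # (-value, sequence, obj): the sequence number settles ties, so the
    # Task objects themselves are never compared.
    heapq.heappush(heap, (-task.value, next(tie_breaker), task))

_, _, first = heapq.heappop(heap)
_, _, second = heapq.heappop(heap)
print(first.name, second.name)
```

As a side effect, entries with equal values come out in insertion order.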
Use bisect for binary search

I'm sure you've implemented binary search before. Python has it built-in. It even has keyword arguments that you can use to search in only part of the list:
    import bisect

    insertion_point = bisect.bisect_left(sorted_list, some_item, lo=lo, hi=hi)

Pay attention to the key argument which is sometimes useful, but may take a little work for it to work the way you want.
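bisect_left returns an insertion point rather than a found/not-found answer, so a membership check takes one extra comparison. Here's a minimal sketch; the index_of helper is my own name, not part of the bisect module.

```python
import bisect

sorted_list = [1, 3, 3, 5, 8, 13]


def index_of(items, target, lo=0, hi=None):
    """Return the leftmost index of target in items[lo:hi], or -1 if absent."""
    hi = len(items) if hi is None else hi
    i = bisect.bisect_left(items, target, lo=lo, hi=hi)
    # The insertion point is only a hit if it's in range and holds the target.
    return i if i < hi and items[i] == target else -1


print(index_of(sorted_list, 3))            # leftmost of the two 3s
print(index_of(sorted_list, 4))            # not present
print(index_of(sorted_list, 13, hi=3))     # present, but outside the searched slice
print(bisect.bisect_right(sorted_list, 3)) # insertion point after the last 3
```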
namedtuple and dataclasses

Tuples are great, but it can be a pain to deal with remembering the order of the elements or unpacking just a single element in the tuple. That's where namedtuple comes in.
    >>> from collections import namedtuple
    >>> Point = namedtuple('Point', ['x', 'y'])
    >>> p = Point(5, 7)
    >>> p
    Point(x=5, y=7)
    >>> p.x
    5
    >>> q = p._replace(x=92)
    >>> p
    Point(x=5, y=7)
    >>> q
    Point(x=92, y=7)

Keep in mind that tuples are immutable. I particularly like using namedtuples for backtracking problems. In that case, the immutability is actually a huge asset. I use a namedtuple to represent the state of the problem at each step. I have this much stuff done, this much stuff left to do, this is where I am, etc. At each step, you take the old namedtuple and create a new one in an immutable way.
Updated: Python 3.7 introduced dataclasses. These have multiple advantages:
- They can be mutable or immutable (although, there's a small performance penalty).
- You can use type annotations.
- You can add methods.
dataclasses are great when you want a little class to hold some data, but you don't want to waste much time writing one from scratch.
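Here's a rough sketch of the same "immutable state per step" idea using a frozen dataclass instead of a namedtuple. The State class and its fields are invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:  # hypothetical backtracking state
    done: tuple
    remaining: tuple

    def step(self) -> "State":
        # Build a new State rather than mutating this one.
        return replace(
            self,
            done=self.done + self.remaining[:1],
            remaining=self.remaining[1:],
        )


start = State(done=(), remaining=(1, 2, 3))
nxt = start.step()
print(start)
print(nxt)
```

dataclasses.replace() plays the same role as namedtuple's _replace(), and frozen=True makes attribute assignment raise an error.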
int, decimal, math.inf, etc.

Thankfully, Python's int type supports arbitrarily large values by default:
    >>> 1 << 128
    340282366920938463463374607431768211456

There's also the decimal module if you need to work with things like money where a float isn't accurate enough or when you need a lot of decimal places of precision.
Sometimes, they'll say the range is -2 ^ 32 to 2 ^ 32 - 1. You can get those values via bitshifting:
    >>> -(2 ** 32) == -(1 << 32)
    True
    >>> (2 ** 32) - 1 == (1 << 32) - 1
    True

Sometimes, it's useful to initialize a variable with math.inf (i.e. infinity) and then try to find new values less than that.
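A tiny sketch of the math.inf pattern, with a made-up list of prices:

```python
import math

prices = [7, 3, 9]

best = math.inf  # anything real will be smaller than this
for price in prices:
    if price < best:
        best = price

# math.inf also makes a handy default when the input might be empty.
nothing = min([], default=math.inf)

print(best, nothing)
```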
Closures

I'm not sure every interviewer is going to like this, but I tend to skip the OOP stuff and use a bunch of local helper functions so that I can access things via closure:
    class Solution():  # This is what LeetCode gave me.
        def solveProblem(self, arg1, arg2):  # Why they used camelCase, I have no idea.

            def helper_function():
                # I have access to arg1 and arg2 via closure.
                # I don't have to store them on self or pass them around
                # explicitly.
                return arg1 + arg2

            counter = 0

            def can_mutate_counter():
                # By using nonlocal, I can even mutate counter.
                # I rarely use this approach in practice. I usually pass it in
                # as an argument and return a value.
                nonlocal counter
                counter += 1

            can_mutate_counter()
            return helper_function() + counter

match statement

Did you know Python now has a match statement?
    # Taken from: https://learnpython.com/blog/python-match-case-statement/
    >>> command = 'Hello, World!'
    >>> match command:
    ...     case 'Hello, World!':
    ...         print('Hello to you too!')
    ...     case 'Goodbye, World!':
    ...         print('See you later')
    ...     case other:
    ...         print('No match found')

It's actually much more sophisticated than a switch statement, so take a look, especially if you've never used match in a functional language like Haskell.
OrderedDict

If you ever need to implement an LRU cache, it'll be quite helpful to have an OrderedDict.
Python's dicts are now ordered by default. However, the docs for OrderedDict say that there are still some cases where you might need to use OrderedDict. I can't remember. If you ever need your dicts to be ordered, just read the docs and figure out if you need an OrderedDict or if you can use just a normal dict.
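For the LRU cache case, the two OrderedDict methods that do the heavy lifting are move_to_end() and popitem(last=False). Here's a minimal sketch; the LRUCache class name and its get/put interface (returning -1 on a miss) are assumptions modeled on the usual interview framing.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used


cache = LRUCache(2)
cache.put(1, "one")
cache.put(2, "two")
cache.get(1)           # touching 1 makes 2 the oldest entry
cache.put(3, "three")  # evicts 2
print(list(cache.data))
```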
@functools.cache

If you need a cache, sometimes you can just wrap your code in a function and use functools.cache:
    from functools import cache

    @cache
    def factorial(n):
        return n * factorial(n - 1) if n else 1

    print(factorial(5))
    ...
    factorial.cache_info()  # Returns a CacheInfo named tuple; with @cache, maxsize is None.

Debugging ListNodes

A lot of the problems involve a ListNode class that's provided by LeetCode. It's not very "debuggable". Add this code temporarily to improve that:
    def list_node_str(head):
        seen_before = set()
        pieces = []
        p = head
        while p is not None:
            if p in seen_before:
                pieces.append(f'loop at {p.val}')
                break
            pieces.append(str(p.val))
            seen_before.add(p)
            p = p.next
        joined_pieces = ', '.join(pieces)
        return f'[{joined_pieces}]'

    ListNode.__str__ = list_node_str

Saving memory with the array module

Sometimes you need a really long list of simple numeric (or boolean) values. The array module can help with this, and it's an easy way to decrease your memory usage after you've already gotten your algorithm working.
    >>> import array
    >>> array_of_bytes = array.array('b')
    >>> array_of_bytes.frombytes(b'\0' * (array_of_bytes.itemsize * 10_000_000))

Pay close attention to the type of values you configure the array to accept. Read the docs.
I'm sure there's a way to use individual bits for an array of booleans to save even more space, but it'd probably cost more CPU, and I generally care about CPU more than memory.
Using an exception for the success case rather than the error case

A lot of Python programmers don't like this trick because it's equivalent to goto, but I still occasionally find it convenient:
    class Eureka(StopIteration):
        """Eureka means "I found it!" """
        pass

    def do_something_else():
        some_value = 5
        raise Eureka(some_value)

    def do_something():
        do_something_else()

    try:
        do_something()
    except Eureka as exc:
        print(f'I found it: {exc.args[0]}')

Using VS Code, etc.

VS Code has a pretty nice Python extension. If you highlight the code and hit shift-enter, it'll run it in a shell. That's more convenient than just typing everything directly in the shell. Other editors have something similar, or perhaps you use a Jupyter notebook for this.
Another thing that helps me is that I'll often have separate files open with separate attempts at a solution. I guess you can call this the "fast" approach to branching.
Write English before Python

One thing that helps me a lot is to write English before writing Python. Just write all your thoughts. Keep adding to your list of thoughts. Sometimes you have to start over with a new list of thoughts. Get all the thoughts out, and then pick which thoughts you want to start coding first.

Conclusion

Well, those are my favorite tricks off the top of my head. I'll add more if I think of any.
This is just a single blog post, but if you want more, check out Python 3 Module of the Week.
Stack Abuse: Creating a Singleton in Python
Of all the design patterns, the Singleton pattern holds a unique place. It's straightforward, yet is often misunderstood. In this Byte, we'll try to explain the Singleton pattern, understand its core principles, and learn how to implement it in Python. We'll also explore how to create a Singleton using a decorator.
The Singleton Pattern

The Singleton pattern is a design pattern that restricts the instantiation of a class to a single instance. This is useful when exactly one object is needed to coordinate actions across the system. The concept is sometimes generalized to systems that operate more efficiently when only one object exists, or that restrict the instantiation to a certain number of objects.
The Singleton pattern is a part of the Gang of Four design patterns and falls under the category of creational patterns. Creational patterns deal with object creation mechanisms, trying to create objects in a manner suitable to the situation.
Note: The Singleton pattern is considered an anti-pattern by some due to its potential for misuse. It's important to use it judiciously and only when necessary.
Creating a Singleton in Python

Python doesn't natively support the Singleton pattern, but there are several ways to create one. Here's a simple example:
    class Singleton:
        _instance = None

        def __new__(cls, *args, **kwargs):
            if not cls._instance:
                # object.__new__() accepts only the class, so don't forward
                # *args and **kwargs to it.
                cls._instance = super().__new__(cls)
            return cls._instance

In the above code, we override the __new__ method. This method is called before __init__ when an object is created. If the Singleton class's _instance attribute is None, we create a new Singleton object and assign it to _instance. If _instance is already set, we return that instead.
Using this technique effectively only allows the Singleton class to be instantiated once. You can then add any properties or methods to this class that you need.
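One thing to watch for when adding an __init__ to a class like this: Python still calls __init__ every time the class is called, even when __new__ hands back the existing instance. The sketch below illustrates that; the Config class and its name attribute are hypothetical.

```python
class Config:  # hypothetical singleton with an __init__
    _instance = None
    init_calls = 0

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, name="default"):
        # Runs on every Config(...) call, not just the first one.
        Config.init_calls += 1
        self.name = name


a = Config("first")
b = Config("second")

print(a is b)             # same object both times
print(a.name)             # overwritten by the second call
print(Config.init_calls)  # __init__ ran twice
```

If the repeated initialization is a problem, guarding the body of __init__ with a flag, or using the metaclass approach, avoids it.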
Using a Decorator

Another way to create a Singleton in Python is by using a decorator. Decorators allow us to wrap a function or class in order to extend the behavior of the wrapped object, without permanently modifying it.
Here's how we can create a Singleton using a decorator:
    def singleton(cls):
        instances = {}

        def wrapper(*args, **kwargs):
            if cls not in instances:
                instances[cls] = cls(*args, **kwargs)
            return instances[cls]

        return wrapper

    @singleton
    class Singleton:
        pass

In the above code, the @singleton decorator checks if an instance of the class it's decorating exists in the instances dictionary. If it doesn't, it creates one and adds it to the dictionary. If it does exist, it simply returns the existing instance.
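Two consequences of this approach are easy to miss, so here's a short sketch of them: arguments passed on later calls are silently ignored, and the decorated name now refers to the wrapper function rather than the class. The Settings class is hypothetical.

```python
import inspect


def singleton(cls):
    instances = {}

    def wrapper(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return wrapper


@singleton
class Settings:  # hypothetical class for illustration
    def __init__(self, debug=False):
        self.debug = debug


s1 = Settings(debug=True)
s2 = Settings(debug=False)  # arguments ignored: the first instance is returned

print(s1 is s2)
print(s2.debug)
print(inspect.isfunction(Settings))  # the name is bound to wrapper, not a class
```

Because Settings is now a function, isinstance(s1, Settings) and subclassing Settings won't work as they would with a plain class.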
Using a Base Class

Creating a singleton using a base class is a straightforward approach. Here, we define a base class that maintains a dictionary of instance references. Whenever an instance is requested, we first check if the instance already exists in the dictionary. If it does, we return the existing instance, otherwise, we create a new instance and store its reference in the dictionary.
Here's how you can implement a singleton using a base class in Python:
    class SingletonBase:
        _instances = {}

        def __new__(cls, *args, **kwargs):
            if cls not in cls._instances:
                instance = super().__new__(cls)
                cls._instances[cls] = instance
            return cls._instances[cls]

    class Singleton(SingletonBase):
        pass

    s1 = Singleton()
    s2 = Singleton()
    print(s1 is s2)  # Output: True

In the above code, SingletonBase is the base class that implements the singleton pattern. Singleton is the class that we want to make a singleton.
Using a Metaclass

A metaclass in Python is a class of a class, meaning a class is an instance of its metaclass. We can use a metaclass to create a singleton by overriding its __call__ method to control the creation of instances.
Here's how you can implement a singleton using a metaclass in Python:
    class SingletonMeta(type):
        _instances = {}

        def __call__(cls, *args, **kwargs):
            if cls not in cls._instances:
                instance = super().__call__(*args, **kwargs)
                cls._instances[cls] = instance
            return cls._instances[cls]

    class Singleton(metaclass=SingletonMeta):
        pass

    s1 = Singleton()
    s2 = Singleton()
    print(s1 is s2)  # Output: True

In the above code, SingletonMeta is the metaclass that implements the singleton pattern. Singleton is the class that we want to make a singleton.
Use Cases

Singletons are useful when you need to control access to a resource or when you need to limit the instantiation of a class to a single object. This is typically useful in scenarios such as logging, driver objects, caching, thread pools, and database connections.
Singleton pattern is considered an anti-pattern by some due to its global nature and the potential for unintended side effects. Be sure to use it only when necessary!
Singletons and Multithreading

When dealing with multithreading, singletons can be tricky. If two threads try to create an instance at the same time, they might end up creating two different instances. To prevent this, we need to synchronize the instance creation process.
Here's how you can handle singleton creation in a multithreaded environment:
    import threading

    class SingletonMeta(type):
        _instances = {}
        _lock: threading.Lock = threading.Lock()

        def __call__(cls, *args, **kwargs):
            with cls._lock:
                if cls not in cls._instances:
                    instance = super().__call__(*args, **kwargs)
                    cls._instances[cls] = instance
            return cls._instances[cls]

    class Singleton(metaclass=SingletonMeta):
        pass

    def test_singleton():
        s1 = Singleton()
        print(s1)

    # Create multiple threads
    threads = [threading.Thread(target=test_singleton) for _ in range(10)]

    # Start all threads
    for thread in threads:
        thread.start()

    # Wait for all threads to finish
    for thread in threads:
        thread.join()

In the above code, we use a lock to ensure that only one thread can create an instance at a time. This prevents the creation of multiple singleton instances in a multithreaded environment.
Common Pitfalls

While singletons can be a powerful tool in your Python programming toolkit, they are not without their pitfalls. Here are a few common ones to keep in mind:
- Global Variables: Singleton can sometimes be misused as a global variable. This can lead to problems as the state of the singleton can be changed by any part of the code, leading to unpredictable behavior.
- Testability: Singletons can make unit testing difficult. Since they maintain state between calls, a test could potentially modify that state and affect the outcome of other tests. This is why it's important to ensure that the state is reset before each test.
- Concurrency Issues: In a multithreaded environment, care must be taken to ensure that the singleton instance is only created once. If not properly handled, multiple threads could potentially create multiple instances.
Here's an example of how a singleton can cause testing issues:
    class Singleton(object):
        _instance = None

        def __new__(cls):
            if cls._instance is None:
                cls._instance = super(Singleton, cls).__new__(cls)
            return cls._instance

    s1 = Singleton()
    s2 = Singleton()

    s1.x = 5
    print(s2.x)  # Outputs: 5

In this case, if you were to test the behavior of Singleton and modify x, that change would persist across all instances and could potentially affect other tests.
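One way to reset that state between tests, sketched under the assumption that the singleton stores its instance in a class attribute named _instance as above, is to clear that attribute in your test setup. The reset_singleton helper is a made-up name.

```python
class Singleton:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


def reset_singleton():
    # Hypothetical test helper: forget the cached instance so the next
    # Singleton() call builds a fresh one.
    Singleton._instance = None


first = Singleton()
first.x = 5

reset_singleton()  # e.g. called from a test's setup step

second = Singleton()
print(second is first)       # a different object after the reset
print(hasattr(second, "x"))  # state from the earlier "test" is gone
```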
Conclusion

Singletons are a design pattern that restricts a class to a single instance. They can be useful in scenarios where a single shared resource, such as a database connection or configuration file, is needed. In Python, you can create a singleton using various methods such as decorators, base classes, and metaclasses.
However, singletons come with their own set of pitfalls, including misuse as global variables, difficulties in testing, and concurrency issues in multithreaded environments. It's important to be aware of these issues and use singletons judiciously.
Go Deh: "Staring at assorted perms and Python", or "The Ranking and Unranking of Lexical Permutations"
I have been playing with encryptions and then series that involved permutations. Submitting my Godeh series to the OEIS made me look closer at the permutations that Python produces, and at maybe reducing those permutations to a single integer.
(p.s. Post is best read on a larger than portrait phone screen)
Here is the start of the permutations used in the Godeh series rows:
(0,)
(0, 1)
(1, 0)
(0, 1, 2)
(0, 2, 1)
(1, 0, 2)
(1, 2, 0)
(2, 0, 1)
(2, 1, 0)
(0, 1, 2, 3)
(0, 1, 3, 2)
(0, 2, 1, 3)
(0, 2, 3, 1)
(0, 3, 1, 2)
(0, 3, 2, 1)
(1, 0, 2, 3)
(1, 0, 3, 2)
(1, 2, 0, 3)
(1, 2, 3, 0)
(1, 3, 0, 2)
(1, 3, 2, 0)
(2, 0, 1, 3)
(2, 0, 3, 1)
(2, 1, 0, 3)
(2, 1, 3, 0)
(2, 3, 0, 1)
(2, 3, 1, 0)
(3, 0, 1, 2)
(3, 0, 2, 1)
(3, 1, 0, 2)
(3, 1, 2, 0)
(3, 2, 0, 1)
(3, 2, 1, 0)
It can be generated by this:
    from itertools import permutations, chain
    from math import factorial
    from typing import Sequence
    from pprint import pp

    # %% perms in Python generation order
    generation_order = []
    for width in range(1, 5):
        for perm in permutations(range(width)):
            print(perm)
            generation_order.append(perm)
I truncate at all the perms for four items, but you get the idea - it could extend in this way without limit.
Ordered by...
Successive groups of permutations are for an increasing number of items permuted. Within each group of permutations of the same number of items, the order is lexicographical, i.e. the Python sorted order.
I stared at, and messed around with, the ordered perms of four items above, looking for patterns. But let's just confirm that the order is a sort by width, then lexicographical:
    wtl = sorted(set(generation_order), key=lambda x: (len(x), x))
    if generation_order == wtl:
        print(' == width_then_lexicographical ordering')
    else:
        print(' != width_then_lexicographical ordering of:')
        pp(wtl)
    # Prints: == width_then_lexicographical ordering
I remembered that I had created a Rank of a permutation task some time ago, (2012), and that there was something "off" with the ranking.
Sure enough, the issue was that a rank number gets transformed into a unique permutation and vice versa: knowing the start configuration and given any permutation of it, you can generate its rank number. But the order of permutations for increasing rank number does not have to be the lexical sort used by Python, and indeed the Myrvold & Ruskey and Trotter & Johnson algorithms I converted to Python are not in lexical order.
So now I knew I needed some way to rank and un-rank a lexically ordered set of permutations, as generated by Python.
Something to look at and talk about
Let's assign ranks to Python perms and column headings to make it easier to talk about:
    from string import ascii_uppercase

    n = 4  # large enough to hopefully show patterns, but not too large
    print(' # : ' + ' '.join(char for char in ascii_uppercase[:n]))
    for rank, perm in enumerate(permutations(range(n))):
        print(f"{rank:>2} : {perm}")

    """Outputs:
     # : A B C D
     0 : (0, 1, 2, 3)
     1 : (0, 1, 3, 2)
     2 : (0, 2, 1, 3)
     3 : (0, 2, 3, 1)
     4 : (0, 3, 1, 2)
     5 : (0, 3, 2, 1)
     6 : (1, 0, 2, 3)
     7 : (1, 0, 3, 2)
     8 : (1, 2, 0, 3)
     9 : (1, 2, 3, 0)
    10 : (1, 3, 0, 2)
    11 : (1, 3, 2, 0)
    12 : (2, 0, 1, 3)
    13 : (2, 0, 3, 1)
    14 : (2, 1, 0, 3)
    15 : (2, 1, 3, 0)
    16 : (2, 3, 0, 1)
    17 : (2, 3, 1, 0)
    18 : (3, 0, 1, 2)
    19 : (3, 0, 2, 1)
    20 : (3, 1, 0, 2)
    21 : (3, 1, 2, 0)
    22 : (3, 2, 0, 1)
    23 : (3, 2, 1, 0)
    """
I can't go through the many false trails I had in coming to my result (there were several), but I finally noticed that the digits in column A stay the same for 6 successive rows, and six is three factorial. Column B stays the same for 2 successive rows, i.e. 2!. I guessed 1! for column C, and D is what's left.
A lot of deleted code later I found that, kinda, you pop the first digit out of the starting perm at rank 0 using the index rank // (4 - 1)!; you can then take the following digits out using indices from decreasing factorials, but also modulo the decreasing number of items left to index.
Hard to write a textual description of, but I was very chuffed to work out the following on my own:
    def nth_perm(initial: Sequence[int], n: int) -> tuple[int]:
        init = list(initial)
        a = len(initial)
        fac_divs = tuple((n // factorial(a - j)) % (a - j + 1)
                         for j in range(1, a))

        return tuple([init.pop(indx) for indx in fac_divs] + init)

    nth_perm(range(4), 11)
And indeed the nth perm of (0, 1, 2, 3) produced for n = 11 was (1, 3, 2, 0).
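One way to read the indices computed in nth_perm is as the digits of the rank written in the factorial number system, most significant digit first. The sketch below is my own restatement under that assumption, not the author's code; the names factorial_digits and unrank_from_digits are hypothetical.

```python
from itertools import permutations
from math import factorial


def factorial_digits(rank: int, width: int) -> list:
    """Digits of rank in factorial base, most significant first (hypothetical helper)."""
    digits = []
    for place in range(width - 1, 0, -1):
        # Each digit says how many of the remaining items to skip.
        digit, rank = divmod(rank, factorial(place))
        digits.append(digit)
    return digits


def unrank_from_digits(width: int, rank: int) -> tuple:
    """Pop items from the rank-0 perm using the factorial-base digits."""
    remaining = list(range(width))
    picked = [remaining.pop(digit) for digit in factorial_digits(rank, width)]
    return tuple(picked + remaining)


print(factorial_digits(11, 4))    # [1, 2, 1]
print(unrank_from_digits(4, 11))  # (1, 3, 2, 0)

# Same order as itertools.permutations for every rank of width 4.
assert all(unrank_from_digits(4, r) == p
           for r, p in enumerate(permutations(range(4))))
```

For rank 11 and width 4 this gives 11 = 1*3! + 2*2! + 1*1!, which matches the indices (1, 2, 1) used to pop from [0, 1, 2, 3].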
Check
I decided to check my perm-from-rank generator against the Python permutations:
    start = range(4)
    for i, perm in enumerate(permutations(range(4))):
        assert perm == nth_perm(start, i)
This had its own aha moments, and eventually I created the following code generating the rank of the sorted perm:
    def s_perm_rank(p: Sequence[int]) -> int:
        """
        Ranking of perm p in the sorted sequence of perms of the ints 0 .. len(p)-1

        p must be a permutation of the integers 0 .. (len(p)-1)
        """
        this = list(p)
        a = len(this)
        init = list(range(a))  # Perm of rank 0
        assert set(this) == set(init), "p must be perm of the ints 0 .. (len(p)-1)"

        n, f = 0, a
        while this:
            f -= 1
            n += init.index(this[0]) * factorial(f)
            init.remove(this.pop(0))

        return n
To check, I roundtrip from rank to perm to rank, and also check against the Python permutations ranking:
    n = 4
    init = range(n)
    for i, py_perm in enumerate(permutations(range(n))):
        s_perm = nth_perm(init, i)
        print(f"{i:>2}: {py_perm=} {py_perm==s_perm=} {i==s_perm_rank(s_perm)=}")
It works!
Tidy
I searched online, read more mathematical articles in and around the subject of permutations, and decided to tidy things up with renamings and changes to function arguments. For example, I found that lex is used as a short replacement for lexicographic ordering, and that a perm is usually generated from the integer width of the perm and the rank required, rather than giving the perm of rank 0 and the rank.
The final, (?), version is:
    #%% Tidy
    def lex_perm_unrank(width: int, rank: int) -> tuple[int]:
        """
        Generate the lexicographic-permutation of ints 0 .. width-1 of given rank.

        Author: Donald S. McCarthy "Paddy3118"
        Date: August 2023
        """
        initial = list(range(width))
        indices = [(rank // factorial(width - j)) % (width - j + 1)
                   for j in range(1, width)]

        return tuple([initial.pop(index) for index in indices] + initial)

    # %% Check
    for w in range(7):
        for py_rank, py_perm in enumerate(permutations(range(w))):
            lex_perm = lex_perm_unrank(w, py_rank)
            assert lex_perm == py_perm
    print("lex_perm_unrank consistent with Python permutations order.")

    # %% Tidier
    def lex_perm_rank(p: Sequence[int]) -> int:
        """
        Rank of perm p in the lexicographic ordered perms of ints 0 .. len(p)-1

        p must be a permutation of the integers 0 .. (len(p)-1)

        Author: Donald S. McCarthy "Paddy3118"
        Date: August 2023
        """
        perm = list(p)
        width = len(perm)
        initial = list(range(width))  # Perm of rank 0
        assert set(perm) == set(initial), \
            "p must be a permutation of the integers 0 .. (len(p)-1)"

        rank, f = 0, width
        while perm:
            f -= 1
            rank += initial.index(perm[0]) * factorial(f)
            initial.remove(perm.pop(0))

        return rank

    # %% Check both
    for w in range(7):
        for py_rank, py_perm in enumerate(permutations(range(w))):
            lex_perm = lex_perm_unrank(w, py_rank)
            lex_rank = lex_perm_rank(lex_perm)
            assert lex_perm == py_perm and lex_rank == py_rank
    print("lex_perm_rank/unrank consistent with Python permutations order.")
END.
Stack Abuse: Python: Accessing the Last Element of a List
In Python, lists are one of the most used data types. We commonly use lists to store data, and for good reason: they offer a great deal of flexibility with their operations. One operation that often comes up is accessing the last element of a list.
This Byte will guide you through several methods to achieve this, including negative indexing, slicing, and the itertools module.
Using Negative Indexing
Python supports negative indexing, which allows us to access elements from the end of the list. The index of -1 refers to the last item, -2 refers to the second last item, and so on. So here's how you can get the last element of a list using negative indexing:
    my_list = [1, 2, 3, 4, 5]
    last_element = my_list[-1]
    print(last_element)
Output:
    5
Note: Remember that negative indexing starts from -1. Not every programming language supports this feature.
Accessing Last n Elements
If you want to get more than one element from the end of the list, you can use slicing. Slicing in Python allows you to get a subset of the list. Here's how you can get the last n elements of a list:
    my_list = [1, 2, 3, 4, 5]
    last_two_elements = my_list[-2:]
    print(last_two_elements)
Output:
    [4, 5]
In the above example, my_list[-2:] gets the last two elements of the list. You can replace 2 with any number to get that many elements from the end of the list.
Using itertools
The itertools module in Python comes with a function called islice() that can be used to get the last n elements of a list. Here's how you can do it:
    from itertools import islice

    my_list = [1, 2, 3, 4, 5]
    last_two_elements = list(islice(my_list, len(my_list)-2, None))
    print(last_two_elements)
Output:
    [4, 5]
In the above example, islice() takes three parameters: the iterable, the start index, and the stop index. We're passing len(my_list)-2 as the start index and None as the stop index to get the last two elements. You can replace 2 with any number to get that many elements from the end of the list.
Comparing the Methods
We've looked at a few different methods to get the last element of a list in Python. Each has its own strengths and weaknesses, and the best one to use can depend on your specific situation.
Negative indexing is probably the most straightforward. It's built right into Python and doesn't require any extra imports. It's also quite efficient, since getting an item by index is a constant-time operation in Python lists.
    my_list = [1, 2, 3, 4, 5]
    print(my_list[-1])  # Outputs: 5
On the other hand, if you need to get the last n elements of a list, negative indexing becomes less convenient. You could use slicing, but this creates a new list, which can be inefficient if n is large.
    my_list = [1, 2, 3, 4, 5]
    print(my_list[-3:])  # Outputs: [3, 4, 5]
This is where itertools comes in. The itertools.islice function returns a lazy iterator over the last n elements, so no new list is created unless you ask for one. The trade-offs are an extra import, slightly more complex syntax, and the fact that islice still has to step past the leading elements to reach the start index.
    import itertools

    my_list = [1, 2, 3, 4, 5]
    print(list(itertools.islice(my_list, len(my_list) - 3, None)))  # Outputs: [3, 4, 5]
Note: Remember that itertools.islice returns an iterator, so you'll need to convert it to a list (with the list function) if you want to use it like a list.
Potential Issues
While these methods are generally quite reliable, there are a few potential issues to be aware of, especially for beginners.
First, negative indexing assumes that the list is not empty: my_list[-1] raises an IndexError on an empty list. Slicing is more forgiving and simply returns an empty list, while the islice() call shown above raises a ValueError when the computed start index is negative. You can avoid these surprises by checking that the list is non-empty before accessing its last element.
    my_list = []
    if my_list:
        print(my_list[-1])  # This line will not be executed if the list is empty
Second, remember that slicing a list creates a new list. This can be a problem if your list is very large and memory is a concern. Duplicating a list can be an expensive operation if it's large enough.
Finally, keep in mind that itertools.islice returns an iterator, not a list. This means that you can only iterate over the result once. If you need to use the result multiple times, you should convert it to a list.
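The behaviors described in this section can be checked with a short sketch. It is only an illustration: the helper last_or_default is a hypothetical name of mine, not a standard-library function.

```python
from itertools import islice

empty = []

# Negative indexing on an empty list raises IndexError.
try:
    empty[-1]
    index_error_raised = False
except IndexError:
    index_error_raised = True

# Slicing never raises here; it just returns an empty list.
empty_slice = empty[-2:]


def last_or_default(items, default=None):
    """Hypothetical helper: return the last item, or a default for an empty list."""
    return items[-1] if items else default


# An islice iterator can be consumed only once.
numbers = [1, 2, 3, 4, 5]
tail = islice(numbers, len(numbers) - 2, None)
first_pass = list(tail)   # [4, 5]
second_pass = list(tail)  # [] because the iterator is already exhausted

print(index_error_raised, empty_slice, last_or_default(empty))
print(first_pass, second_pass)
```

Wrapping the check in a small helper keeps the empty-list guard in one place instead of repeating it at every call site.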
Conclusion
In this Byte, we've explored several methods to get the last element of a list in Python, including negative indexing, slicing, and using itertools. Each method has its own advantages and potential issues.
Negative indexing is simple and efficient, but less convenient for getting the last n elements. Slicing is more flexible, but copies the selected elements into a new list, which can be costly for large n. itertools.islice() avoids that copy by returning an iterator, but the syntax is more complex and the result can only be consumed once.
Stack Abuse: Dropping NaN Values in Pandas DataFrame
When working with data in Python, it's not uncommon to encounter missing or null values, often represented as NaN. In this Byte, we'll see how to handle these NaN values within the context of a Pandas DataFrame, particularly focusing on how to identify and drop rows with NaN values in a specific column.
NaN Values in Python
In Python, NaN stands for "Not a Number". It is a special floating-point value defined by the IEEE 754 standard, available as float('nan') or math.nan in the standard library and as np.nan in NumPy. Pandas uses it to represent missing or undefined data.
It's important to note that NaN is not equivalent to zero or any other number. In fact, NaN is not even equal to itself. For instance, if you compare NaN with NaN, the result will be False.
    import numpy as np

    # Comparing NaN with NaN
    print(np.nan == np.nan)  # Output: False
What is a DataFrame?
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types, much like a spreadsheet or SQL table, or a dictionary of Series objects. It's one of the primary data structures in Pandas, and therefore often used for data manipulation and analysis in Python. You can create a DataFrame from various data types, such as a dict, a list, or a Series.
    import pandas as pd

    # Creating a DataFrame
    data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
            'Age': [28, 24, 35, np.nan]}
    df = pd.DataFrame(data)
    print(df)
This will output:
        Name   Age
    0   John  28.0
    1   Anna  24.0
    2  Peter  35.0
    3  Linda   NaN
Why Drop NaN Values from a DataFrame?
NaN values can be a problem when doing data analysis or building machine learning models since they can lead to skewed or incorrect results. While there are methods to fill in NaN values with a specific value or an interpolated value, sometimes the simplest and most effective way to handle them is to drop the rows or columns that contain them. This is particularly true when the proportion of NaN values is small, and their absence won't significantly impact your analysis.
How to Identify NaN Values in a DataFrame
Before we start dropping NaN values, let's first see how we can find them in your DataFrame. To do this, you can use the isnull() function in Pandas, which returns a DataFrame of True/False values. True, in this case, indicates the presence of a NaN value.
    # Identifying NaN values
    print(df.isnull())
This will output:
        Name    Age
    0  False  False
    1  False  False
    2  False  False
    3  False   True
Note: The isnull() function can also be used with the sum() function to get a total count of NaN values in each column.
    # Count of NaN values in each column
    print(df.isnull().sum())
This will output:
    Name    0
    Age     1
    dtype: int64
Dropping Rows with NaN Values
Now that we have an understanding of the core components of this problem, let's see how we can actually remove the NaN values. Pandas provides the dropna() function to do just that.
Let's say we have a DataFrame like this:
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'A': [1, 2, np.nan, 4],
        'B': [5, np.nan, 7, 8],
        'C': [9, 10, 11, 12]
    })
    print(df)
Output:
         A    B   C
    0  1.0  5.0   9
    1  2.0  NaN  10
    2  NaN  7.0  11
    3  4.0  8.0  12
To drop rows with NaN values, we can use:
    df = df.dropna()
    print(df)
Output:
         A    B   C
    0  1.0  5.0   9
    3  4.0  8.0  12
This works well because you call it directly on the DataFrame object, making it easy to use and less error prone. However, what if we don't want to get rid of each row containing a NaN, but would rather get rid of the column that contains it? We'll show that in the next section.
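The introduction to this Byte mentions dropping rows with NaN values in a specific column. A minimal sketch of that, assuming the same example DataFrame, uses the subset parameter of dropna(); the variable name rows_with_a is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, 7, 8],
    'C': [9, 10, 11, 12],
})

# Drop only the rows where column 'A' is NaN; NaN values in 'B' are kept.
rows_with_a = df.dropna(subset=['A'])
print(rows_with_a)
```

Here only the row with index 2 is removed, while the row with a NaN in column B survives because B is not listed in subset.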
Dropping Columns with NaN Values
Similarly, you might want to drop columns with NaN values instead of rows. Again, the dropna() function can be used for this purpose, but with a different parameter. By default, dropna() drops rows. To drop columns, you need to provide axis=1.
Let's use the same DataFrame as above:
    df = pd.DataFrame({
        'A': [1, 2, np.nan, 4],
        'B': [5, np.nan, 7, 8],
        'C': [9, 10, 11, 12]
    })
To drop columns with NaN values, we can use:
    df = df.dropna(axis=1)
    print(df)
Output:
        C
    0   9
    1  10
    2  11
    3  12
As you can see, this drops the columns A and B since they both contained at least one NaN value.
Replacing NaN Values Instead of Dropping
Sometimes, dropping NaN values might not be the best solution, especially when you don't want to lose data. In such cases, you can replace NaN values with a specific value using the fillna() function.
For instance, let's replace NaN values in our DataFrame with 0:
    df = pd.DataFrame({
        'A': [1, 2, np.nan, 4],
        'B': [5, np.nan, 7, 8],
        'C': [9, 10, 11, 12]
    })
    df = df.fillna(0)
    print(df)
Output:
         A    B   C
    0  1.0  5.0   9
    1  2.0  0.0  10
    2  0.0  7.0  11
    3  4.0  8.0  12
Note: The fillna() function also accepts a method argument which can be set to 'ffill' or 'bfill' to forward fill or backward fill the NaN values in the DataFrame, although recent pandas versions deprecate it in favor of the dedicated ffill() and bfill() methods.
For certain datasets, replacing the value with something like 0 is more valuable than dropping the entire row, but it all depends on your use case.
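Forward and backward filling can be sketched with the DataFrame methods ffill() and bfill(), again assuming the same example DataFrame. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, 7, 8],
    'C': [9, 10, 11, 12],
})

# Forward fill: each NaN takes the last valid value above it.
forward_filled = df.ffill()

# Backward fill: each NaN takes the next valid value below it.
backward_filled = df.bfill()

print(forward_filled)
print(backward_filled)
```

Note that a forward fill cannot fill a NaN in the first row, and a backward fill cannot fill one in the last row, since there is no neighboring value to copy from.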
Conclusion
Dealing with NaN values is a common task when working with data in Python. In this Byte, we've covered how to identify and drop rows or columns with NaN values in a DataFrame using the dropna() function. We've also seen how to replace NaN values with a specific value using the fillna() function. Remember, the choice between dropping and replacing NaN values depends on the specific requirements of your data analysis task.
Stack Abuse: Randomly Select an Item from a List in Python
In Python, lists are among the most widely used data structures due to their versatility. They can hold a variety of data types and are easily manipulated. One task you may come across is having to randomly select an item from a list.
This Byte will guide you through how to do this using a few different methods. From this, you can then choose which one you prefer or best fits your use-case.
Why randomly select an item?
Randomly selecting an item from a list is a common operation in many programming tasks. For instance, it can be used in games for creating random behaviors, in machine learning for splitting datasets into training and test sets, or in simulations for generating other random inputs. Understanding how to randomly select an item from a list can be a useful tool in your Python code.
Method: Using random.choice()
The random.choice() function is the most straightforward way to select a random item from a list. This function is part of the random module, so you need to import this module before using it.
Here's an example:
    import random

    my_list = ['apple', 'banana', 'cherry', 'date', 'elderberry']
    random_item = random.choice(my_list)
    print(random_item)
Running this code will output a random item from the list each time. For example:
    $ python3 random_choice.py
    banana
Heads up! The random.choice() function will raise an IndexError if the input list is empty. So, make sure your list has at least one item before using this function.
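One way to guard against the empty-list case is a small wrapper. This is only a sketch, and choose_or_default is a hypothetical helper name rather than part of the random module.

```python
import random


def choose_or_default(items, default=None):
    """Hypothetical helper: like random.choice(), but tolerant of an empty list."""
    if not items:
        return default
    return random.choice(items)


print(choose_or_default([]))                     # None
print(choose_or_default(['apple', 'banana']))    # one of the two items
```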
Method: Using random.randint()
Another way to randomly select an item from a list is by using the random.randint() function. This function generates a random integer within a specified range, which can be used as an index to select an item from the list.
Here's how you can do it:
    import random

    my_list = ['apple', 'banana', 'cherry', 'date', 'elderberry']
    random_index = random.randint(0, len(my_list) - 1)
    random_item = my_list[random_index]
    print(random_item)
Running this code will also output a random item from the list each time. For example:
    $ python3 random_randint.py
    date
The random.randint() function includes both end points while generating the random integer, so we must subtract 1 from the list length to avoid an IndexError.
This method might be best if you only want to select a random choice from part of the list. For example, if your list is 100 items long, you can set the second argument to 49 to only choose from the first half (indices 0 through 49). Thus, this method gives you a bit more control than random.choice().
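Choosing from only part of a list can be sketched as follows. The list contents and variable names are assumptions for illustration; random.randrange() is shown as an alternative whose stop value is exclusive.

```python
import random

my_list = list(range(100))
half = len(my_list) // 2

# randint() includes both endpoints, so the upper bound is half - 1.
first_half_item = my_list[random.randint(0, half - 1)]

# randrange() excludes its stop value, which avoids the "- 1" adjustment.
also_first_half = my_list[random.randrange(half)]

print(first_half_item, also_first_half)
```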
Randomly Selecting Multiple Items
You can easily select multiple items from a list randomly using the random.sample() function. This function returns a list of the requested length, with items chosen from the sequence you provide. Let's say we want to select three random items from a list:
    import random

    my_list = ['apple', 'banana', 'cherry', 'date', 'elderberry']
    random_sample = random.sample(my_list, 3)
    print(random_sample)
This might output:
    ['date', 'apple', 'cherry']
The random.sample() function is a great way to get multiple random items from a list. However, keep in mind that the number of items you request should not exceed the length of the list! If it does, you'll get a ValueError.
Note: The random.sample() function samples without replacement, so each position in the list is picked at most once. If the list itself contains repeated values, the result can still contain repeats.
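The ValueError case can be demonstrated, and avoided, with a short sketch. Clamping the request with min() is one possible approach and an assumption of mine, not something the original article prescribes.

```python
import random

my_list = ['apple', 'banana', 'cherry']

# Asking for more items than the list holds raises ValueError.
try:
    random.sample(my_list, 5)
    oversample_failed = False
except ValueError:
    oversample_failed = True

# Clamp the request to the list length to stay safe.
safe_sample = random.sample(my_list, min(5, len(my_list)))

print(oversample_failed, safe_sample)
```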
Randomly Choosing Unique Values
Depending on your use-case, you may want to select random items from a list but don't want to select the same item twice. In this case, you can use the random.sample() function as it ensures that there are no duplicates in the output.
However, if you want to select items randomly from a list and allow for duplicates, you can use a loop with random.choice(). Here's an example:
    import random

    my_list = ['apple', 'banana', 'cherry', 'date', 'elderberry']
    random_choices = [random.choice(my_list) for _ in range(3)]
    print(random_choices)
This might output:
    ['date', 'date', 'cherry']
Here, 'date' was chosen twice. This method is useful when you want to allow your code to choose the same item multiple times.
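As an alternative to the loop, the standard library also offers random.choices() (note the plural), which samples with replacement in a single call. A minimal sketch:

```python
import random

my_list = ['apple', 'banana', 'cherry', 'date', 'elderberry']

# random.choices() samples with replacement, so repeats are possible.
picks = random.choices(my_list, k=3)
print(picks)
```

Because it samples with replacement, k may be larger than the length of the list.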
Conclusion
Random selection from a list is a common task in Python, and the random module provides several methods to achieve this. The random.choice() and random.randint() methods are useful for selecting a single item, while random.sample() can select multiple items without repetition. If you need to select multiple items with possible repetitions, a loop with random.choice() is the way to go.
Stack Abuse: How to Reverse a String in Python
Python is a versatile and powerful language with a wide array of built-in functions and libraries. Whether it's for a coding interview or your application, you may find yourself needing to reverse a string. This could be for a variety of reasons, like data manipulation, algorithm requirements, or simply for the purpose of solving a coding challenge.
In this Byte, we'll explore different methods of reversing a string in Python.
Strings in Python
Before we get into the methods of reversing a string, let's briefly take a look at what strings are in Python. A string is a sequence of characters and is one of the basic data types in Python. It can be created by enclosing characters inside single or double quotes.
    my_string = "Hello, World!"
    print(my_string)
Output:
    Hello, World!
Strings in Python are immutable. This means that once a string is created, it cannot be changed. Any operation that seems to modify a string will actually create a new string.
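Immutability is easy to see in a short sketch. The variable names here are illustrative only.

```python
greeting = "Hello"

# Item assignment is not supported on str, so this raises TypeError.
try:
    greeting[0] = "J"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# "Changing" a string really means building a new one.
new_greeting = "J" + greeting[1:]

print(greeting, new_greeting)  # Hello Jello
```

The original string is untouched, which is why every reversal method in this Byte returns a new string rather than modifying the input.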
Methods of Reversing a String in Python
There are several ways to reverse a string in Python. Each method has its own advantages and disadvantages, and the best method to use depends on the specific situation. Here are a few common methods:
- Using the slice operator
- Using the reversed() function
- Using the join() method
- Using a for loop
We'll go through each method in detail in the following sections.
Using the Slice Operator
The slice operator [:] in Python can be used to slice a string (or any sequence) in various ways. It can also be used to reverse a string. The syntax for this is string[::-1].
    my_string = "Hello, World!"
    reversed_string = my_string[::-1]
    print(reversed_string)
Output:
    !dlroW ,olleH
In this code, [::-1] is an example of slice notation. It means start at the end of the string and end at position 0, move with the step -1 (which means one step backwards). This effectively reverses the string.
Note: The slice operator is a simple and efficient method to reverse a string. However, it may not be immediately clear to someone reading the code what it does, especially if they are not familiar with Python's slice notation. If you use this method, be sure to include comments to make it clear what is going on.
Using the reversed() Function
The reversed() function is a built-in that returns a reverse iterator over a sequence; it does not modify the original object. It works with strings too, but the result is a reverse iterator object rather than a string. To get a string back, you can pass the returned iterator to the join() method.
Here's how you can do it:
    def reverse_string(input_string):
        return ''.join(reversed(input_string))

    print(reverse_string("Hello, World!"))
When you run this code, it will print:
    !dlroW ,olleH
Using a for Loop
I'm showing this method after the others because it's unlikely you'll want to use it in a real-world application. The above methods are less error prone and cleaner. However, I'm showing it because I think it's important you understand how the process actually works.
To use a for loop for this process, all we need to do is iterate through each character, which we can do since strings are sequences of characters. We then prepend the current character to a new string, which results in the string being reversed by the end.
    original_string = "Hello, World!"
    reversed_string = ""
    for char in original_string:
        reversed_string = char + reversed_string
    print(reversed_string)  # Output: "!dlroW ,olleH"
Using reversed() and join()
As shown earlier, another option is to combine the reversed() function with join():
    def reverse_string(input_string):
        return ''.join(reversed(input_string))

    print(reverse_string("Hello, World!"))  # Output: "!dlroW ,olleH"
The reversed() function returns an iterator that accesses the given sequence in reverse order, so we can pass it directly to the join() method to create the reversed string.
Again, just like the for loop method, this isn't very practical for a simple string reversal, but it does help illustrate other methods that can be used for something like this, which you can then apply to other areas when needed.
Which method should I use?
The reversed() function and the slice operator are both effective ways to reverse a string in Python. The choice between the two often comes down to personal preference and the specific requirements of your code.
In my opinion, the reversed() function is better for readability since it's more obvious what is going on, so that should be used in most cases.
However, the slice operator method is much more concise and may be better for cases where you have many manipulations to make on a given string, which would help reduce the verbosity of your code.
Conclusion
Reversing a string in Python can be accomplished through a few methods, like using the reversed() function or the slice operator. While both methods are effective, the choice between the two often depends on personal preference and the specific needs of your code.
PyCharm: Getting Started in Data Science: EuroPython 2023 Follow-Up
One of my favorite parts of my job as a developer advocate is being able to help people get started in data science. I still remember when I made the transition from academia to data science almost 8 years ago, and how overwhelming it was and how much I felt like I needed to learn to even get started. I am also truly passionate about this wonderful field, and I love to help others get started in an area that is so interesting and rewarding.
I was lucky enough to be involved in a couple of activities geared toward helping data science beginners at EuroPython this year, including the Humble Data workshop and a Q&A session for data science newbies along with Cheuk Ting Ho, Valerio Maggio, and Vaibhav (VB) Srivastav. After both of these sessions I had a lot of great conversations with people who asked about which resources helped me when I was starting, and I wanted to share the content of these conversations a bit more widely.
Let’s first recap what we covered in the Q&A session, and then dive into some further resources to get you started on your data science journey.
What we covered in the Q&A session
How do you define what a data scientist is in 2023?
Just like when I started in 2016, data science is defined differently depending on who you talk to. However, the field has definitely gotten more complicated as it has matured, with additional roles like machine learning and MLOps engineers becoming established in the last few years.
Despite all of the continued confusion, the core of the role remains working with data to tell a story scientifically (after all, it’s in the name!). This involves applying techniques like data preparation and analysis, statistics, and visualization to answer a question that is typically somewhat complex. While machine learning has become synonymous with data science, it’s not actually a core part of data science work. Some data science projects may involve machine learning, but certainly not all of them.
What skills do data scientists tend to have?
There is a well-known Venn diagram that has been circulating since before I even started in data science. It depicts the field as a convergence of mathematical skills, engineering skills, and domain knowledge. When I first started out, this diagram really overwhelmed me; I felt like I needed to master all three of these to even get started!
In reality, it is impossible to know every skill used in data science in depth. Some people will come in with more strengths in mathematics or scientific skills, others will come from a software engineering background, and they’ll all pick up the remaining skills on the job. The split between data science roles also means you can play to your strengths and interests better. Those who have more experience with analysis or statistics may go for a more traditional data scientist role, while those with stronger engineering skills may gravitate toward machine learning engineering.
Finally, unless you work in a tiny startup, it’s unlikely you will be working alone. Data scientists tend to do the research and prototyping side of things, while engineers put the models into production. So don’t worry if you’re not an expert at everything – there’s a place for your skills in this field!
How can I start developing my skills?
One of the most common misconceptions about data science is that you need a PhD or some other advanced degree. However, this is just one possible path for developing the core skill set of data scientists we talked about above.
The best way to develop this skill is just to get hold of datasets that interest you and start creating projects with them. VB in particular found the subreddit r/dataisbeautiful helpful for getting motivation and feedback. I love writing, so I started a blog. Cheuk recommends volunteering for organizations like DataKind and having a community around you. Once you have a feel for working with real data, you have one of the most important skills mastered and you’ll build the rest on top of this.
Finally, the main thing is not to panic! Just choose the tooling (language, development environment, and packages) that you like best in the beginning, and build up your skills using these. I personally loved R when I started because it was designed for people from statistics backgrounds and suited me better, but over time I switched to Python as I moved more into machine learning.
Useful resources
To help you continue your data science journey, I’m also including a list of resources I’ve found useful in the past (or content I’ve created to cover specific topics).
Programming languages
Your first step will be getting some basic programming under your belt – and by basic, I really do mean basic! I’d recommend starting with either R or Python. There are dozens of courses for each online, but I can recommend the two that I used: R for Psychological Science and Learn Python the Hard Way.
You should also try to include SQL in your coding toolbelt. I’ve found that W3Schools’ SQL course is a great place to get started.
Data analysis
Learning pandas is fundamental to getting started with data analysis in Python, and I cannot recommend Wes McKinney’s book Python for Data Analysis highly enough. Once you’ve finished with that book, you probably want to start playing with some real data. For this, I recommend two sources: the UC Irvine Machine Learning Repository and Kaggle Datasets.
From there, you will probably want to get into data visualization. For R, the gold standard for graphing is ggplot2, but there is more diversity in Python plotting packages, which include Matplotlib, seaborn, plotly, lets-plot, plotnine, and more. I think the best way to get started with plotting is just to think about what you want to show (maybe check out r/dataisbeautiful for inspiration) and start messing around with a plotting package that you like.
Once you want to start learning about data cleaning and data quality issues, you may want to pick up another book or course. I have a talk where I give an overview of some of the major issues that can come up in datasets and negatively affect your data science work. Much of this talk’s content comes from one of my university statistics books, Using Multivariate Statistics.
Statistics and machine learning
Once you’re ready to dive into more advanced topics, you can start covering statistics and machine learning. I think these are both topics you can cover bit by bit (as they can be quite dense), so don’t feel like you need to master everything before you can start working as a data scientist.
While I learned statistics from my university textbooks (which are probably a bit too specific to psychology to recommend widely), I have heard nothing but good things about Think Stats. In terms of machine learning, there are a few options. I personally loved Andrew Ng’s Machine Learning Specialization for machine learning and François Chollet’s Deep Learning for an introduction to deep learning. I’ve also had friends who really liked both the classic Introduction to Statistical Learning and Google’s Machine Learning Crash Course.
Shout out to Humble Data!
And as a final plug – if you’re looking for a way to get started but want some more support, you can also keep your eye out for the next Humble Data workshop! This free workshop is aimed at getting you up and running with basic Python data science, going from the basics of Python programming to working with pandas and data visualization.
Stack Abuse: The Python 3 Equivalent of SimpleHTTPServer
In this article, we'll explore Python's built-in HTTP servers. We will discuss the SimpleHTTPServer module, its Python 3 equivalent, and how to run these servers via the command line. This knowledge is crucial for developers who need to quickly set up a server for testing or sharing files.
What is the SimpleHTTPServer?
The SimpleHTTPServer module is a Python 2.x built-in module that allows you to create a simple HTTP server. This server can serve files from the directory it's run in, making it a nice tool for testing web pages or even sharing files.
    # Python 2.x
    $ python -m SimpleHTTPServer
    Serving HTTP on 0.0.0.0 port 8000 ...

The http.server in Python 3
With the advent of Python 3, the SimpleHTTPServer was replaced by the http.server module. The http.server module provides similar functionality to the SimpleHTTPServer but is updated to work with Python 3.
The http.server module also includes a more robust HTTP request handler than SimpleHTTPServer, offering more control over HTTP responses.
    # Python 3.x
    $ python3 -m http.server
    Serving HTTP on 0.0.0.0 port 8000 ...

Running the Server via Command Line
Running the server via the command line is straightforward. In Python 2.x, you would use the SimpleHTTPServer module like so:

    $ python -m SimpleHTTPServer

In Python 3.x, you would use the http.server module instead:

    $ python3 -m http.server

Both commands will start a server on port 8000, serving files from the current directory. If you want to specify a different port, you can do so by adding the port number at the end of the command:

    $ python3 -m http.server 8080

Running a Basic HTTP Server in Python 3
With Python 3, running a basic HTTP Server is as simple as using the http.server module. This module is a straightforward and efficient way of serving up files and directories on your machine. Here's how you can do it:

    $ python3 -m http.server

Once you run this command, you should see something like this:

    $ python3 -m http.server
    Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...

This means your HTTP server is up and running. You can then access it by going to http://localhost:8000 in your web browser. By default, the server is set to port 8000, but you can specify a different port by appending it to the command:

    $ python3 -m http.server 8080

This command will start the server on port 8080.
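The same server can also be started from Python code instead of the command line. The following is a minimal sketch, assuming you only want to serve a directory to your own machine; the function name `start_static_server` and its default arguments are illustrative and not part of the standard library.

```python
import functools
import http.server
import threading


def start_static_server(directory=".", host="127.0.0.1", port=8000):
    """Serve `directory` over HTTP in a background thread and return the server."""
    # SimpleHTTPRequestHandler accepts a `directory` argument since Python 3.7.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # ThreadingHTTPServer (Python 3.7+) handles each request in its own thread.
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `start_static_server(port=8080)` returns immediately while the server keeps running in the background; call `server.shutdown()` when you are done. Binding to 127.0.0.1 rather than 0.0.0.0 is a deliberate choice here so that the server is only reachable from the local machine.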
CGI Handling in Python 3
In Python 2, the CGIHTTPServer module was commonly used to handle CGI (Common Gateway Interface) scripts. Starting a CGI server was as simple as running the command:

    $ python -m CGIHTTPServer

However, Python 3 has no separate CGIHTTPServer module; its functionality was incorporated into the http.server module. This was done in order to simplify the HTTP server capabilities in the standard library.
The modern equivalent for starting a CGI server in Python 3.3 and later versions is:

    $ python3 -m http.server --cgi

By using the --cgi option with the http.server module, you can enable the same CGI handling functionality that was available with CGIHTTPServer. This should make migrating over to Python 3 much easier.
Differences Between SimpleHTTPServer and http.server
While SimpleHTTPServer and http.server essentially perform the same function, there are a few key differences between them. The most significant difference is that SimpleHTTPServer is only available in Python 2, while http.server is available in Python 3.
Another point worth clearing up is security. Neither module executes files when it serves a directory as static content; only the CGI handlers (CGIHTTPServer in Python 2, the --cgi option in Python 3) run scripts. Both servers implement only basic security checks and are meant for local development and testing, not production. Since Python 2 is no longer maintained, I'd highly recommend you use http.server when possible.
Be careful! Always be cautious when serving files and directories from your personal or development computer, especially when doing so over a network. Never serve sensitive information or allow server access to untrusted individuals.
Conclusion
Python 3's http.server module is a simple and effective tool for running an HTTP server. It's the updated equivalent of Python 2's SimpleHTTPServer, and it's just as easy to use. Whether you're testing a website, sharing files over a network, or just playing around, http.server is a great tool to have at your disposal.
Real Python: The Real Python Podcast – Episode #169: Improving Classification Models With XGBoost
How can you improve a classification model while avoiding overfitting? Once you have a model, what tools can you use to explain it to others? This week on the show, we talk with author and Python trainer Matt Harrison about his new book Effective XGBoost: Tuning, Understanding, and Deploying Classification Models.
TestDriven.io: Customizing the Django Admin
Python Insider: Python 3.11.5, 3.10.13, 3.9.18, and 3.8.18 is now available
There’s security content in the releases, let’s dive right in.
- gh-108310: Fixed an issue where instances of ssl.SSLSocket were vulnerable to a bypass of the TLS handshake and included protections (like certificate verification) and treating sent unencrypted data as if it were post-handshake TLS encrypted data. Security issue reported as CVE-2023-40217 by Aapo Oksman. Patch by Gregory P. Smith.
Upgrading is highly recommended to all users of affected versions.
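To see whether a given interpreter already includes this fix, you can compare its version against the releases named in this announcement. The snippet below is only a sketch: the `PATCHED_RELEASES` table and the `has_security_fix` helper are illustrative names, and the table covers just the four release series listed here.

```python
import sys

# First patched micro release for each series, taken from this announcement.
PATCHED_RELEASES = {
    (3, 11): 5,
    (3, 10): 13,
    (3, 9): 18,
    (3, 8): 18,
}


def has_security_fix(version_info=sys.version_info):
    """Return True/False for the listed series, or None for any other series."""
    series = (version_info[0], version_info[1])
    if series not in PATCHED_RELEASES:
        return None  # This announcement says nothing about the series.
    return version_info[2] >= PATCHED_RELEASES[series]


print(has_security_fix())
```

A result of `None` means the running interpreter belongs to a series this announcement does not cover, so check the release notes for that series instead.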
Python 3.11.5
Get it here: https://www.python.org/downloads/release/python-3115/
This release was held up somewhat by the resolution of this CVE, which is why it includes a whopping 328 new commits since 3.11.4 (compared to 238 commits between 3.10.4 and 3.10.5). Among those, there is a fix for CVE-2023-41105 which affected Python 3.11.0 - 3.11.4. See gh-106242 for details.
There are also some fixes for crashes, check out the change log to see all information.
Most importantly, the release notes on the downloads page include a description of the Larmor precession. I understood some of the words there!
Python 3.10.13
Get it here: https://www.python.org/downloads/release/python-31013/
16 commits.
Python 3.9.18
Get it here: https://www.python.org/downloads/release/python-3918/
11 commits.
Python 3.8.18
Get it here: https://www.python.org/downloads/release/python-3818/
9 commits.
Stay safe and upgrade!
Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organization contributions to the Python Software Foundation.
–
Łukasz Langa @ambv
on behalf of your friendly release team,
Ned Deily @nad
Steve Dower @steve.dower
Pablo Galindo Salgado @pablogsal
Łukasz Langa @ambv
Thomas Wouters @thomas
PyCharm: PyCharm 2023.2.1 Is Out!
PyCharm 2023.2.1, the first bug-fix update for PyCharm 2023.2, is now available!
You can update to v2023.2.1 by using the Toolbox App, installing it right from the IDE, or downloading it from our website.
Here are the most notable fixes available in this version:
Updates to profiler support and code coverage
Profiler and code coverage functionality is now available for projects using remote interpreters, like those on SSH, WSL, Docker, and Docker Compose. You can now use cProfile, yappi, and vmprof. Additionally, you can now use profilers for projects that use Python 3.12.
Django
Inherited HTTP methods in the Endpoints tool window
In PyCharm 2023.2, we added support for Django in the Endpoints tool window to help you work with the Django Rest Framework more easily. Starting from this update, you will also be able to work with inherited HTTP methods for your Django views in the Endpoints tool window. [PY-61405]
We also fixed the bug that prevented generating HTTP requests in the Endpoints tool window for lowercase method names. [PY-62033]
PyCharm will now show the correct results for the Go To Declaration action for routes in the HTTP Client when working with the Django Rest Framework. This also works for FastAPI and Flask.
Run manage.py Task from the main menu with remote interpreters
When working with a project with a remote interpreter on Docker, Docker Compose, SSH, or WSL, you can now run the manage.py task from the main menu (Tools | Run manage.py Task). [PY-52610]
Python Run/Debug Configuration
Updates to the Parameters field
In the updated Python Run / Debug configuration, the Parameters field is available by default. We increased the minimum width for the field and restored the ability to add macros to the Parameters field. [PY-61917], [PY-59738]
We also fixed a bug that made it impossible to delete an option to add content roots or source roots to the PYTHONPATH. [PY-61902]
Black formatter: an option to suppress warnings about non-formatted files
In PyCharm 2023.2, we added built-in support for the Black formatter. If you have Black configured in PyCharm, the IDE will check whether each file you are working with is formatted properly. When your code is not formatted with Black, PyCharm will notify you. If you don’t want to use the Black formatter for a particular file or the whole project, you can now suppress warnings about non-formatted files.
You can set this up in Settings | Appearance & Behavior | Notifications.
Updates to frontend development support
- We’ve added support for CSS system colors. [WEB-59994]
- We’ve added support for CSS trigonometric and exponential functions. [WEB-61934]
- We’ve added support for .mjs and .cjs config files in Prettier. [WEB-61966]
- You can again run multiprocessing scripts in the Python console. [PY-50116]
- Changing themes on Linux now works as expected. [IDEA-283945]
- The IDE no longer enters full screen mode unexpectedly on a secondary monitor when the Linux native header is switched off. [IDEA-326021]
- Updating bundled plugins no longer removes plugin files from the IDE’s installation folder. [IDEA-326800]
- We fixed the behavior of the Go To Implementation and Go To Declaration actions when Python stubs are involved. PyCharm now shows the implementation instead of .pyi stubs. [PY-54905], [PY-54620], [PY-61740]
For the full list of issues addressed in PyCharm 2023.2.1, please see the release notes. Please, feel free to share your feedback with us or report any bugs you encounter using our issue tracker.
EuroPython Society: EPS 2023 General Assembly - Call for Board Candidates
It feels like yesterday that many of us were together in Prague or online for EuroPython 2023. Each year, the current board of the EuroPython Society (EPS) holds a General Assembly (GA). It is a precious opportunity for all our members to get together annually, and reflect on the learnings of the past and direction for the future the Society holds.
This year’s GA will be held online again to allow as many members as possible to engage with us. We have tentatively reserved the date 1 October for the GA. But official confirmation will be sent out as soon as we receive the go-ahead from our auditor on the finance side.
As an EPS member, you are welcome and encouraged to join us to discuss Society matters and vote at the meeting, including the next Society Board. A Zoom meeting link will be sent out to you with the formal General Assembly Invitation.
Calling for Board Candidates
Every year at the GA, we call for and vote in a new EPS Board of Directors. This is also our main theme of this post: we are calling for the next Board candidates.
This year, we have at least 4 members from the current board standing down, including myself who will be stepping down as chair and from the board. While transition always poses challenges, it is a chance to take in new experience, fresh perspectives and more diversity. With most, if not all, female board members from the current board stepping down, we are especially worried about the diversity of our next board and welcome all suggestions and nominations from our members to help make our next board diverse.
If you are interested in stepping up, or if you know someone who might be, please get in touch with us! You can reach the current board at board@europython.eu. We also have set up a private discord thread for you to get to know all interested candidates and ask any questions you might have. Get in touch with us if you would like an invite!
What does the EPS Board do?
As per our bylaws, the EPS board is made up of up to 9 directors (including 1 chair and 1 vice chair). The duties and responsibilities of the board are substantial: the board collectively takes up the fiscal and legal responsibility of the Society. At the moment, running the annual EuroPython conference is a major task for the EPS. As such, the board members are expected to invest significant time and effort towards overseeing the smooth execution of the conference, ranging from venue selection, contract negotiations, and budgeting, to volunteer management. Every board member has the duty to support one or more EuroPython teams to facilitate decision-making and knowledge transfer. In addition, the Society prioritises building a close relationship with local Python communities in Europe. Board members should be passionate about the Python community, and ideally also have a high-level vision and plan for how the EPS could best serve the community.
Time commitment for the board: as the Society currently comprises entirely volunteers, serving on the board does come with a significant time commitment. This is particularly important to keep in mind, due to the changes EPS will undergo this year. However, everyone has been very understanding of differing schedules. Other than the 1.5 hour board call we expect all board members to attend every two weeks, we have managed to primarily work async.
The Nomination Process
All EPS members are eligible to stand for election to the board of directors. Everyone who wishes to stand, or to nominate someone else, needs to send in a nomination notice along with a biography of the candidate.
Though the formal deadline for sending in your nomination is at the time of the GA, we would appreciate it if you could return it to us by emailing board@europython.eu by Friday 15 September 2023. We will publish all the candidates and their nomination statements on a separate blog post for our members to read in advance.
Then at the General Assembly, each candidate will usually be given a minute to introduce themselves before the members cast their anonymous votes. You can refer to our previous GAs if you want to find out more details: https://www.europython-society.org/records/
If you have any questions or concerns, you are also very welcome to reach out to me directly at raquel@europython.eu.
Raquel Dou
Stack Abuse: Creating a Directory and its Parent Directories in Python
In Python, we often need to interact with the file system, whether it's reading files, writing to them, or creating directories. This Byte will focus on how to create directories in Python, and more specifically, how to create a directory and any missing parent directories. We'll be exploring the os.mkdir and os.makedirs functions for this purpose.
Why do we need to create the parent directories?
When working with file systems, which is common for system utilities or tools, you'll likely need to create a directory at a certain path. If the parent directories of that path don't exist, you'll encounter an error.
To avoid this, you'll need to create all necessary parent directories since the OS/filesystem doesn't handle this for you. By ensuring all the necessary parent directories exist before creating the target directory, you can avoid these errors and have a more reliable codebase.
Creating a Directory Using os.mkdir
The os.mkdir function in Python is used to create a directory. This function takes the path of the new directory as an argument. Here's a simple example:

    import os

    os.mkdir('my_dir')

This will create a new directory named my_dir in the current working directory. However, os.mkdir has a limitation - it can only create the final directory in the specified path, and assumes that the parent directories already exist. If they don't, you'll get a FileNotFoundError.

    import os

    os.mkdir('parent_dir/my_dir')

If parent_dir doesn't exist, this code will raise a FileNotFoundError.
Note: The os.mkdir function will also raise a FileExistsError if the directory you're trying to create already exists. You can check whether the directory exists before trying to create it, for example with os.path.exists(path). Alternatively, os.makedirs accepts an exist_ok argument that os.mkdir lacks: os.makedirs(path, exist_ok=True) does nothing if the directory already exists.
One way to work around the limitation of os.mkdir is to manually check and create each parent directory leading up to the target directory. The easiest way to approach this problem is to split our path by the slashes and check each one. Here's an example of how you can do that:
    import os

    path = 'parent_dir/sub_dir/my_dir'

    # Split the path into parts
    parts = path.split('/')

    # Start with an empty directory path
    dir_path = ''

    # Iterate through the parts, creating each directory if it doesn't exist
    for part in parts:
        dir_path = os.path.join(dir_path, part)
        if not os.path.exists(dir_path):
            os.mkdir(dir_path)

This code will create parent_dir, sub_dir, and my_dir if they don't already exist, ensuring that the parent directories are created before the target directory.
However, there's a more concise way to achieve the same goal by using the os.makedirs function, which we'll see in the next section.
Creating Parent Directories Using os.makedirs
To overcome the limitation of os.mkdir, Python provides another function - os.makedirs. This function creates all the intermediate level directories needed to create the final directory. Here's how you can use it:

    import os

    os.makedirs('parent_dir/my_dir')

In this case, even if parent_dir doesn't exist, os.makedirs will create it along with my_dir. If parent_dir already exists, os.makedirs will simply create my_dir within it.
Note: Like os.mkdir, os.makedirs will also raise a FileExistsError if the final directory you're trying to create already exists. However, it won't raise an error if the intermediate directories already exist.
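If you would rather not get a FileExistsError at all, os.makedirs takes an exist_ok argument. The sketch below shows the difference; it works inside a temporary directory purely so the example cleans up after itself, and the variable names are illustrative.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "parent_dir", "my_dir")

    os.makedirs(target)                 # Creates parent_dir and my_dir.
    os.makedirs(target, exist_ok=True)  # Second call is a no-op.

    try:
        os.makedirs(target)             # Without exist_ok this raises.
    except FileExistsError:
        already_existed = True
    else:
        already_existed = False

    created = os.path.isdir(target)

print(created, already_existed)
```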
Using pathlib to Create Directories
The pathlib module in Python 3.4 and above provides an object-oriented approach to handle filesystem paths. It's more intuitive and easier to read than using os.mkdir or os.makedirs. To create a new directory with pathlib, you can use the Path.mkdir() method.
Here is an example:
    from pathlib import Path

    # Define the path
    path = Path('/path/to/directory')

    # Create the directory
    path.mkdir(parents=True, exist_ok=True)

In this code, the parents=True argument tells Python to create any necessary parent directories, and exist_ok=True allows the operation to proceed without raising an exception if the directory already exists.

Handling Exceptions when Creating Directories
When working with filesystems, it's always a good idea to handle exceptions. This could be due to permissions, the directory already existing, or a number of other unforeseen issues. Here's one way to handle exceptions when creating your directories:

    from pathlib import Path

    # Define the path
    path = Path('/path/to/directory')

    try:
        # Create the directory
        path.mkdir(parents=True, exist_ok=False)
    except FileExistsError:
        print("Directory already exists.")
    except PermissionError:
        print("You don't have permissions to create this directory.")
    except Exception as e:
        print(f"An error occurred: {e}")

In this code, we've set exist_ok=False to raise a FileExistsError if the directory already exists. We then catch this exception, along with PermissionError and any other exceptions, and print a relevant message. This gives us more fine-grained control over what we do when certain situations arise, although it's less concise and hurts readability.

When to use os or pathlib for Creating Directories
Choosing between os and pathlib for creating directories largely depends on your specific use case and personal preference.
The os module has been around for a while and is widely used for interacting with the operating system. It's a good choice if you're working with older versions of Python or if you need to use other os functions in your code.
On the other hand, pathlib is a newer module that provides a more intuitive, object-oriented approach to handling filesystem paths. It's a good choice if you're using Python 3.4 or above and prefer a more modern, readable syntax.
Luckily, both os and pathlib are part of the standard Python library, so you won't need to install any additional packages to use them.
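To make the comparison concrete, here is a small side-by-side sketch of the two styles. It uses a temporary directory as the base so nothing is left on disk, and the directory names are made up for illustration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    # os style: build the path as a string, then create it.
    os_target = os.path.join(base, "os_parent", "child")
    os.makedirs(os_target, exist_ok=True)

    # pathlib style: build the path with the / operator, then create it.
    pathlib_target = Path(base) / "pathlib_parent" / "child"
    pathlib_target.mkdir(parents=True, exist_ok=True)

    both_created = os.path.isdir(os_target) and pathlib_target.is_dir()

print(both_created)
```

Both calls create the missing parent directory and tolerate an existing target, so the choice comes down to whether you prefer working with path strings or Path objects.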
Conclusion
In this Byte, we've explored how to create directories and handle exceptions using the os and pathlib modules in Python. Remember that choosing between these two options depends on your specific needs and personal preferences. Always be sure to handle exceptions when working with filesystems to make your code more robust and reliable. This is important as it's easy to make mistakes when working with filesystems and end up with an error.
Stack Abuse: Get All Object Attributes in Python
In Python, everything is an object - from integers and strings to classes and functions. This may seem odd, especially for primitive types like numbers, but even those have attributes, like real and imag. Each object has its own attributes, which are basically just properties or characteristics that help define the object.
In this Byte, we will explore different ways to get all attributes of an object in Python, and how to display and manipulate them effectively.
Viewing Object Attributes
To start with, let's look at how we can view the attributes of an object in Python. Python provides a built-in function, dir(), which returns a list of the names of an object's attributes and methods, including those inherited from its class or parent classes.
Consider a simple class, Company, with a few attributes:
    class Company:
        def __init__(self, name, industry, num_employees):
            self.name = name
            self.industry = industry
            self.num_employees = num_employees

Now, let's create an instance of Company and use dir() to get its attributes:

    c = Company('Dunder Mifflin', 'paper', 15)
    print(dir(c))

This will output something like the following (dir() returns the names in alphabetical order, and the exact set of dunder names varies slightly between Python versions):

    ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'industry', 'name', 'num_employees']

As you can see, dir() returns not only the attributes we defined (i.e. name, industry, num_employees), but also a list of special methods (also known as dunder methods) inherent to all Python objects.
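If you only care about the attributes you defined yourself, you can filter the dunder names out of the dir() result. This is a small sketch of one way to do it; the helper name `public_attributes` is made up for this example.

```python
class Company:
    def __init__(self, name, industry, num_employees):
        self.name = name
        self.industry = industry
        self.num_employees = num_employees


def public_attributes(obj):
    """Return the names from dir(obj) that are not dunder names."""
    return [
        name for name in dir(obj)
        if not (name.startswith("__") and name.endswith("__"))
    ]


c = Company('Dunder Mifflin', 'paper', 15)
print(public_attributes(c))
```

Note that this filter keeps any regular methods the class defines as well, since dir() lists methods and data attributes together.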
Getting their Values
Now that we know how to get the attributes of an object, let's see how to also extract their values. Python provides a built-in function, getattr(), which allows us to get the value of a specific attribute.
Here's how you can use getattr():

    name = getattr(c, 'name')
    print(name)

This will output:

    Dunder Mifflin

In this example, getattr() returns the value of the name attribute of the Company instance c. If the attribute does not exist, getattr() will raise an AttributeError. However, you can provide a third argument to getattr(), which will be returned if the attribute is not found, thus avoiding the error:

    location = getattr(c, 'location', 'Not available')
    print(location)

This will output:

    Not available

In this case, since location is not an attribute of c, getattr() returns the provided default value, 'Not available'.
Using __dict__ to get Properties and Values
In Python, most objects are equipped with a __dict__ attribute (instances of classes that define __slots__, and of many built-in types, are exceptions). This built-in attribute is a dictionary that maps the object's attributes to their respective values. This can be very handy when we want to extract all properties and values of an object. Let's see how it works.

    class TestClass:
        def __init__(self):
            self.attr1 = 'Hello'
            self.attr2 = 'World'

    instance = TestClass()
    print(instance.__dict__)

When you run the above code, it will output:

    {'attr1': 'Hello', 'attr2': 'World'}

Note: __dict__ does not return methods of an object, only the properties and their values.

Formatting Object Attributes into Strings
Sometimes you may want to format the attributes of an object into a readable string for display or logging purposes. You can define the special __str__ method in your class, which the built-in str() function and print() call, to achieve this. Here's how you can do it:

    class TestClass:
        def __init__(self):
            self.attr1 = 'Hello'
            self.attr2 = 'World'

        def __str__(self):
            return str(self.__dict__)

    instance = TestClass()
    print(str(instance))

When you run the above code, it will output:

    {'attr1': 'Hello', 'attr2': 'World'}

Employing vars() for Attribute Extraction
Another way to extract attributes from an object in Python is by using the built-in vars() function. This function behaves very similarly to the __dict__ attribute: it returns the __dict__ attribute of an object. Here's an example:

    class TestClass:
        def __init__(self):
            self.attr1 = 'Hello'
            self.attr2 = 'World'

    instance = TestClass()
    print(vars(instance))

When you run the above code, it will output:

    {'attr1': 'Hello', 'attr2': 'World'}

Note: Like __dict__, vars() also does not return methods of an object, only the properties and their values.
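One caveat is that vars() raises a TypeError for objects that have no __dict__, such as instances of a class that defines __slots__. The sketch below shows one possible fallback; the `attributes_of` helper and both classes are illustrative, and for simplicity it only looks at the __slots__ declared on the object's own class, not on parent classes.

```python
class WithDict:
    def __init__(self):
        self.attr1 = 'Hello'


class WithSlots:
    # __slots__ replaces the per-instance __dict__ with fixed storage.
    __slots__ = ('attr1',)

    def __init__(self):
        self.attr1 = 'Hello'


def attributes_of(obj):
    """Return a dict of instance attributes, with or without __dict__."""
    try:
        return dict(vars(obj))
    except TypeError:
        # vars() raises TypeError when the object has no __dict__.
        names = getattr(type(obj), '__slots__', ())
        return {name: getattr(obj, name) for name in names if hasattr(obj, name)}


print(attributes_of(WithDict()))
print(attributes_of(WithSlots()))
```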
Conclusion
Getting all of the attributes of an object in Python can be achieved in several ways. Whether you're using dir(), the __dict__ attribute, defining a __str__ method, or using the vars() function, Python provides a variety of tools to extract and manipulate object attributes.
Stack Abuse: Incompatible Type Comparisons in Python
In Python, we often encounter a variety of errors and exceptions while writing or executing a script. A very common error, especially for beginners, is TypeError: '<' not supported between instances of 'str' and 'int', or some variant. This error occurs when we try to compare two values of incompatible types.
In this article, we'll delve into what this error means, why it happens, and how to resolve it.
Note: There are quite a few different permutations of this error as it can occur between many different data types.
Incompatible Type Comparisons
Python is a dynamically typed language, which means the interpreter determines the type of an object at runtime. This flexibility allows us to write code quickly, but it can also lead to certain types of errors if we're not careful.
One of those errors is TypeError: '<' not supported between instances of 'str' and 'int'. This happens when we try to compare a string and an integer using the less than (<) operator. Python doesn't know how to compare these two different types of objects, so it raises a TypeError.
Note: The error could involve any comparison operator, not just the less than (<) operator. For example, you might see a similar error with the > (greater than) operator.
If you're coming to Python from a language like JavaScript, this may take some getting used to. JS will do the conversion for you, without the need for explicit type casting (e.g. it converts "2" to the number 2). It'll even happily compare values of different types where the comparison doesn't make sense (e.g. "StackAbuse" > 42). In Python, you'll need to remember to convert your data types yourself.
Comparing a String and an Integer
To illustrate this error, let's try to compare a string and an integer:
    print("3" < 2)
When you run this code, Python will throw an error:
    TypeError: '<' not supported between instances of 'str' and 'int'
This error states that Python doesn't know how to compare a string ("3") and an integer (2). These are fundamentally different types of objects, and Python doesn't have a built-in way to determine which one is "less than" the other.
Fixing the TypeError with String to Integer Conversion
One way to resolve this error is by ensuring that both objects being compared are of the same type. If we're comparing a string and an integer, we can convert the string to an integer using the int() function:
    print(int("3") < 2)
Now, when you run this code, Python will output:
    False
By converting the string "3" to an integer, we've made it possible for Python to compare the two objects. Since 3 is not less than 2, Python correctly outputs False.
The Input Function and its String Return Type
In Python, the input() function is used to capture user input. The data entered by the user is always returned as a string, even if the user enters a number. Let's see an example:
    user_input = input("Enter a number: ")
    print(type(user_input))
If you run this code and enter 123, the output will be:
    <class 'str'>
This shows that the input() function returns the user input as a string, not an integer. This can lead to a TypeError if you try to use the input in a comparison with an integer.
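Because input() always returns a string, one common pattern is to convert it before comparing and to handle text that isn't a number. The sketch below uses a hypothetical helper named read_int (not part of the original article) and takes the text as an argument so it can run without a prompt.

```python
def read_int(text, default=None):
    """Convert text (for example, the return value of input()) to an int.

    Returns `default` when the text does not represent a whole number.
    """
    try:
        return int(text)
    except ValueError:
        return default


# In a real script you would call: read_int(input("Enter a number: "))
number = read_int("123")
print(number < 200)      # True: both sides are now integers

print(read_int("abc"))   # None: not a number, so the default is returned
print(read_int("12.5"))  # None: int() does not accept decimal strings
```

Returning a default instead of raising keeps the calling code simple. If you would rather fail loudly, let the ValueError propagate.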
Comparing Integers and Strings with min() and max() Functions
The min() and max() functions in Python are used to find the smallest and largest elements in a collection, respectively. If you try to use these functions on a collection that contains both strings and integers, you'll encounter a TypeError. This is because both functions compare the elements with each other, and Python cannot order these two different types of data.
Here's an example:
    values = [10, '20', 30]
    print(min(values))
Again, this will raise TypeError: '<' not supported between instances of 'str' and 'int', because Python doesn't know how to compare a string to an integer.
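Two possible ways around this, assuming every string in the list really does represent a whole number: convert everything to int up front, or pass key=int so that min() and max() compare by integer value while returning the original elements.

```python
values = [10, '20', 30]

# Option 1: normalise every element to int before comparing
as_ints = [int(v) for v in values]
print(min(as_ints), max(as_ints))  # 10 30

# Option 2: keep the original elements, but compare them by their int value
print(min(values, key=int))  # 10
print(max(values, key=int))  # 30
```

With key=int, the value returned keeps its original type. For example, min([5, '3'], key=int) returns the string '3'.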
Identifying Stored Variable Types
To avoid TypeError issues, it's crucial to understand the type of data stored in your variables. You can use the type() function to identify the data type of a variable. Here's an example:
    value = '10'
    print(type(value))
Running this code will output:
    <class 'str'>
This shows that the variable value contains a string. Knowing the data type of your variables can help you avoid TypeError issues when performing operations that require specific data types.
Comparing a List and an Integer
If you try to compare a list and an integer directly, Python will raise a TypeError. Python cannot compare these two different types of data. For example, the following code will raise an error:
    numbers = [1, 2, 3]
    if numbers > 2:
        print("The list is greater than 2.")
When you run this code, you'll get TypeError: '>' not supported between instances of 'list' and 'int'. To compare an integer with the elements in a list, you need to iterate over the list and compare each element individually.
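Iterating over the list can be sketched in a few ways, depending on what you want to know. The variable name threshold is chosen here purely for illustration.

```python
numbers = [1, 2, 3]
threshold = 2

# Which elements are greater than the threshold?
greater = [n for n in numbers if n > threshold]
print(greater)  # [3]

# Is at least one element greater than the threshold?
print(any(n > threshold for n in numbers))  # True

# Are all elements greater than the threshold?
print(all(n > threshold for n in numbers))  # False
```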
Accessing List Values for Comparison
In Python, we often need to access individual elements in a list for comparison. This is done by using indices. The index of a list starts from 0 for the first element and increases by one for each subsequent element. Here's an example:
    my_list = ['apple', 2, 'orange', 4, 'grape', 6]
    print(my_list[1])  # Outputs: 2
In this case, we are accessing the second element in the list, which is an integer. If we were to compare this with another integer, we would not encounter an error.
Ensuring Value Compatibility for Comparison
It's essential to ensure that the values you're comparing are compatible. In Python, you cannot directly compare a string with an integer. Doing so will raise a TypeError. If you're unsure of the types of values you're dealing with, it's a good practice to convert them to a common type before comparison. For instance, to compare a string and an integer, you could convert the integer to a string:
    str_num = '5'
    int_num = 10
    comparison = str_num < str(int_num)  # Converts int_num to a string for comparison
    print(comparison)  # Outputs: False
Note: Be cautious when converting types for comparison. Converting an integer to a string could lead to unexpected results, as the example above shows: '5' < '10' is False even though 5 is less than 10. Strings are compared character by character using their Unicode code points, not by numerical value, so '10' is considered less than '2' as well. When you care about numeric order, convert the string to a number instead.
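The difference between string order and numeric order is easy to see side by side. This is a small illustrative sketch, and the versions list is made-up sample data.

```python
# String comparison goes character by character: '1' sorts before '2'
print('10' < '2')            # True

# Numeric comparison after converting both sides
print(int('10') < int('2'))  # False

# The same effect shows up when sorting numeric strings
versions = ['10', '9', '2']
print(sorted(versions))           # ['10', '2', '9']
print(sorted(versions, key=int))  # ['2', '9', '10']
```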
Filtering Integers in a List for Comparison
In a list with mixed types, you might want to filter out the integers for comparison. You can do this using a list comprehension and the isinstance() function, which checks if a value is an instance of a particular type:
    my_list = ['apple', 2, 'orange', 4, 'grape', 6]
    integers = [i for i in my_list if isinstance(i, int)]
    print(integers)  # Outputs: [2, 4, 6]
Now, you can safely compare the integers in the list without worrying about getting an error!
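One caveat worth knowing with this approach: bool is a subclass of int in Python, so isinstance(True, int) is True. If your data might contain booleans, a slightly stricter filter may be what you want. The mixed list below is made-up sample data.

```python
mixed = ['apple', 2, True, 4.5, 'grape', 6]

# bool is a subclass of int, so True passes a plain isinstance(x, int) check
with_bools = [x for x in mixed if isinstance(x, int)]
print(with_bools)  # [2, True, 6]

# Exclude booleans explicitly to keep only "real" integers
ints_only = [x for x in mixed
             if isinstance(x, int) and not isinstance(x, bool)]
print(ints_only)   # [2, 6]
```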
Comparing List Length with an Integer
Another common operation in Python is comparing the length of a list with an integer. This is done using the len() function, which returns the number of items in a list. Here's an example:
    my_list = ['apple', 2, 'orange', 4, 'grape', 6]
    list_length = len(my_list)
    print(list_length > 5)  # Outputs: True
In this case, we're comparing the length of the list (6) with the integer 5. Since 6 is greater than 5, the output is True. No TypeError is raised here because we're comparing two integers.
Comparing List Item Sum with an Integer
In Python, you can sum the items in a list using the built-in sum() function. This function returns the sum of all items if they are integers or floats. If you then want to compare this sum with an integer, you can do so without any issues. Here's an example:
    list_numbers = [1, 2, 3, 4, 5]
    sum_of_list = sum(list_numbers)
    print(sum_of_list > 10)  # Output: True
In this example, sum_of_list is the sum of all items in list_numbers (15). We then compare this sum with the integer 10.
Comparing a Float and a String
When you try to compare a float and a string in Python, you'll encounter the same kind of TypeError. This is because Python doesn't know how to compare these two different types. Here's an example:
    print(3.14 < "5")
    # Raises TypeError: '<' not supported between instances of 'float' and 'str'
In this example, Python throws a TypeError because it doesn't know how to compare a float (3.14) with a string ("5").
Resolving TypeError with String to Float Conversion
To resolve this issue, you can convert the string to a float using the float() function. This function takes a string or a number and returns a floating point number. Here's how you can use it:
    print(3.14 < float("5"))  # Output: True
In this example, we convert the string "5" to a float using the float() function. We then compare this float with the float 3.14. Since Python now knows how to compare these two floats, it doesn't throw a TypeError.
Note: The float() function can only convert strings that represent a number. If you try to convert a string that doesn't represent a number (like "hello"), Python will throw a ValueError.
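To get a feel for which strings float() accepts, the sketch below tries a handful of made-up sample inputs and records None for the ones that raise a ValueError.

```python
candidates = ["5", " 3.14 ", "1e3", "1,000", "hello", ""]

results = {}
for text in candidates:
    try:
        results[text] = float(text)
    except ValueError:
        # Not a valid number as far as float() is concerned
        results[text] = None

for text, value in results.items():
    print(repr(text), "->", value)
```

Surrounding whitespace and scientific notation are accepted. Thousands separators, arbitrary words, and the empty string are not.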
Handling TypeError in Pandas
Pandas is a powerful data analysis library in Python that provides flexible data structures. However, you might encounter a TypeError when you try to compare different types in a Pandas DataFrame.
To handle this error, you can use the apply() function to apply a function to each element in a DataFrame column. This function can be used to convert the elements to the correct type. Here's an example:
    import pandas as pd

    df = pd.DataFrame({
        'A': [1, 2, 3],
        'B': ['4', '5', '6']
    })
    df['B'] = df['B'].apply(float)
    print(df['A'] < df['B'])
    # Output:
    # 0    True
    # 1    True
    # 2    True
    # dtype: bool
In this example, we use the apply() function to convert the elements in column 'B' to floats. We then compare the elements in column 'A' with the elements in column 'B'. Since both columns now hold numeric values, Python doesn't throw a TypeError.
Comparing Floats and Strings with min() and max() Functions
In Python, the min() and max() functions are used to find the smallest and largest elements in an iterable, respectively. However, these functions can throw a TypeError if you try to compare a float and a string.
Here's an example:
    print(min(3.14, 'pi'))
This code will cause a TypeError, because Python cannot compare a float and a string. The error message will be: TypeError: '<' not supported between instances of 'str' and 'float'.
To resolve this, you can convert the float to a string before comparing:
    print(min(str(3.14), 'pi'))
This will output 3.14, because the string '3.14' sorts before 'pi' in a character-by-character comparison. Keep in mind that this is a string comparison, not a numeric one.
Comparing a Tuple and an Integer
A tuple is an immutable sequence of Python objects. If you try to compare a tuple with an integer, Python will throw a TypeError. Here's an example:
    print((1, 2, 3) < 4)
This code will cause a TypeError with the message: TypeError: '<' not supported between instances of 'tuple' and 'int', since Python doesn't know how to compare one number to a collection of numbers.
Accessing Tuple Values for Comparison
To compare an integer with a value inside a tuple, you need to access the tuple value first. You can do this by indexing the tuple. Here's how:
    my_tuple = (1, 2, 3)
    print(my_tuple[0] < 4)
This will output True, as the first element of the tuple (1) is less than 4.
Filtering a Tuple for Comparison
If you want to compare all values in a tuple with an integer, you can loop through the tuple and compare each value individually. Here's an example:
    my_tuple = (1, 2, 3)
    for i in my_tuple:
        print(i < 4)
This will output True three times, as all elements in the tuple are less than 4.
Note: Python's filter() function can also be used to filter a tuple based on a comparison with an integer. This function constructs an iterator from elements of the tuple for which the function returns true.
Here's an example of how to use the filter() function to filter a tuple:
    my_tuple = (1, 2, 3)
    filtered_tuple = filter(lambda i: i < 4, my_tuple)
    print(tuple(filtered_tuple))
This will output (1, 2, 3), as all elements in the tuple are less than 4.
Comparing Tuple Length with an Integer
In Python, we often need to compare the length of a tuple with an integer. This is straightforward and can be done using the len() function, which returns the number of items in an object. Here's how you can do it:
    my_tuple = ('apple', 'banana', 'cherry', 'dates')
    length = len(my_tuple)
    if length < 5:
        print('The tuple has less than 5 items.')
    else:
        print('The tuple has 5 or more items.')
In the above example, the length of my_tuple is 4, so the output will be 'The tuple has less than 5 items.'
Understanding Tuple Construction in Python
Tuples are one of Python's built-in data types. They are used to store multiple items in a single variable. Tuples are similar to lists, but unlike lists, tuples are immutable. This means that once a tuple is created, you cannot change its items.
You can create a tuple by placing a comma-separated sequence of items inside parentheses (). Here's an example:
    my_tuple = ('apple', 'banana', 'cherry', 'dates')
    print(my_tuple)
In the above example, my_tuple is a tuple containing four items.
Note: A tuple with only one item is called a singleton tuple. You need to include a trailing comma after the item to define a singleton tuple. For example, my_tuple = ('apple',) is a singleton tuple.
Comparing Tuple Item Sum with an Integer
If your tuple contains numeric data, you might want to compare the sum of its items with an integer. You can do this using the sum() function, which returns the sum of all items in an iterable.
Here's an example:
    my_tuple = (1, 2, 3, 4)
    total = sum(my_tuple)
    if total < 10:
        print('The sum of tuple items is less than 10.')
    else:
        print('The sum of tuple items is 10 or more.')
In the above example, the sum of my_tuple items is 10, so the output will be 'The sum of tuple items is 10 or more.'
Comparing a Method and an Integer
In Python, a method is a function that is associated with an object. Methods perform specific actions on an object and can also return a value. However, you cannot directly compare a method with an integer. You need to call the method and use its return value for comparison.
Here's an example:
    class MyClass:
        def my_method(self):
            return 5

    my_object = MyClass()
    if my_object.my_method() < 10:
        print('The return value of the method is less than 10.')
    else:
        print('The return value of the method is 10 or more.')
In the above example, the my_method() method of my_object returns 5, so the output will be 'The return value of the method is less than 10.'
Resolving TypeError by Calling the Method
In Python, methods and functions are objects too. You'll get a TypeError when comparing one of them with an integer, because Python doesn't know how to order these two different types of objects. Comparing a method to an integer just doesn't make sense. Let's take a look at an example:
    def my_method():
        return 5

    print(my_method < 10)
Output:
    TypeError: '<' not supported between instances of 'function' and 'int'
Here my_method is a plain function, so the message names 'function'. For a method accessed on an instance, it would name 'method' instead. To resolve this, we need to call the method instead of comparing the method itself to an integer. Remember, a method needs to be called to execute its body and return a value. We can do this by adding parentheses () after the method name:
    def my_method():
        return 5

    print(my_method() < 10)
Output:
    True
Note: The parentheses () are used to call a method in Python. Without them, you are referencing the method object itself, not the value it returns.
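The same mistake with a method on an instance looks like this. The Sensor class and its reading method are hypothetical names used only for this sketch, and the error is caught so that the script keeps running.

```python
class Sensor:
    def reading(self):
        return 5


sensor = Sensor()

# Forgetting the parentheses compares the bound method object itself
try:
    sensor.reading < 10
except TypeError as exc:
    message = str(exc)

print(message)
# '<' not supported between instances of 'method' and 'int'

# Calling the method compares its return value instead
print(sensor.reading() < 10)  # True
```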
Conclusion
In Python, the error message "TypeError: '<' not supported between instances of 'str' and 'int'" is a common error that occurs when you try to compare incompatible types.
This article has walked you through various scenarios where this error may occur and how to resolve it. Understanding these concepts will help you write more robust and error-free code. Remember, when you encounter a TypeError, the key is to identify the types of the objects you are comparing and ensure they are compatible. In some cases, you may need to convert one type to another or access specific values from a complex object before comparison.
PyBites: Make Each Line Count, Keeping Things Simple in Python
A challenge in software development is to keep things simple, so that your code does not grow overly complex over time.
Simple is better than complex.
Complex is better than complicated.
Simplicity in your code means fewer possibilities for bugs to hide and easier debugging when they do arise
It also makes your code more understandable and maintainable, which is crucial in a team setting or when returning to your code after a period of time.
A good example is (not) using Python built-ins.
Say you go from checking whether a number is divisible by one divisor:
    def is_divisible(num, divisor):
        return num % divisor == 0
To checking it against multiple divisors:
    def is_divisible_by_all(num, divisors):
        for divisor in divisors:
            if num % divisor != 0:
                return False
        return True
This is valid and works, but you might write this in a simpler manner using the all() built-in function:
    def is_divisible_by_all(num, divisors):
        return all(num % divisor == 0 for divisor in divisors)
Very clean / easy to read
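A quick sketch to confirm that the all() version behaves like the loop, including two edge cases: all() stops at the first False (so later divisors are never evaluated), and it returns True for an empty iterable. The sample numbers are made up for illustration.

```python
def is_divisible_by_all(num, divisors):
    return all(num % divisor == 0 for divisor in divisors)


print(is_divisible_by_all(12, [2, 3, 4]))  # True
print(is_divisible_by_all(12, [2, 5]))     # False

# Short-circuit: 12 % 5 != 0 ends the check before the 0 divisor is reached,
# so no ZeroDivisionError is raised
print(is_divisible_by_all(12, [5, 0]))     # False

# No divisors at all: all() of an empty iterable is True
print(is_divisible_by_all(12, []))         # True
```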
Another example is doing dictionary lookups, checking if the key is in the dictionary:
Complex (unnecessary):
    def get_value(dictionary, key):
        if key in dictionary:
            return dictionary[key]
        else:
            return "Key not found"
Better: leverage the fact that Python dicts have a get() method:
    dictionary.get(key, "Key not found")
Remember, simple code isn’t just about having fewer lines, it’s about being concise, making each line count
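Here is get() in action on a small made-up settings dictionary. Note that the default is only used when the key is missing, not when the stored value happens to be None.

```python
settings = {"theme": "dark", "font": None}

print(settings.get("theme", "Key not found"))  # dark
print(settings.get("size", "Key not found"))   # Key not found

# The key exists, so its stored value (None) is returned, not the default
print(settings.get("font", "Key not found"))   # None

# Without a second argument, the default is None
print(settings.get("size"))                    # None
```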
This usually means heavily using the Python built-ins and Standard Library
The more you can do with less, the easier your code is to understand and maintain