Planet Python

Planet Python - http://planetpython.org/

Amit Saha: Linux System Mining with Python

5 hours 8 min ago

In this article, we will explore the Python programming language as a tool to retrieve various information about a system running Linux. Let's get started.

Which Python?

When I refer to Python, I am referring to CPython 2 (2.7 to be exact). I will mention it explicitly when the same code won't work with CPython 3 (3.3) and provide the alternative code, explaining the differences. Just to make sure that you have CPython installed, type python or python3 from the terminal and you should see the Python prompt displayed in your terminal.

Note

Please note that all the programs have their first line as #!/usr/bin/env python, meaning that we want the Python interpreter to execute these scripts. Hence, if you make your script executable using chmod +x your-script.py, you can execute it using ./your-script.py (which is what you will see in this article).

Exploring the platform module

The platform module in the standard library has a number of functions which allow us to retrieve various system information. Let us start the Python interpreter and explore some of them, starting with the platform.uname() function:

>>> import platform
>>> platform.uname()
('Linux', 'fedora.echorand', '3.7.4-204.fc18.x86_64', '#1 SMP Wed Jan 23 16:44:29 UTC 2013', 'x86_64')

If you are aware of the uname command on Linux, you will recognize that this function is an interface of sorts to this command. On Python 2, it returns a tuple consisting of the system type (or kernel type), hostname, release, version, machine hardware and processor information. You can access individual attributes using indices, like so:

>>> platform.uname()[0]
'Linux'

On Python 3, the function returns a named tuple:

>>> platform.uname()
uname_result(system='Linux', node='fedora.echorand', release='3.7.4-204.fc18.x86_64', version='#1 SMP Wed Jan 23 16:44:29 UTC 2013', machine='x86_64', processor='x86_64')

Since the returned result is a named tuple, it is easy to refer to individual attributes by name rather than having to remember the indices, like so:

>>> platform.uname().system
'Linux'

The platform module also has direct interfaces to some of the above attributes, like so:

>>> platform.system()
'Linux'
>>> platform.release()
'3.7.4-204.fc18.x86_64'

The linux_distribution() function returns details about the Linux distribution you are on. For example, on a Fedora 18 system, this command returns the following information:

>>> platform.linux_distribution()
('Fedora', '18', 'Spherical Cow')

The result is returned as a tuple consisting of the distribution name, version and the code name. The distributions supported by your particular Python version can be obtained by printing the value of the _supported_dists attribute:

>>> platform._supported_dists
('SuSE', 'debian', 'fedora', 'redhat', 'centos', 'mandrake', 'mandriva', 'rocks', 'slackware', 'yellowdog', 'gentoo', 'UnitedLinux', 'turbolinux')

If your Linux distribution is not one of these (or a derivative of one of these), then you will likely not see any useful information from the above function call.

The final function from the platform module that we will look at is the architecture() function. When you call it without any arguments, it returns a tuple consisting of the bit architecture and the executable format of the Python executable, like so:

>>> platform.architecture()
('64bit', 'ELF')

On a 32-bit Linux system, you would see:

>>> platform.architecture()
('32bit', 'ELF')

You will get similar results if you specify any other executable on the system, like so:

>>> platform.architecture(executable='/usr/bin/ls')
('64bit', 'ELF')

You are encouraged to explore the other functions of the platform module which, among other things, allow you to find the current Python version you are running. If you are keen to know how this module retrieves this information, the Lib/platform.py file in the Python source directory is where you should look.
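For instance, the version-related helpers look like this (an illustrative session; your exact version strings will of course differ):

>>> platform.python_version()
'3.3.0'
>>> platform.python_implementation()
'CPython'
>>> platform.python_version_tuple()
('3', '3', '0')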

The os and sys modules are also of interest for retrieving certain system attributes, such as the native byte order. Next, we move beyond the Python standard library modules to explore some generic approaches to accessing the information that a Linux system makes available via the proc and sysfs file systems. Note that the information exposed via these file systems varies between hardware architectures, so keep that in mind while reading this article and while writing scripts which attempt to retrieve information from these files.
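Before we dig into those file systems, here is a quick illustration of the os and sys point above (an illustrative session; the node and release values are simply reused from the earlier output, and on Python 2 os.uname() returns a plain tuple rather than a named result):

>>> import sys, os
>>> sys.byteorder
'little'
>>> os.uname()
posix.uname_result(sysname='Linux', nodename='fedora.echorand', release='3.7.4-204.fc18.x86_64', version='#1 SMP Wed Jan 23 16:44:29 UTC 2013', machine='x86_64')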

CPU Information

The file /proc/cpuinfo contains information about the processing units on your system. For example, here is a Python version of what the Linux command cat /proc/cpuinfo would do:

#!/usr/bin/env python
""" print out the /proc/cpuinfo file """

from __future__ import print_function

with open('/proc/cpuinfo') as f:
    for line in f:
        print(line.rstrip('\n'))

When you execute this program either using Python 2 or Python 3, you should see all the contents of /proc/cpuinfo dumped on your screen (In the above program, the rstrip() method removes the trailing newline character from the end of each line).

The next code listing uses the startswith() string method to display the models of your processing units:

#!/usr/bin/env python
""" Print the model of your processing units """

from __future__ import print_function

with open('/proc/cpuinfo') as f:
    for line in f:
        # Ignore the blank line separating the information between
        # details about two processing units
        if line.strip():
            if line.rstrip('\n').startswith('model name'):
                model_name = line.rstrip('\n').split(':')[1]
                print(model_name)

When you run this program, you should see the model names of each of your processing units. For example, here is what I see on my computer:

Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz

We have so far seen a couple of ways to find the architecture of the computer system we are on. To be technically correct, both those approaches actually report the architecture of the kernel your system is running. So, if your computer is actually a 64-bit computer, but is running a 32-bit kernel, then the above methods will report it as having a 32-bit architecture. To find the true architecture of the computer you can look for the lm flag in the list of flags in /proc/cpuinfo. The lm flag stands for long mode and is only present on computers with a 64-bit architecture. The next program shows how you can do this:

#!/usr/bin/env python
""" Find the real bit architecture """

from __future__ import print_function

with open('/proc/cpuinfo') as f:
    for line in f:
        # Ignore the blank line separating the information between
        # details about two processing units
        if line.strip():
            if line.rstrip('\n').startswith('flags') \
               or line.rstrip('\n').startswith('Features'):
                if 'lm' in line.rstrip('\n').split():
                    print('64-bit')
                else:
                    print('32-bit')

As we have seen so far, it is possible to read /proc/cpuinfo and use simple text processing techniques to extract the data we are looking for. To make it friendlier for other programs to use this data, it is perhaps a better idea to make the contents of /proc/cpuinfo available as a standard data structure, such as a dictionary. The idea is simple: if you look at the contents of this file, you will find that for each processing unit there are a number of key, value pairs (in an earlier example, we printed the model name of the processor; here model name was a key). The information about different processing units is separated by blank lines. It is simple to build a dictionary which has a key for each processing unit; for each of these keys, the value is all the information about the corresponding processing unit present in /proc/cpuinfo. The next listing shows how you can do so.

#!/usr/bin/env python
""" /proc/cpuinfo as a Python dict """

from __future__ import print_function
from collections import OrderedDict
import pprint

def cpuinfo():
    ''' Return the information in /proc/cpuinfo
        as a dictionary in the following format:
        cpu_info['proc0']={...}
        cpu_info['proc1']={...}
    '''
    cpuinfo = OrderedDict()
    procinfo = OrderedDict()
    nprocs = 0
    with open('/proc/cpuinfo') as f:
        for line in f:
            if not line.strip():
                # end of one processor
                cpuinfo['proc%s' % nprocs] = procinfo
                nprocs = nprocs + 1
                # Reset
                procinfo = OrderedDict()
            else:
                if len(line.split(':')) == 2:
                    procinfo[line.split(':')[0].strip()] = line.split(':')[1].strip()
                else:
                    procinfo[line.split(':')[0].strip()] = ''
    return cpuinfo

if __name__ == '__main__':
    cpuinfo = cpuinfo()
    for processor in cpuinfo.keys():
        print(cpuinfo[processor]['model name'])

This code uses an OrderedDict (ordered dictionary) instead of a usual dictionary so that the keys and values are stored in the order in which they are found in the file. Hence, the data for the first processing unit is followed by the data about the second processing unit, and so on. When you call this function, it returns a dictionary whose keys are the processing units ('proc0', 'proc1' and so on); the value for each key is another dictionary which you can then use to sieve for the information you are looking for (as demonstrated in the if __name__=='__main__' block). The above program, when run, will once again print the model name of each processing unit (as indicated by the statement print(cpuinfo[processor]['model name'])):

Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz

Memory Information

Similar to /proc/cpuinfo, the file /proc/meminfo contains information about the main memory on your computer. The next program creates a dictionary from the contents of this file and dumps it.

#!/usr/bin/env python

from __future__ import print_function
from collections import OrderedDict

def meminfo():
    ''' Return the information in /proc/meminfo
        as a dictionary '''
    meminfo = OrderedDict()
    with open('/proc/meminfo') as f:
        for line in f:
            meminfo[line.split(':')[0]] = line.split(':')[1].strip()
    return meminfo

if __name__ == '__main__':
    #print(meminfo())
    meminfo = meminfo()
    print('Total memory: {0}'.format(meminfo['MemTotal']))
    print('Free memory: {0}'.format(meminfo['MemFree']))

As earlier, you could also access any specific information you are looking for by using it as a key (shown in the if __name__=='__main__' block). When you execute the program, you should see an output similar to the following:

Total memory: 7897012 kB
Free memory: 249508 kB

Network Statistics

Next, we explore the network devices on our computer system. We will retrieve the network interfaces on the system and the data bytes sent and received by them since your system last rebooted. The /proc/net/dev file makes this information available. If you examine the contents of this file, you will notice that the first two lines contain header information: the first column of this file is the network interface name, and the second and third columns display information about the received and the transmitted bytes (such as total bytes sent, number of packets, errors, etc.). Our interest here is to extract the total data sent and received by the different network devices. The next listing shows how we can extract this information from /proc/net/dev:

#!/usr/bin/env python

from __future__ import print_function
from collections import namedtuple

def netdevs():
    ''' RX and TX bytes for each of the network devices '''
    with open('/proc/net/dev') as f:
        net_dump = f.readlines()
    device_data = {}
    data = namedtuple('data', ['rx', 'tx'])
    for line in net_dump[2:]:
        line = line.split(':')
        if line[0].strip() != 'lo':
            device_data[line[0].strip()] = data(float(line[1].split()[0])/(1024.0*1024.0),
                                                float(line[1].split()[8])/(1024.0*1024.0))
    return device_data

if __name__ == '__main__':
    netdevs = netdevs()
    for dev in netdevs.keys():
        print('{0}: {1} MiB {2} MiB'.format(dev, netdevs[dev].rx, netdevs[dev].tx))

When you run the above program, the output should display your network devices along with the total received and transmitted data in MiB since your last reboot, as shown below:

em1: 0.0 MiB 0.0 MiB
wlan0: 2651.40951061 MiB 183.173976898 MiB

You could probably couple this with a persistent data storage mechanism to write your own data usage monitoring program.
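For example, here is one minimal way you could start doing that. This is my own illustrative sketch, not part of the original article: it assumes the previous listing has been saved as net_devs.py, and it simply appends one timestamped row per device to a CSV file every minute (the file name and interval are arbitrary choices):

#!/usr/bin/env python
""" Log network usage readings to a CSV file at regular intervals """

from __future__ import print_function
import csv
import time

# Assumption: the earlier listing is saved as net_devs.py in the same directory
from net_devs import netdevs

def log_usage(path='net_usage.csv', interval=60):
    with open(path, 'a') as f:
        writer = csv.writer(f)
        while True:
            timestamp = time.time()
            for dev, usage in netdevs().items():
                # One row per device: timestamp, device, RX MiB, TX MiB
                writer.writerow([timestamp, dev, usage.rx, usage.tx])
            f.flush()
            time.sleep(interval)

if __name__ == '__main__':
    log_usage()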

Processes

The /proc directory also contains a directory for each running process. The directory names are the same as the process IDs of these processes. Hence, if you scan /proc for all directories which have digits as their names, you will have a list of the process IDs of all the currently running processes. The function process_list() in the next listing returns such a list. The length of this list is hence the total number of processes running on the system, as you will see when you execute the program.

#!/usr/bin/env python
""" List of all process IDs currently active """

from __future__ import print_function
import os

def process_list():
    pids = []
    for subdir in os.listdir('/proc'):
        if subdir.isdigit():
            pids.append(subdir)
    return pids

if __name__ == '__main__':
    pids = process_list()
    print('Total number of running processes:: {0}'.format(len(pids)))

The above program when executed will show an output similar to:

Total number of running processes:: 229

Each of the process directories contains a number of other files and directories with various information about the process, such as the command that invoked it and the shared libraries it is using.
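For instance, /proc/<pid>/cmdline holds the command line the process was started with, with the arguments separated by null bytes. Here is a small sketch of my own (it assumes the earlier process listing has been saved as process_list.py so that its process_list() function can be imported):

#!/usr/bin/env python
""" Print the command line of every running process """

from __future__ import print_function

# Assumption: the earlier listing is saved as process_list.py
from process_list import process_list

def cmdline(pid):
    # Arguments in /proc/<pid>/cmdline are separated by null bytes
    with open('/proc/{0}/cmdline'.format(pid)) as f:
        return f.read().replace('\0', ' ').strip()

if __name__ == '__main__':
    for pid in process_list():
        try:
            print('{0}: {1}'.format(pid, cmdline(pid)))
        except IOError:
            # The process may have exited since we listed it
            pass

The longer listing below generalizes this kind of ad-hoc reading into a small reusable /proc reader.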

#!/usr/bin/env python
""" Python interface to the /proc file system.
    Although this can be used as a replacement for cat /proc/... on the
    command line, its really aimed to be an interface to /proc for other
    Python programs.

    As long as the object you are looking for exists in /proc
    and is readable (you have permission and if you are reading a file,
    its contents are alphanumeric, this program will find it). If its a
    directory, it will return a list of all the files in that directory
    (and its sub-dirs) which you can then read using the same function.

    Example usage:

    Read /proc/cpuinfo:
    $ ./readproc.py proc.cpuinfo

    Read /proc/meminfo:
    $ ./readproc.py proc.meminfo

    Read /proc/cmdline:
    $ ./readproc.py proc.cmdline

    Read /proc/1/cmdline:
    $ ./readproc.py proc.1.cmdline

    Read /proc/net/dev:
    $ ./readproc.py proc.net.dev

    Comments/Suggestions:
    Amit Saha <@echorand>
    <http://echorand.me>
"""

from __future__ import print_function
import os
import sys
import re

def toitem(path):
    """ Convert /foo/bar to foo.bar """
    path = path.lstrip('/').replace('/', '.')
    return path

def todir(item):
    """ Convert foo.bar to /foo/bar """
    # TODO: breaks if there is a directory whose name is foo.bar (for
    # eg. conf.d/), but we don't have to worry as long as we are using
    # this for reading /proc
    return '/' + item.replace('.', '/')

def readproc(item):
    """ Resolves proc.foo.bar items to /proc/foo/bar and
        returns the appropriate data.
        1. If its a file, simply return the lines in this file as a list
        2. If its a directory, return the files in this directory in the
           proc.foo.bar style as a list, so that this function can then be
           called to retrieve the contents
    """
    item = todir(item)
    if not os.path.exists(item):
        return 'Non-existent object'
    if os.path.isfile(item):
        # its a little tricky here. We don't want to read huge binary
        # files and return the contents. We will probably not need it
        # in the usual case.
        # utilities like 'file' on Linux and the Python interface to
        # libmagic are useless when it comes to files in /proc for
        # detecting the mime type, since the these are not on-disk
        # files.
        # Searching, i find this solution which seems to be a
        # reasonable assumption. If we find a '\0' in the first 1024
        # bytes of a file, we declare it as binary and return an empty string
        # however, some of the files in /proc which contain text may
        # also contain the null byte as a constituent character.
        # Hence, I use a RE expression that matches against any
        # combination of alphanumeric characters
        # If any of these conditions suffice, we read the file's contents
        pattern = re.compile('\w*')
        try:
            with open(item) as f:
                chunk = f.read(1024)
                if '\0' not in chunk or pattern.match(chunk) is not None:
                    f.seek(0)
                    data = f.readlines()
                    return data
                else:
                    return '{0} is binary'.format(item)
        except IOError:
            return 'Error reading object'
    if os.path.isdir(item):
        data = []
        for dir_path, dir_name, files in os.walk(item):
            for file in files:
                data.append(toitem(os.path.join(dir_path, file)))
        return data

if __name__ == '__main__':
    if len(sys.argv) > 1:
        data = readproc(sys.argv[1])
    else:
        data = readproc('proc')
    if type(data) == list:
        for line in data:
            print(line)
    else:
        print(data)

Block devices

The next program lists all the block devices by reading from the sysfs virtual file system. The block devices on your system can be found in the /sys/block directory. Thus, you may have directories such as /sys/block/sda, /sys/block/sdb and so on. To find all such devices, we perform a scan of the /sys/block directory using a simple regular expression to express the block devices we are interested in finding.

#!/usr/bin/env python
""" Read block device data from sysfs """

from __future__ import print_function
import glob
import re
import os

# Add any other device pattern to read from
dev_pattern = ['sd.*', 'mmcblk*']

def size(device):
    nr_sectors = open(device + '/size').read().rstrip('\n')
    sect_size = open(device + '/queue/hw_sector_size').read().rstrip('\n')
    # The sect_size is in bytes, so we convert it to GiB and then send it back
    return (float(nr_sectors) * float(sect_size)) / (1024.0 * 1024.0 * 1024.0)

def detect_devs():
    for device in glob.glob('/sys/block/*'):
        for pattern in dev_pattern:
            if re.compile(pattern).match(os.path.basename(device)):
                print('Device:: {0}, Size:: {1} GiB'.format(device, size(device)))

if __name__ == '__main__':
    detect_devs()

If you run this program, you will see output similar to the following:

Device:: /sys/block/sda, Size:: 465.761741638 GiB
Device:: /sys/block/mmcblk0, Size:: 3.70703125 GiB

When I ran the program, I had an SD memory card plugged in as well, and hence you can see that the program detects it. You can extend this program to recognize other block devices (such as virtual hard disks) as well, as sketched below.
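For example (an illustrative tweak of my own; the exact device name patterns will depend on your virtualization setup), recognizing virtio and Xen virtual disks is just a matter of extending dev_pattern in the listing above:

# Also match virtio (vda, vdb, ...) and Xen (xvda, xvdb, ...) virtual disks
dev_pattern = ['sd.*', 'mmcblk*', 'vd.*', 'xvd.*']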

Building command line utilities

One ubiquitous feature of Linux command line utilities is that they allow the user to specify command line arguments to customise the default behaviour of the program. The argparse module allows your program to have an interface similar to built-in Linux utilities. The next listing shows a program which retrieves all the users on your system and prints their login shells (using the pwd standard library module):

#!/usr/bin/env python
""" Print all the users and their login shells """

from __future__ import print_function
import pwd

# Get the users from /etc/passwd
def getusers():
    users = pwd.getpwall()
    for user in users:
        print('{0}:{1}'.format(user.pw_name, user.pw_shell))

if __name__ == '__main__':
    getusers()

When you run the program above, it will print all the users on your system and their login shells.

Now, let us say that you want the program's user to be able to choose whether he or she wants to see the system users (like daemon and apache). We will see a first use of the argparse module to implement this feature by extending the previous listing as follows.

#!/usr/bin/env python
""" Utility to play around with users and passwords on a Linux system """

from __future__ import print_function
import pwd
import argparse
import os

def read_login_defs():
    uid_min = None
    uid_max = None
    if os.path.exists('/etc/login.defs'):
        with open('/etc/login.defs') as f:
            login_data = f.readlines()
        for line in login_data:
            if line.startswith('UID_MIN'):
                uid_min = int(line.split()[1].strip())
            if line.startswith('UID_MAX'):
                uid_max = int(line.split()[1].strip())
    return uid_min, uid_max

# Get the users from /etc/passwd
def getusers(no_system=False):
    uid_min, uid_max = read_login_defs()
    if uid_min is None:
        uid_min = 1000
    if uid_max is None:
        uid_max = 60000
    users = pwd.getpwall()
    for user in users:
        if no_system:
            if user.pw_uid >= uid_min and user.pw_uid <= uid_max:
                print('{0}:{1}'.format(user.pw_name, user.pw_shell))
        else:
            print('{0}:{1}'.format(user.pw_name, user.pw_shell))

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='User/Password Utility')
    parser.add_argument('--no-system', action='store_true', dest='no_system',
                        default=False, help='Specify to omit system users')
    args = parser.parse_args()
    getusers(args.no_system)

On executing the above program with the --help option, you will see a nice help message with the available options (and what they do):

$ ./getusers.py --help
usage: getusers.py [-h] [--no-system]

User/Password Utility

optional arguments:
  -h, --help   show this help message and exit
  --no-system  Specify to omit system users

An example invocation of the above program is as follows:

$ ./getusers.py --no-system
gene:/bin/bash

When you pass an invalid parameter, the program complains:

$ ./getusers.py --param
usage: getusers.py [-h] [--no-system]
getusers.py: error: unrecognized arguments: --param

Let us try to understand in brief how we used argparse in the above program. The statement: parser = argparse.ArgumentParser(description='User/Password Utility') creates a new ArgumentParser object with an optional description of what this program does.

Then, we add the arguments that we want the program to recognize using the add_argument() method in the next statement: parser.add_argument('--no-system', action='store_true', dest='no_system', default=False, help='Specify to omit system users'). The first argument to this method is the name of the option that the program user will supply while invoking the program. The parameter action='store_true' indicates that this is a boolean option, i.e. its presence or absence affects the program behaviour in some way. The dest parameter specifies the name under which the value of this option will be made available to the program. If this option is not supplied by the user, the default value is False, which is indicated by the parameter default=False, and the last parameter is the help message that the program displays about this option. Finally, the arguments are parsed using the parse_args() method: args = parser.parse_args(). Once the parsing is done, the value of an option supplied by the user can be retrieved using the syntax args.option_dest, where option_dest is the dest name that you specified while setting up the arguments. The statement getusers(args.no_system) calls the getusers() function with the option value for no_system supplied by the user.

The next program shows how you can specify options which allow the user to pass non-boolean preferences to your program. This program is a rewrite of the earlier network statistics listing, with an additional option to specify the network device you are interested in.

#!/usr/bin/env python

from __future__ import print_function
from collections import namedtuple
import argparse

def netdevs(iface=None):
    ''' RX and TX bytes for each of the network devices '''
    with open('/proc/net/dev') as f:
        net_dump = f.readlines()
    device_data = {}
    data = namedtuple('data', ['rx', 'tx'])
    for line in net_dump[2:]:
        line = line.split(':')
        if not iface:
            if line[0].strip() != 'lo':
                device_data[line[0].strip()] = data(float(line[1].split()[0])/(1024.0*1024.0),
                                                    float(line[1].split()[8])/(1024.0*1024.0))
        else:
            if line[0].strip() == iface:
                device_data[line[0].strip()] = data(float(line[1].split()[0])/(1024.0*1024.0),
                                                    float(line[1].split()[8])/(1024.0*1024.0))
    return device_data

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Network Interface Usage Monitor')
    parser.add_argument('-i', '--interface', dest='iface',
                        help='Network interface')
    args = parser.parse_args()
    netdevs = netdevs(iface=args.iface)
    for dev in netdevs.keys():
        print('{0}: {1} MiB {2} MiB'.format(dev, netdevs[dev].rx, netdevs[dev].tx))

When you execute the program without any arguments, it behaves exactly as the earlier version. However, you can also specify the network device you may be interested in. For example:

$ ./net_devs_2.py
em1: 0.0 MiB 0.0 MiB
wlan0: 146.099492073 MiB 12.9737148285 MiB
virbr1: 0.0 MiB 0.0 MiB
virbr1-nic: 0.0 MiB 0.0 MiB

$ ./net_devs_2.py --help
usage: net_devs_2.py [-h] [-i IFACE]

Network Interface Usage Monitor

optional arguments:
  -h, --help            show this help message and exit
  -i IFACE, --interface IFACE
                        Network interface

$ ./net_devs_2.py -i wlan0
wlan0: 146.100307465 MiB 12.9777050018 MiB

System-wide availability of your scripts

With the help of this article, you may have been able to write one or more useful scripts for yourself which you want to use every day like any other Linux command. The easiest way to do this is to make the script executable and set up a bash alias for it. You could also remove the .py extension and place the file in a standard location such as /usr/local/sbin.

Other useful standard library modules

Besides the standard library modules we have already looked at in this article, there are a number of other standard modules which may be useful: subprocess, ConfigParser, readline and curses.

What next?

At this stage, depending on your own experience with Python and with exploring Linux internals, you may follow one of the following paths. If you have been writing a lot of shell scripts and command pipelines to explore various Linux internals, take a look at Python. If you want an easier way to write your own utility scripts for performing various tasks, take a look at Python. Lastly, if you have been using Python for other kinds of programming on Linux, have fun using Python to explore Linux internals.

Resources

• Python resources
• System Information

Techiediaries - Django: Handling CORS in Django REST Framework

Sun, 2018-01-21 19:00

If you are building applications with Django and modern front-end/JavaScript technologies such as Angular, React or Vue, chances are that you are using two development servers: one for the back-end (running at port 8000) and a separate development server (e.g. Webpack's) for your front-end application.

When you send HTTP requests from your front-end application to your back-end API built with Django REST framework, using the browser's fetch API, the Axios client or the jQuery $.ajax() method (a wrapper for the JavaScript XHR interface), the web browser will throw an error related to the Same Origin Policy.

Cross Origin Resource Sharing, or CORS, allows client applications to interface with APIs hosted on different domains by enabling modern web browsers to bypass the Same Origin Policy, which is enforced by default.

CORS enables you to add a set of headers that tell the web browser if it's allowed to send/receive requests from domains other than the one serving the page.

You can enable CORS in Django REST framework by using a custom middleware or, better yet, by using the django-cors-headers package.

Using a Custom Middleware

First create a Django application:

python manage.py startapp app

Next you need to add a middleware file app/cors.py:

class CorsMiddleware(object):
    def process_response(self, req, resp):
        # Add the CORS header to the outgoing response
        resp["Access-Control-Allow-Origin"] = "*"
        return resp

This will add an Access-Control-Allow-Origin: * header to every Django response, but before that takes effect you need to add the middleware to the list of middleware classes:

MIDDLEWARE_CLASSES = (
    #...
    'app.cors.CorsMiddleware'
)
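As an aside (this is my own note, not part of the original tutorial): the snippet above uses the old-style MIDDLEWARE_CLASSES API. If your project uses the newer MIDDLEWARE setting introduced in Django 1.10, a roughly equivalent middleware would look like the following, registered as 'app.cors.CorsMiddleware' in the MIDDLEWARE list instead:

# app/cors.py (new-style middleware sketch)
class CorsMiddleware(object):
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # Add the CORS header to every outgoing response
        response["Access-Control-Allow-Origin"] = "*"
        return response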

That's it: you have now enabled CORS in your Django back-end. You can configure this middleware to add more fine-grained options, or you can use the well-tested django-cors-headers package, which works great with Django REST framework.

Using django-cors-headers

Start by installing django-cors-headers using pip:

pip install django-cors-headers

You need to add it to your project settings.py file:

INSTALLED_APPS = (
    ##...
    'corsheaders'
)

Next you need to add the corsheaders.middleware.CorsMiddleware middleware to the middleware classes in settings.py:

MIDDLEWARE_CLASSES = (
    'corsheaders.middleware.CorsMiddleware',
    'django.middleware.common.BrokenLinkEmailsMiddleware',
    'django.middleware.common.CommonMiddleware',
    #...
)

You can then either enable CORS for all domains by adding the following setting:

CORS_ORIGIN_ALLOW_ALL = True

Or only enable CORS for specified domains:

CORS_ORIGIN_ALLOW_ALL = False

CORS_ORIGIN_WHITELIST = (
    'http://localhost:8000',
)

You can find more configuration options from the docs.
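For instance, a couple of other settings the package supports (double-check the documentation for your installed version, since setting names have changed over the package's history) are CORS_ALLOW_CREDENTIALS and CORS_URLS_REGEX:

# Allow cookies and credentials to be included in cross-origin requests
CORS_ALLOW_CREDENTIALS = True

# Only add CORS headers to URLs matching this regex (e.g. just your API routes)
CORS_URLS_REGEX = r'^/api/.*$'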

Conclusion

In this tutorial we have seen how to enable CORS headers in your Django REST framework back-end using a custom CORS middleware or the django-cors-headers package.


Sandipan Dey: Recursive Graphics, Bilinear Interpolation and Image Transformation in Python

Sun, 2018-01-21 15:52
The following problem appeared in an assignment in the Princeton course COS 126. The problem description is taken from the course itself. Recursive Graphics: write a program that plots a Sierpinski triangle, as illustrated below. Then develop a program that plots a recursive pattern of your own design. Part 1. The Sierpinski triangle is an example of … Continue reading Recursive Graphics, Bilinear Interpolation and Image Transformation in Python

Artem Golubin: Understanding internals of Python classes

Sun, 2018-01-21 10:14

The goal of this series is to describe the internals and general concepts behind the class object in Python 3.6. In this part, I will explain how Python stores and looks up attributes. I assume that you already have a basic understanding of object-oriented concepts in Python.

Let's start with a simple class:

class Vehicle:
    kind = 'car'

    def __init__(self, manufacturer, model):
        self.manufacturer = manufacturer
        self.model_name = model

    @property
    def name(self):
        return "%s %s" % (self.manufacturer, self.model_name)
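As a quick illustration of where those attributes actually live (a sketch of standard CPython behaviour, not an excerpt from the article): class-level attributes such as kind are stored in the class's __dict__, attributes assigned in __init__ go into each instance's __dict__, and attribute lookup on an instance falls back to the class when the name is not found on the instance:

>>> v = Vehicle('Tesla', 'Model S')
>>> 'kind' in Vehicle.__dict__, 'kind' in v.__dict__
(True, False)
>>> v.__dict__
{'manufacturer': 'Tesla', 'model_name': 'Model S'}
>>> v.kind  # not found on the instance, so the lookup falls back to the class
'car'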

Python Does What?!: None on the left

Sun, 2018-01-21 05:00
A natural default, None is probably the most commonly assigned value in Python. But what happens if you move it to the left side of that equation?

In Python 2:
>>> None = 2
File "<stdin>", line 1
SyntaxError: cannot assign to None
This is similar to what happens when you assign to a literal:
>>> 1 = 2
File "<stdin>", line 1
SyntaxError: can't assign to literal
In Python 3 this walk on the wild side will get you a slightly different error:
>>> None = 1
File "<stdin>", line 1
SyntaxError: can't assign to keyword
None has graduated from useful snowflake to full-blown keyword!

Import Python: #159: How to speed up Python application startup time, Hunting Memory Leaks and more

Sun, 2018-01-21 04:34
Worthy Read
Optimize Utilization with GoCD’s Elastic Agents GoCD is a continuous delivery tool specializing in advanced workflow modeling and dependency management. Our new AWS ECS elastic agents extension now allows you to scale up with on-demand agents based on your need. Try it now!
GoCD, advert
How to speed up Python application startup time? Python 3.7 has a new feature to show the time taken to import modules. This feature is enabled with the -X importtime option or the PYTHONPROFILEIMPORTTIME environment variable.
processing time
Using textual analysis to quantify a cast of characters If you’ve ever worked on a text and wished you could get a list of characters or see how many times each character was mentioned, this is the tutorial for you.
NLTK
Hunting for memory leaks in asyncio applications. Sailing into the last two weeks of 2017, which I fully intended to spend experimenting with various eggnog recipes, I was alerted by our DevOps team that our asyncio app was consuming 10GB of memory. That is approximately 100 times more than it should!
memory leaks, async
The Industry’s Fastest eSignature API Integration Embed docs directly on your website with a few lines of code. Test the API for free.
advert
DjangoCon JP 2018 DjangoCon JP is a conference for the Django Web framework in Japan. If you're a seasoned Django pro or just starting, DjangoCon JP is for you. Our goal is for attendees to meet, talk, share tips, discover new ways to use Django, and, most importantly, have FUN.
conference
The flat success path If you want to write clear and easy to understand software, make sure it has a single success path. A 'single success path' means a few things. First, it means that any given function/method/procedure should have a single clear purpose.
code-quality
Normalizing Flows Tutorial, Part 1: Distributions and Determinants This series is written for an audience with a rudimentary understanding of linear algebra, probability, neural networks, and TensorFlow. Knowledge of recent advances in Deep Learning, generative models will be helpful in understanding the motivations and context underlying these techniques, but they are not necessary.
tensorflow
A GPU ready Docker container for OpenAI Gym Development with TensorFlow So, you want to write an agent, competing in the OpenAI Gym, you want to use Keras or TensorFlow or something similar and you don’t want everything installed on your workstation? You have come to the right place!
docker, tensorflow
Check your balance on Coinbase using Python Even though Coinbase has a mobile application so you’re able to check your balance on the go, I prefer using their API instead so I can setup custom alerts not available on their platform.
coinbase
Using bower to manage static files with Django Sharing a way to manage libraries like bootstrap, jquery with bower without using any external app.
django
Automatic model selection: H2O AutoML In this post, we will use H2O AutoML for auto model selection and tuning. This is an easy way to get a good tuned model with minimal effort on the model selection and parameter tuning side.
modeling
Logistic regression in Python sklearn

Projects
SimpleCoin - 209 Stars, 20 Fork Just a really simple, insecure and incomplete implementation of a blockchain for a cryptocurrency made in Python as educational material. In other words, a simple Bitcoin clone.
languagecrunch - 136 Stars, 8 Fork LanguageCrunch NLP server docker image.
howtopython.org - 86 Stars, 16 Fork A (book, website) that describes how to Python, from scratch.
unimatrix - 83 Stars, 4 Fork Python script to simulate the display from "The Matrix" in terminal. Uses half-width katakana unicode characters by default, but can use custom character sets. Accepts keyboard controls while running. Based on CMatrix.
spacy-lookup - 32 Stars, 1 Fork Named Entity Recognition based on dictionaries.
simpledb - 14 Stars, 0 Fork miniature redis-like server implemented in Python.
python-bigone - 10 Stars, 1 Fork BigONE Exchange API python implementation for automated trading.
django-multiple-user-types-example - 10 Stars, 1 Fork Django Quiz Application
spotify-lyrics-cli - 9 Stars, 0 Fork Automatically get lyrics for the song currently playing in Spotify from command line.
aws-security-checks - 7 Stars, 0 Fork AWS Security Checks.
auditor - 5 Stars, 0 Fork Script for tracking file system changes.
sgqlc - 5 Stars, 1 Fork Simple GraphQL Client.
pyfakers - 4 Stars, 0 Fork py-fake-rs: a fake data generator for python, backed by fake-rs in rust.
shellson - 3 Stars, 1 Fork JSON command line parser.
django-qsessions - 3 Stars, 0 Fork Extends Django's cached_db session backend.

Techiediaries - Django: Handling CORS in Express 4

Sat, 2018-01-20 19:00

CORS stands for Cross Origin Resource Sharing and allows modern web browsers to send AJAX requests to, and receive HTTP responses for resources from, domains other than the domain serving the client-side application.

Have you ever been developing an application which makes XHR requests to a cross-domain origin, only to get an error like the following in your browser console?

XMLHttpRequest cannot load XXX. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin XXX is therefore not allowed access. The response had HTTP status code 500.

Your web browser is simply informing you that your web server is not sending back the headers that allow CORS, i.e. Access-Control-Allow-Origin and Access-Control-Allow-Methods.

So in this tutorial you'll learn how to enable CORS in your Express 4 server so that your front-end application can bypass the Same Origin Policy enforced by modern web browsers. This is particularly useful when you are developing your application locally, since in many cases you'll have two development servers (front-end and back-end) running on different ports, or if you want to enable resource sharing between different domains/hosts.

How to enable CORS in Express 4

There are many ways that you can use to enable CORS in Express.

If you are developing your application locally and want a quick way to enable CORS, then you can simply use a middleware with a few lines of code:

var express = require('express');
// body-parser is used below but was not required in the original snippet
var bodyParser = require('body-parser');

var server = express();

server.use(bodyParser.urlencoded({extended: true}));
server.use(bodyParser.json());

server.use(function(req, res, next) {
    res.header("Access-Control-Allow-Origin", "*");
    res.header("Access-Control-Allow-Headers", "Origin, X-Requested-With, Content-Type, Accept");
    next();
});

server.get('/endpoint', function (req, res, next) {
    res.json({msg: 'This is CORS-enabled for all origins!'});
});

server.listen(3000, () => {
    console.log('Listening at http://localhost:3000');
});

The wildcard * allows resources to be accessed from any origin.

That's it you can now send requests from any origin without getting the same origin policy problems.

You can also use fine-grained options, without having to deal with HTTP header names yourself, by using the cors module installed from npm.

Using the CORS Module

Head over to your terminal and install:

npm install --save cors

You can then use it as a middleware:

var express = require('express');
var bodyParser = require('body-parser');
var cors = require('cors');

var server = express();

server.use(bodyParser.urlencoded({extended: true}));
server.use(bodyParser.json());
// Register the CORS middleware before the routes so that it applies to them
server.use(cors());

server.get('/endpoint', function (req, res, next) {
    res.json({msg: 'This is CORS-enabled for all origins!'});
});

server.listen(3000, () => {
    console.log('Listening at http://localhost:3000');
});

This is equivalent to our previous example and allows resources to be accessed from any origin by adding the Access-Control-Allow-Origin: * header to all requests.

Controlling Allowed Hosts

When you are in production you don't want to allow CORS access for all origins. If you need to allow cross-origin requests only from some specified host(s), you can add the following code:

server.use(cors({ origin: 'https://techiediaries.com' }));

This will allow https://techiediaries.com to send cross-origin requests to your Express server without the Same Origin Policy getting in the way.

You can also enable CORS for a single Express route:

server.get('/endpoint', cors(), function (req, res, next) {
    res.json({msg: 'This has CORS-enabled for only this route: /endpoint'});
});

Allowing Dynamic/Multiple Origins

If you want to allow multiple origins, you need to use a function for origin (instead of a string) that dynamically sets the CORS header depending on the origin making the request, together with a whitelist that you specify containing the origins to allow.

var express = require('express');
var cors = require('cors');

var server = express();

var whitelist = ['http://techiediaries.com', 'http://othersite.com'];
var options = {
    origin: function (origin, callback) {
        if (whitelist.indexOf(origin) !== -1) {
            callback(null, true);
        } else {
            callback(new Error('Not allowed by CORS'));
        }
    }
};

server.use(cors(options));

server.get('/endpoint', function (req, res, next) {
    res.json({msg: 'This has CORS enabled'});
});

server.listen(3000, () => {
    console.log('Listening at http://localhost:3000');
});

Conclusion

In this tutorial we have seen some useful options for adding CORS headers to your web application, developed with Node.js and Express 4, which is particularly useful for development applications with separate front-end and back-end apps or if you want to be able to share resources (via API requests) across many domains.


Python Data: Local Interpretable Model-agnostic Explanations – LIME in Python

Sat, 2018-01-20 14:57

When working with classification and/or regression techniques, its always good to have the ability to ‘explain’ what your model is doing. Using Local Interpretable Model-agnostic Explanations (LIME), you now have the ability to quickly provide visual explanations of your model(s).

It's quite easy to throw numbers or content into an algorithm and get a result that looks good. We can test for accuracy and feel confident that the classifier and/or model is 'good'… but can we describe what the model is actually doing to other users? A good data scientist spends some of their time making sure they have reasonable explanations for what the model is doing and why the results are what they are.

There's always been a focus on 'trust' in any type of modeling methodology, but with machine learning and deep learning, many people feel like the black-box approach taken with these methods isn't as trustworthy as other methods. This topic was addressed in a paper titled "Why Should I Trust You?": Explaining the Predictions of Any Classifier, which proposes the concept of Local Interpretable Model-agnostic Explanations (LIME). According to the paper, LIME is 'an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model.'

I’ve used the LIME approach a few times in recent projects and really like the idea. It breaks down the modeling / classification techniques and output into a form that can be easily described to non-technical people.  That said, LIME isn’t a replacement for doing your job as a data scientist, but it is another tool to add to your toolbox.

To implement LIME in Python, I use this LIME library written and released by one of the authors of the above paper.

I thought it might be good to provide a quick run-through of how to use this library. For this post, I'm going to mimic the "Using lime for regression" notebook the authors provide, but I'm going to provide a little more explanation.

The full notebook is available in my repo here.

Getting started with Local Interpretable Model-agnostic Explanations (LIME)

Before you get started, you’ll need to install Lime.

pip install lime

Next, let’s import our required libraries.

from sklearn.datasets import load_boston
import sklearn.ensemble
import numpy as np
from sklearn.model_selection import train_test_split
import lime
import lime.lime_tabular

Let's load the sklearn dataset called 'boston'. This is a dataset of Boston house prices that is often used for machine learning regression examples.

boston = load_boston()

Before we do much else, let’s take a look at the description of the dataset to get familiar with it.  You can do this by running the following command:

print(boston['DESCR'])

The output is:

Boston House Prices dataset
===========================

Notes
------
Data Set Characteristics:

    :Number of Instances: 506
    :Number of Attributes: 13 numeric/categorical predictive
    :Median Value (attribute 14) is usually the target
    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's
    :Missing Attribute Values: None
    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
http://archive.ics.uci.edu/ml/datasets/Housing

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for
clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch,
'Regression diagnostics ...', Wiley, 1980. N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression problems.

**References**

- Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
- Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
- many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)

Now that we have our data loaded, we want to build a regression model to forecast boston housing prices. We’ll use random forest for this to follow the example by the authors.

First, we’ll set up the RF Model and then create our training and test data using the train_test_split module from sklearn. Then, we’ll fit the data.

rf = sklearn.ensemble.RandomForestRegressor(n_estimators=1000)
train, test, labels_train, labels_test = train_test_split(boston.data, boston.target, train_size=0.80)
rf.fit(train, labels_train)

Now that we have a Random Forest Regressor trained, we can check some of the accuracy measures.

print('Random Forest MSError', np.mean((rf.predict(test) - labels_test) ** 2))

The MSError is 10.45. Now, let's look at the MSError when predicting the mean.

print('MSError when predicting the mean', np.mean((labels_train.mean() - labels_test) ** 2))

From this, we get 80.09.

Without really knowing the dataset, it's hard to say whether these errors are good or bad. Since we are really most interested in looking at the LIME approach, we'll move along and assume these are decent errors.

To implement LIME, we need to get the categorical features from our data and then build an ‘explainer’. This is done with the following commands:

categorical_features = np.argwhere(
    np.array([len(set(boston.data[:, x])) for x in range(boston.data.shape[1])]) <= 10).flatten()

and the explainer:

explainer = lime.lime_tabular.LimeTabularExplainer(
    train,
    feature_names=boston.feature_names,
    class_names=['price'],
    categorical_features=categorical_features,
    verbose=True,
    mode='regression')

Now, we can grab one of our test values and check out our prediction(s). Here, we’ll grab the 100th test value and check the prediction and see what the explainer has to say about it.

i = 100
exp = explainer.explain_instance(test[i], rf.predict, num_features=5)
exp.show_in_notebook(show_table=True)

LIME Explainer for regression

So…what does this tell us?

It tells us that the 100th test value’s prediction is 21.16 with the “RAD=24” value providing the most positive valuation and the other features providing negative valuation in the prediction.

For regression, this isn’t quite as interesting (although it is useful). The LIME approach shows much more benefit (at least to me) when performing classification.

As an example, if you are trying to classify plants as edible or poisonous, LIME’s explanation is much more useful. Here’s an example from the authors.

LIME explanation of edible vs poisonous
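To give a flavour of that workflow, here is a minimal classification sketch of my own (using the iris dataset rather than the mushroom data from the authors' example); the API is essentially the same as in the regression case above, except that the explainer runs in classification mode and explain_instance() is given the classifier's predict_proba:

import sklearn.ensemble
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import lime.lime_tabular

iris = load_iris()
train, test, labels_train, labels_test = train_test_split(iris.data, iris.target, train_size=0.80)

# Train a simple classifier
clf = sklearn.ensemble.RandomForestClassifier(n_estimators=500)
clf.fit(train, labels_train)

# Build a tabular explainer in classification mode
explainer = lime.lime_tabular.LimeTabularExplainer(
    train,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    mode='classification')

# Explain a single prediction; note predict_proba rather than predict
exp = explainer.explain_instance(test[0], clf.predict_proba, num_features=4)
exp.show_in_notebook(show_table=True)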

Take a look at LIME when you have some time. It's a good library to add to your toolkit, especially if you are doing a lot of classification work. It makes it much easier to 'explain' what the model is doing.

The post Local Interpretable Model-agnostic Explanations – LIME in Python appeared first on Python Data.


Possbility and Probability: Using Python argparse and arguments with dashes

Sat, 2018-01-20 10:30

Have you ever been soooooo close to putting a puzzle together only to discover at the last minute that you are missing a piece? This happens to me all the time when I’m coding and I hit that last little … Continue reading →

The post Using Python argparse and arguments with dashes appeared first on Possibility and Probability.


David MacIver: Lazy fisher-yates shuffling for precise rejection sampling

Sat, 2018-01-20 09:44

This is a trick I figured out a while ago. It came up in a problem I was working on, so I thought I’d write it up.

I haven’t seen it anywhere else, but I would be very surprised if it were in any way original to me rather than a reinvention. I think it’s niche and not very useful, so it’s hard to find prior art.

Attention conservation notice: My conclusion at the end is going to be that this is a neat trick that is probably not worth bothering with. You get a slight asymptotic improvement for a worse constant factor. I mention some variants and cases where it might be useful at the end, but it’s definitely more of an interesting intellectual exercise than a practical tool.

Suppose you have the following problem: You want to implement the following function:

def sample(random, n, f): """returns a number i in range(n) such that f(i) is True. Raises ValueError if no such i exists."""

What is the right way to do this?

The obvious way to do it is this:

def sample(random, n, f):
    choices = [i for i in range(n) if f(i)]
    if not choices:
        raise ValueError("No such i!")
    return random.choice(choices)

We calculate all of the possible outcomes up front and then sample from those.

This works but it is always \(O(n)\). We can’t hope to do better than that in general – in the case where \(f(i)\) always returns False, we must call it \(n\) times to verify that and raise an error – but it is highly inefficient if, say, f always returns True.

For cases where we know a priori that there is at least one i with f(i) returning True, we can use the following implementation:

def sample(random, n, f):
    while True:
        i = random.randrange(0, n)
        if f(i):
            return i

Here we do rejection sampling: We simulate the distribution directly by picking from the larger uniform distribution and selecting on the conditional property that \(f(i)\) is True.

How well this works depends on a parameter \(A\) – the number of values in \([0, n)\) such that \(f(i)\) is True. The probability of stopping at any loop iteration is \(\frac{A}{n}\) (the probability of the result being True), so the number of loops is a geometric distribution with that parameter, and so the expected number of loop iterations is \(\frac{n}{A}\). i.e. we’ve effectively sped up our search by a factor of \(A\).

So in expectation rejection sampling is always at least as good and usually better than our straightforward loop, but it has three problems:

  1. It will occasionally take more than \(n\) iterations to complete because it may try the same \(i\) more than once.
  2. It loops forever when \(A=0\)
  3. Even when \(A > 0\) its worst-case may take strictly more iterations than the filtering method.

So how can we fix this?

If we fix the first by arranging so that it never calls \(f\) with the same \(i\) more than once, then we also implicitly fix the other two: At some point we’ve called \(f\) with every \(i\) and we can terminate the loop and error, and if each loop iteration returns a fresh \(i\) then we can’t have more than \(n\) iterations.

We can do this fairly naturally by shuffling the order in which we try the indices and then returning the first one that returns True:

def sample(random, n, f):
    indices = list(range(n))
    random.shuffle(indices)
    for i in indices:
        if f(i):
            return i
    raise ValueError("No such i!")

Effectively at each iteration of the loop we are uniformly at random selecting an \(i\) among those we haven’t seen yet.

So now we are making fewer calls to \(f\) – we only call it until we first find an example, as in our rejection sampling. However we’re still paying an \(O(n)\) cost for actually doing the shuffle!

We can fix this partially by inlining the implementation of the shuffle as follows (this is a standard Fisher-Yates shuffle, though we’re doing it backwards):

def sample(random, n, f):
    indices = list(range(n))
    for j in range(n):
        k = random.randrange(j, n)
        indices[j], indices[k] = indices[k], indices[j]
        i = indices[j]
        if f(i):
            return i
    raise ValueError("No such i!")

This works because after \(j\) iterations of the loop in a Fisher-Yates shuffle the indices up to and including \(j\) are shuffled. Thus we can effectively fuse our search into the loop – if we know we’re going to terminate here, we can just stop and not bother shuffling the rest.

But we’re still paying \(O(n)\) cost for initialising the list of indices. Can we do better? Yes we can!

The idea is that we can amortise the cost of our reads of indices into our writes of it, and we only do \(O(\frac{n}{A})\) writes. If we haven’t written to a position in the indices list yet, then its value must be equal to its position.

We can do this naturally in python using defaultdict (there are better data structures for this, but a hashmap gets us the desired amortized complexity) as follows:

from collections import defaultdict

class key_dict(defaultdict):
    def __missing__(self, key):
        return key

def sample(random, n, f):
    indices = key_dict()
    for j in range(n):
        k = random.randrange(j, n)
        indices[j], indices[k] = indices[k], indices[j]
        i = indices[j]
        if f(i):
            return i
    raise ValueError("No such i!")

So we have now completely avoided any \(O(n)\) costs (assuming we’re writing Python 3 and so range is lazy. If you’re still using legacy Python, replace it with an xrange).

What is the actual expected number of loop iterations?

Well if \(A = n\) then we only make one loop iteration, and if \(A = 0\) we always make \(n\). In general we certainly make no more than \(n – A\) iterations, because after that many iterations we’ve exhausted all the values for which \(f\) is False.

I, err, confess I’ve got a bit lost in the maths for what the expected number of iterations of this loop is. If \(L\) is a random variable that is the position of the loop iteration on which this terminates with a successful result (with \(L = n + 1\) if there are no valid values) then a counting argument gives us that \(P(L \geq k) = \frac{(n – A)! (n – k)!}{n! (n – A – k)!}\), and we can calculate \(E(L) = \sum\limits_{k=0}^{n – A} P(L \geq k)\), but the sum is fiddly and my ability to do such sums is rusty. Based on plugging some special cases into Wolfram Alpha, I think the expected number of loop iterations is something like \(\frac{n+1}{A + 1}\), which at least gives the right numbers for the boundary cases. If that is the right answer then I’m sure there’s some elegant combinatorial argument that shows it, but I’m not currently seeing it. Assuming this is right, this is asymptotically no better than the original rejection sampling.
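For what it's worth, here is a short argument for the guessed closed form (this is my addition rather than part of the original post, so apply the usual scepticism). The loop inspects indices in the order of a uniformly random permutation, so \(L\) is the position of the first of the \(A\) good values in that permutation. By symmetry, the \(A\) good values split the \(n - A\) bad values into \(A + 1\) gaps of expected size \(\frac{n - A}{A + 1}\) each, and the number of bad values inspected before the first good one is exactly the size of the first gap, so \(E(L) = 1 + \frac{n - A}{A + 1} = \frac{n + 1}{A + 1}\) whenever \(A > 0\), which also agrees with the convention \(L = n + 1\) when \(A = 0\).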

Regardless, the expected number of loop iterations is definitely \(\leq \frac{n}{A}\), because we can simulate it by running our rejection sampling and counting \(1\) whenever we see a new value, so it is strictly dominated by the expected number of iterations of the pure rejection sampling. So we do achieve our promised result of an algorithm that strictly improves on both rejection sampling and filter-then-sample – it has a better expected complexity than the former, and the same worst-case complexity as the latter.

Is this method actually worth using? Ehhh… maybe? It’s conceptually pleasing to me, and it’s nice to have a method that complexity-wise strictly out-performs either of the two natural choices, but in practice the constant overhead of using the hash map almost certainly is greater than any benefit you get from it.

The real problem that limits this being actually useful is that either \(n\) is small, in which case who cares, or \(n\) is large, in which case the chances of drawing a duplicate are sufficiently low that the overhead is negligible.

There are a couple of ways this idea can still be useful in the right niche though:

  • This does reduce the constant factor in the number of calls to \(f\) (especially if \(A\) is small), so if \(f\) is expensive then the constant factor improvement in number of calls may be enough to justify this.
  • If you already have your values you want to sample in an array, and the order of the array is arbitrary, then you can use the lazy Fisher Yates shuffle trick directly without the masking hash map.
  • If you’re genuinely unsure about whether \(A > 0\) and do want to be able to check, this method allows you to do that (but you could also just run rejection sampling \(n\) times and then fall back to filter and sample and it would probably be slightly faster to do so).

If any of those apply, this might be a worthwhile trick. If they don’t, hopefully you enjoyed reading about it anyway. Sorry.

Categories: FLOSS Project Planets

David MacIver: A pathological example for test-case reduction

Sat, 2018-01-20 06:46

This is an example I cut from a paper I am currently writing about test-case reduction because I needed to save space and nobody except me actually cares about the algorithmic complexity of test-case reduction.

Suppose you have the following test case:

from hypothesis import given, strategies as st

K = 64
INTS = st.integers(0, 2 ** K - 1)

@given(INTS, INTS)
def test_are_not_too_far_apart(m, n):
    assert abs(m - n) > 1

Then if this test ever fails, with initial values \(m\) and \(n\), if reduction can replace an int with its predecessor but can’t change two ints at once, the reduction will take at least \(\frac{m + n}{2}\) test executions: The fastest possible path you can take is to at each step reduce the larger of \(m\) and \(n\) by two, so each step only reduces \(m + n\) by \(2\), and the whole iteration takes \(\frac{m + n}{2}\) steps.
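Here is a small idealised sketch of that counting argument. This greedy loop is not Hypothesis's actual shrinker; it just takes the largest single-value shrink that keeps the test failing, which is the best case for any reducer that changes one value at a time:

def fastest_reduction_steps(m, n):
    """Best case for a reducer that may shrink one integer at a time
    but must keep the test failing, i.e. keep abs(m - n) <= 1."""
    steps = 0
    while (m, n) != (0, 0):
        # The furthest a single change can shrink one value while
        # keeping abs(m - n) <= 1 is down to the other value minus one.
        if m >= n:
            m = max(n - 1, 0)
        else:
            n = max(m - 1, 0)
        steps += 1
    return steps

m = n = 10 ** 6
print(fastest_reduction_steps(m, n), (m + n) // 2)  # both around a million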

(Side note: You can get lucky with Hypothesis and trigger some special cases that make this test faster. If you happen to have \(m = n\) then it can reduce the two together, but if you start with \(m = n \pm 1\) then that case will currently never trigger, because it will never have duplicates at the entry to that step. Hypothesis will also actually find this bug immediately because it will try it with both examples set to zero. Trivial modifications to the test can be made to avoid these problems, but I'm going to ignore them here.)

The interesting thing about this from a Hypothesis point of view is that \(m + n\) is potentially exponential in \(K\), and the data size is linear in \(K\), so Hypothesis's test case reduction is of exponential complexity (which doesn't really cause a performance problem in practice because the number of successful reductions gets capped, but does cause an example quality problem because you then don't run the full reduction). But this isn't specifically a Hypothesis problem – I'm morally certain every current property-based testing library's test case reduction is exponential in this case (except for ones that haven't implemented reduction at all), possibly with one or two patches to avoid trivial special cases like always trying zero first.

Another way to get around this is to almost never trigger this test case with large values! Typically property-based testing libraries will usually only generate an example like this with very small values. But it's easy for that not to be the case – mutational property-based testing libraries like Hypothesis or Crowbar can in theory fairly easily find this example for large \(m\) and \(n\) (Hypothesis currently doesn't. I haven't tried with Crowbar). Another way you could easily trigger it is with distributions that special case large values.

One thing I want to emphasise is that regardless of the specific nature of the example and our workarounds for it, this sort of problem is inherent. It’s easy to make patches that avoid this particular example (especially in Hypothesis which has no problem making simultaneous changes to \(m\) and \(n\)).

But if you fix this one, I can just construct another that is tailored to break whatever heuristic it was that you came up with. Test-case reduction is a local search method in an exponentially large space, and this sort of problem is just what happens when you block off all of the large changes your local search method tries to make but still allow some of the small ones.

You can basically keep dangling the carrot in front of the test-case reducer going “Maybe after this reduction you’ll be done”, and you can keep doing that indefinitely because of the size of the space. Pathological examples like this are not weird special cases, if anything the weird thing is that most examples are not pathological like this.

My suspicion is that we don’t see this problem cropping up much in practice for a couple of reasons:

  1. Existing property-based testing libraries are very biased towards finding small examples in the first place, so even when we hit cases with pathological complexity, \(n\) is so small that it doesn’t actually matter.
  2. This sort of boundary case relies on what are essentially “weird coincidences” in the test case. They happen when small local changes unlock other small local changes that were previously locked. This requires subtle dependency between different parts of the test case, and where that subtle dependency exists we are almost never finding it. Thus I suspect the fact that we are not hitting exponential slow downs in our test case reduction on a regular basis may actually be a sign that there are entire classes of bug that we are just never finding because the probability of hitting the required dependency combination is too low.
  3. It may also be that bugs just typically do not tend to have that kind of sensitive dependency. My suspicion is that this is not true given the prevalence of off-by-one errors.
  4. It may also be that people are hitting this sort of problem in practice and aren’t telling us because they don’t care that much about the performance of test case reduction or example quality.
Categories: FLOSS Project Planets

Amjith Ramanujam: FuzzyFinder - in 10 lines of Python

Fri, 2018-01-19 22:09
Introduction:

FuzzyFinder is a popular feature available in decent editors to open files. The idea is to start typing partial strings from the full path and the list of suggestions will be narrowed down to match the desired file. 

Examples: 

Vim (Ctrl-P)

Sublime Text (Cmd-P)

This is an extremely useful feature and it's quite easy to implement.

Problem Statement:

We have a collection of strings (filenames). We're trying to filter down that collection based on user input, where the user input can be partial strings from the filename. Let's walk through this with an example. Here is a collection of filenames:
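The original post shows the collection as an embedded snippet that is not reproduced here; an illustrative stand-in consistent with the examples discussed below might be:

# Illustrative stand-in for the collection from the original post.
collection = ['django_migrations.py',
              'django_admin_log.py',
              'main_generator.py',
              'migrations.py',
              'api_user.doc',
              'user_group.doc',
              'accounts.txt']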

When the user types 'djm' we are supposed to match 'django_migrations.py' and 'django_admin_log.py'. The simplest route to achieve this is to use regular expressions. 

Solutions:

Naive Regex Matching:

Convert 'djm' into 'd.*j.*m' and try to match this regex against every item in the list. Items that match are the possible candidates.
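A minimal sketch of this naive approach (the function name is mine, not necessarily the one used in the original gist; it uses the illustrative collection defined above):

import re

def fuzzyfinder_naive(user_input, collection):
    suggestions = []
    # 'djm' becomes the regex 'd.*j.*m'
    pattern = '.*'.join(user_input)
    regex = re.compile(pattern)
    for item in collection:
        if regex.search(item):
            suggestions.append(item)
    return suggestions

print(fuzzyfinder_naive('djm', collection))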

This got us the desired results for input 'djm'. But the suggestions are not ranked in any particular order.

In fact, for the second example with user input 'mig' the best possible suggestion 'migrations.py' was listed as the last item in the result.

Ranking based on match position:

We can rank the results based on the position of the first occurrence of the match. For the user input 'mig', 'migrations.py' matches right at the start of the string, while the 'django_*' files only match several characters in.

Here's the code:
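The gist embedded in the original post isn't reproduced here; a sketch along the lines described in the next paragraph would be:

import re

def fuzzyfinder_by_position(user_input, collection):
    suggestions = []
    pattern = '.*'.join(user_input)   # 'mig' -> 'm.*i.*g'
    regex = re.compile(pattern)
    for item in collection:
        match = regex.search(item)
        if match:
            # rank by where the match starts, then alphabetically
            suggestions.append((match.start(), item))
    return [item for _, item in sorted(suggestions)]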

We made the list of suggestions a list of tuples, where the first item is the position of the match and the second item is the matching filename. When this list is sorted, Python sorts the tuples by their first item and uses the second item as a tie breaker. The final list comprehension iterates over the sorted list of tuples and extracts just the second item, which is the filename we're interested in.

This got us close to the end result, but as shown in the example, it's not perfect. We see 'main_generator.py' as the first suggestion, but the user wanted 'migrations.py'.

Ranking based on compact match:

When a user starts typing a partial string they will continue to type consecutive letters in an effort to find the exact match. When someone types 'mig' they are looking for 'migrations.py' or 'django_migrations.py', not 'main_generator.py'. The key here is to find the most compact match for the user input.

Once again this is trivial to do in Python. When we match a string against a regular expression, the matched substring is available via match.group().

For example, if the input is 'mig', the matched group is just 'mig' for the two migration files, but a much longer span for a file like 'main_generator.py'.

We can use the length of the captured group as our primary rank and the starting position as our secondary rank. To do that we add len(match.group()) as the first item in the tuple, match.start() as the second item and the filename itself as the third item. Python will sort this list based on the first item in the tuple (primary rank), then the second item (secondary rank), and finally the third item as the last tie-breaker.

This produces the desired behavior for our input. We're not quite done yet.

Non-Greedy Matching

There is one more subtle corner case that was caught by Daniel Rocco. Consider these two items in the collection: ['api_user', 'user_group']. When you enter the word 'user' the ideal suggestion should be ['user_group', 'api_user'], but the actual result puts 'api_user' first.

Looking at this output, you'll notice that api_user appears before user_group. Digging in a little, it turns out the search term 'user' expands to u.*s.*e.*r; notice that user_group has two rs, so the pattern greedily matches user_gr instead of the expected user. The longer match length forces the ranking of this match down, which again seems counterintuitive. This is easy to change by using the non-greedy version of the regex (.*? instead of .*) when building the pattern.

Now that works for all the cases we've outlined. We've just implemented a fuzzy finder in about 10 lines of code.
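Putting the pieces together, a sketch of the finished finder (close to, but not necessarily identical to, the code shipped in the fuzzyfinder package) looks like this:

import re

def fuzzyfinder(user_input, collection):
    suggestions = []
    pattern = '.*?'.join(user_input)          # 'user' -> 'u.*?s.*?e.*?r'
    regex = re.compile(pattern)
    for item in collection:
        match = regex.search(item)
        if match:
            # rank by match length, then match position, then alphabetically
            suggestions.append((len(match.group()), match.start(), item))
    return [item for _, _, item in sorted(suggestions)]

print(fuzzyfinder('user', ['api_user', 'user_group']))  # ['user_group', 'api_user']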

Conclusion:

That was the design process for implementing fuzzy matching for my side project pgcli, which is a REPL for PostgreSQL that can do auto-completion.

I've extracted fuzzyfinder into a stand-alone Python package. You can install it via 'pip install fuzzyfinder' and use it in your projects.

Thanks to Micah Zoltu and Daniel Rocco for reviewing the algorithm and fixing the corner cases.

If you found this interesting, you should follow me on twitter

Epilogue:

When I first started looking into fuzzy matching in Python, I encountered the excellent library fuzzywuzzy. But the fuzzy matching done by that library is of a different kind: it uses Levenshtein distance to find the closest matching string in a collection, which is a great technique for correcting spelling errors, but it doesn't produce the desired results for matching long names from partial substrings.

Categories: FLOSS Project Planets

Sandipan Dey: Hand-Gesture Classification using Deep Convolution and Residual Neural Network with Tensorflow / Keras in Python

Fri, 2018-01-19 20:22
In this article, an application of a convolutional net to classify a set of hand-sign images is discussed first. Later, the accuracy of this classifier is improved using a deep ResNet. These problems appeared as assignments in the Coursera course Convolutional Neural Networks (a part of the deep learning specialization) by the Stanford Prof. Andrew Ng (deeplearning.ai). The problem descriptions are taken straightaway … Continue reading Hand-Gesture Classification using Deep Convolution and Residual Neural Network with Tensorflow / Keras in Python
Categories: FLOSS Project Planets

Techiediaries - Django: An Introduction to REST with PHP by Building a Simple Real World API

Fri, 2018-01-19 19:00

In this tutorial we are going to learn how to build a simple real world REST API with plain PHP. This API will be the basis of the next tutorials for adding JWT-based authentication and building your front-ends with modern JavaScript/TypeScript frameworks and libraries such as Angular, React.js and Vue.js etc.

Throughout the tutorial we'll create a simple API (at the same time a real-world one: you can use it to build a small stock tracking app) with the most straightforward architecture and file structure possible. We are not going to cover advanced concepts such as MVC, routing or template languages (we'll use PHP itself as the template language, which is bad practice in larger projects but is how most people start out with PHP; if you are looking for those concepts you are better off using a PHP framework, since most of them are built around them), so this tutorial can be as beginner-friendly as possible.

What is an API?

API stands for Application Programming Interface. It's an interface that allows applications to communicate with each other. In the case of the web, it refers to a set of URLs that let you exchange data with a web application via operations commonly known as CRUD (Create, Read, Update and Delete) by sending HTTP requests such as POST, GET, PUT and DELETE.

What is REST?

REST stands for REpresentational State Transfer. It's a set of constraints that define how to exchange resources in a distributed system, such as statelessness, i.e. the server doesn't keep any information about previous requests, which means the current request should include all the information the server needs to fulfil the desired operation. Data is usually exchanged in JSON (JavaScript Object Notation) format.

So REST API refers to the interface that allows mobile devices and web browsers (or also other web servers) to create, read, update and delete resources in the server respecting the REST rules (such as being stateless).

Using REST you can build one back-end and then build different client apps or front-ends for web browsers and mobile devices (iOS and Android etc.) because the back-end is decoupled from the front-end--the communication between the client and the server apps takes place via the REST interface. You can offer your users another app or you can build more apps to support the other mobile platforms without touching the back-end code.
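For example, once the endpoints we build below exist, any client that can speak HTTP can consume them. Here is a rough sketch in Python; the base URL is a placeholder and the paths simply mirror the PHP files created later in this tutorial:

import requests

BASE = 'http://localhost/api'   # placeholder host for the PHP app

# Read all products (maps to products/read.php below).
products = requests.get(f'{BASE}/products/read.php').json()
print(products['count'])

# Create a product (maps to products/create.php below).
new_product = {'name': 'Pencil', 'price': 1.5}
response = requests.post(f'{BASE}/products/create.php', json=new_product)
print(response.json()['message'])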

Database Design

In order to build a web API, you need a way to store data, behind the scenes, in your server's database. For this tutorial we'll use the MySQL RDBMS (Relational Database Management System), which is the most used database system in the PHP world.

The first step is to design our database, so we'll use an Entity-Relationship (ER) diagram.

An entity-relationship diagram, also called an entity-relationship model, is a graphical representation of entities and how they relate to each other. They are used to model relational databases. In ER diagrams you use entities (boxes) to represent real world concepts or objects and relationships (arrows) to represent a relation between two entities.

There are three types of relationships: One-to-One, One-to-Many and Many-to-Many.

Here is a screenshot of an example ER model for our database

We have four entities that are related to each other: a product has a family, belongs to a location and can have many related transactions.

After creating an ER model you can easily write SQL CREATE statements to create the SQL tables in the MySQL database. You can simply map each entity to a SQL table and relationships to foreign keys.

Any decent ER diagramming tool will include an export button that can help you generate the SQL script from your ER model without having to write it manually.

Now let's create SQL for our database

CREATE TABLE `Product` (
  `id` int(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `sku` varchar(255),
  `barcode` varchar(255),
  `name` varchar(100),
  `price` float,
  `unit` varchar(20),
  `quantity` float,
  `minquantity` float,
  `createdAt` datetime NOT NULL,
  `updatedAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `familyid` int(11) NOT NULL,
  `locationid` int(11) NOT NULL
);

CREATE TABLE `Family` (
  `id` int(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `reference` varchar(50),
  `name` varchar(100),
  `createdAt` datetime NOT NULL,
  `updatedAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE `Transaction` (
  `id` int(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `comment` text,
  `price` float,
  `quantity` float,
  `reason` enum('New Stock','Usable Return','Unusable Return'),
  `createdAt` datetime NOT NULL,
  `updatedAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `productid` int(11) NOT NULL
);

CREATE TABLE `Location` (
  `id` int(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `reference` varchar(50),
  `description` text,
  `createdAt` datetime NOT NULL,
  `updatedAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);

You can use phpMyAdmin or the MySQL command-line client to create a new database, then copy and run the previous SQL queries to create the tables.

You can also use an installer.php script which gets called one time to execute a SQL script that creates the database and its tables.

If you want to take this approach, create config/data/database.sql then copy the following code plus the previous SQL CREATE statements to create the tables

CREATE DATABASE mydb;
USE mydb;
/* COPY THE PREVIOUS STATEMENTS HERE */

Now we need to execute this script from PHP. So go ahead and create a file config/install.php then copy the following code:

<?php
include_once './dbclass.php';

try {
    $dbclass = new DBClass();
    $connection = $dbclass->getConnection();
    $sql = file_get_contents("data/database.sql");
    $connection->exec($sql);
    echo "Database and tables created successfully!";
} catch (PDOException $e) {
    echo $e->getMessage();
}

We'll be placing the contents of the config/data/database.sql file into a variable using the file_get_contents() function, and executing it with the exec() function.

You can see the implementation of the DBClass below.

File Structure

Our API project's file structure will be simple. We'll use a config folder for storing the configuration file(s), an entities folder for storing PHP classes that encapsulate the entities used by our API i.e products, locations, families and transactions.

Connecting to A MySQL Database in PHP

Inside the config folder, add a dbclass.php file that contains the following code to connect your API back-end to the underlying MySQL database.

<?php
class DBClass {

    private $host = "localhost";
    private $username = "root";
    private $password = "<YOUR_DB_PASSWORD>";
    private $database = "<YOUR_DB_NAME>";

    public $connection;

    // get the database connection
    public function getConnection(){

        $this->connection = null;

        try{
            $this->connection = new PDO("mysql:host=" . $this->host . ";dbname=" . $this->database, $this->username, $this->password);
            $this->connection->exec("set names utf8");
        }catch(PDOException $exception){
            echo "Error: " . $exception->getMessage();
        }

        return $this->connection;
    }
}
?>

What is PDO?

The PHP Data Objects (PDO) extension defines a lightweight, consistent interface for accessing databases in PHP. Each database driver that implements the PDO interface can expose database-specific features as regular extension functions. Note that you cannot perform any database functions using the PDO extension by itself; you must use a database-specific PDO driver to access a database server. Source.

The PDO object will ask for four parameters:

  • DSN (data source name), which includes the type of database, host name and database name (optional)
  • Username to connect to the host
  • Password to connect to the host
  • Additional options

Next we'll create the PHP classes that encapsulate the entities (or database tables). Each class will contain a hard-coded string storing the name of the corresponding SQL table, a member variable that will be holding an instance of the Connection class which will be passed via the class constructor and other fields mapping to the table columns. Each entity class will also encapsulate the CRUD operations needed for creating, reading, updating and deleting the corresponding table rows.

The Product Class

<?php
class Product{

    // Connection instance
    private $connection;

    // table name
    private $table_name = "Product";

    // table columns
    public $id;
    public $sku;
    public $barcode;
    public $name;
    public $price;
    public $unit;
    public $quantity;
    public $minquantity;
    public $createdAt;
    public $updatedAt;
    public $family_id;
    public $location_id;

    public function __construct($connection){
        $this->connection = $connection;
    }

    //C
    public function create(){ }

    //R
    public function read(){
        $query = "SELECT c.name as family_name, p.id, p.sku, p.barcode, p.name, p.price, p.unit, p.quantity, p.minquantity, p.createdAt, p.updatedAt FROM " . $this->table_name . " p LEFT JOIN Family c ON p.family_id = c.id ORDER BY p.createdAt DESC";
        $stmt = $this->connection->prepare($query);
        $stmt->execute();
        return $stmt;
    }

    //U
    public function update(){}

    //D
    public function delete(){}
}

The Transaction Class

<?php
class Transaction{

    // Connection instance
    private $connection;

    // table name
    private $table_name = "Transaction";

    // table columns
    public $id;
    public $comment;
    public $price;
    public $quantity;
    public $reason;
    public $createdAt;
    public $updatedAt;
    public $product_id;

    public function __construct($connection){
        $this->connection = $connection;
    }

    //C
    public function create(){}

    //R
    public function read(){}

    //U
    public function update(){}

    //D
    public function delete(){}
}

The Family Class

<?php
class Family{

    // Connection instance
    private $connection;

    // table name
    private $table_name = "Family";

    // table columns
    public $id;
    public $reference;
    public $name;
    public $createdAt;
    public $updatedAt;

    public function __construct($connection){
        $this->connection = $connection;
    }

    //C
    public function create(){}

    //R
    public function read(){}

    //U
    public function update(){}

    //D
    public function delete(){}
}

The Location Class

<?php
class Location{

    // Connection instance
    private $connection;

    // table name
    private $table_name = "Location";

    // table columns
    public $id;
    public $reference;
    public $description;
    public $createdAt;
    public $updatedAt;

    public function __construct($connection){
        $this->connection = $connection;
    }

    //C
    public function create(){}

    //R
    public function read(){}

    //U
    public function update(){}

    //D
    public function delete(){}
}

Creating the API Endpoints

We have four entities that we want to CRUD with our API so create four folders products, transactions, families and locations and then in each folder create create.php, read.php, update.php, delete.php.

Implementing products/read.php

Open the products/read.php file then add the following code:

header("Content-Type: application/json; charset=UTF-8"); include_once '../config/dbclass.php'; include_once '../entities/product.php'; $dbclass = new DBClass(); $connection = $dbclass->getConnection(); $product = new Product($connection); $stmt = $product->read(); $count = $stmt->rowCount(); if($count > 0){ $products = array(); $products["body"] = array(); $products["count"] = $count; while ($row = $stmt->fetch(PDO::FETCH_ASSOC)){ extract($row); $p = array( "id" => $id, "sku" => $sku, "barcode" => $barcode, "name" => $name, "price" => $price, "unit" => $unit, "quantity" => $quantity, "minquantity" => $minquantity, "createdAt" => $createdAt, "createdAt" => $createdAt, "updatedAt" => $updatedAt, "family_id" => $family_id, "location_id" => $location_id ); array_push($products["body"], $p); } echo json_encode($products); } else { echo json_encode( array("body" => array(), "count" => 0); ); } ?> Implementing product/create.php <?php header("Content-Type: application/json; charset=UTF-8"); header("Access-Control-Allow-Methods: POST"); header("Access-Control-Max-Age: 3600"); header("Access-Control-Allow-Headers: Content-Type, Access-Control-Allow-Headers, Authorization, X-Requested-With"); include_once '../config/dbclass.php'; include_once '../entities/product.php'; $dbclass = new DBClass(); $connection = $dbclass->getConnection(); $product = new Product($connection); $data = json_decode(file_get_contents("php://input")); $product->name = $data->name; $product->price = $data->price; $product->description = $data->description; $product->category_id = $data->category_id; $product->created = date('Y-m-d H:i:s'); if($product->create()){ echo '{'; echo '"message": "Product was created."'; echo '}'; } else{ echo '{'; echo '"message": "Unable to create product."'; echo '}'; } ?>
Categories: FLOSS Project Planets

Techiediaries - Django: Building an Ionic 3/Angular 4|5 Application with a GraphQL API

Fri, 2018-01-19 19:00

Let's see how to build a CRUD Ionic 3 mobile application (or, if you prefer, an Angular 4+ web application) using a modern GraphQL-based API instead of a REST-based one. Since we need a back-end to serve the API, we'll cover two options: first we'll see how to use GraphQL Launchpad to easily create an Apollo server in 60 lines of code, then we'll see how you can build your own self-hosted back-end using Python and Django.

Introduction to GraphQL

GraphQL is a modern standard for creating web Application Programming Interfaces, commonly known as web APIs. For the last decade REST has been the standard way to build APIs, but thanks to Facebook there is now a more powerful alternative that has many advantages over REST.

GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools. --- graphql.org

So let's break this down:

  • GraphQL is a standard and runtime for building APIs, not a programming language or a developer tool
  • GraphQL gives you more control over your data, i.e. you can specify exactly which data attributes you want returned with the same query
  • You can use one query to fetch related data
  • Unlike REST APIs, you are not dependent on the server's implementation of your endpoints
  • Fewer round trips to the server to get all the data you need

GraphQL was created and used internally by Facebook, then open sourced in 2015.

Many big industry players are using GraphQL to implement their API layer, such as:

  • Facebook: the creator of GraphQL
  • Github
  • Shopify
  • Product Hunt
  • Pinterest
  • Twitter
  • yelp etc.

You can find more companies that are using GraphQL from this link.

Queries and Mutations

For working with GraphQL, you need to be familiar with two important concepts: queries and mutations.

Queries are used to query and retrieve data from the GraphQL server. For example, suppose you want to get the list of products from a GraphQL backend. Here is an example of a query you would send:

query {
  products {
    id,
    reference,
    quantity
  }
}

A query is a JSON-like document which has a root field and a payload (a set of fields). Using a query, you can specify the name and the exact fields of the object to retrieve.

You can also pass parameters to a query, for example to get an object by its id. Here is an example of a query you would send:

query {
  product(id: 1) {
    id,
    reference
  }
}

You can also nest a query inside another query to fetch related objects' data. For example, to get the products of the queried families you would send something like:

query {
  famillies {
    id,
    reference,
    products {
      id,
      reference
    }
  }
}

A mutation is a write that generally returns the newly modified data object (as opposed to a query, which is meant to be read-only)
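GraphQL requests are ultimately just HTTP POSTs of a query string plus optional variables, so any client can send them. Here is a rough Python sketch; the endpoint URL and the createProduct mutation are made-up placeholders, not part of this tutorial's schema:

import requests

ENDPOINT = 'https://example.com/graphql'   # placeholder GraphQL server

# A hypothetical mutation that creates a product and returns its new id.
mutation = """
mutation CreateProduct($reference: String!, $quantity: Int!) {
  createProduct(reference: $reference, quantity: $quantity) {
    id
    reference
  }
}
"""

variables = {'reference': 'PRD-001', 'quantity': 10}
response = requests.post(ENDPOINT, json={'query': mutation, 'variables': variables})
print(response.json())   # expected shape: {'data': {'createProduct': {...}}}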

Using GraphQL: Servers

You can either build a back-end which exposes a GraphQL-based API using your preferred language such as JavaScript/Node.js, PHP, Python or Ruby, or you can use GraphQL-based hosted services or headless content management systems such as:

  • GraphCMS - GraphQL based Headless Content Management System.
  • Graphcool - Your own GraphQL backend in under 5 minutes. Works with every GraphQL client such as Relay and Apollo.
  • Reindex - Instant GraphQL Backend for Your React Apps.
  • Scaphold - GraphQL as a service that includes API integrations such as Stripe and Mailgun.
  • Tipe - Next Generation API-first CMS with a GraphQL or REST API. Stop letting your CMS decide how you build your apps.
Using GraphQL: Clients

Consuming a GraphQL API with Ionic 3/Angular 4

Apollo makes fetching the exact data you need for your component easy and allows you to put your queries exactly where you need them. All we need is to install the apollo-angular, apollo-client, and graphql-tag packages.

Create a New Ionic 3 App

Let's start by creating a new Ionic 3 application using the Ionic CLI.

ionic start ionic-graphql-demo blank
Categories: FLOSS Project Planets

Techiediaries - Django: 3+ Ways to Add Authentication to Ionic 3 (Angular 4|5) Applications

Fri, 2018-01-19 19:00

Let's look at the available options for adding authentication (login and registration) to your mobile application built with Ionic 3 and Angular 4|5: SaaS (Software as a Service) providers such as Firebase, Auth0 and Okta, free third-party Single Sign On services such as Facebook, GitHub and Google, self-hosted servers such as Parse, or building your own auth back-end with PHP, Python, Ruby or Node.js.

More often than not, when building your Ionic 3 mobile application (for Android or iOS), NativeScript mobile app or your Angular 4|5 web application, you will want to authenticate users with a remote HTTP server before authorizing them to access some protected resource(s) or RESTful API endpoint(s). You might say that's authorization, not authentication? You are correct! Authentication, i.e. verifying the identity of a user, is the simplest form of authorization (you can of course build a more advanced authorization system, but that's not required in most cases except for multi-tenant apps where there are many users with different roles for the same account).

I recently intended to build an Ionic app with authentication, so I looked at the available choices for building an authentication system with features such as login, signup, user verification and password recovery via email. I found that there are many viable options, from building your own hosted solution with a back-end technology (if you have the required skills in a server-side language such as PHP or Python with Django or Flask) to hosted solutions (such as Firebase or Auth0) that let you build a back-end for your mobile/web applications with authentication, data storage and many extra features, without prior knowledge of a server-side language, without reinventing the wheel and without hiring a back-end developer.

First of all, this article is not intended to show you how to create an Ionic 3 project since we have previously covered this in many tutorials.

With Ionic 3 and Angular you can literally build a fully fledged and complete mobile application for popular platforms such as Android, iOS and the Universal Windows Platform around these hosted services (we'll see them next) or around your own crafted back-end (which is not that easy if you are not a skilled server-side developer).

In this article, we'll look briefly at different ways to build an authentication system in Ionic 3 and Angular 4 without in-depth details on how to use each option, but I will add links to more detailed tutorials on specific technologies if they are available, or update the article once I have time to write more tutorials. Also, please feel free to ask for a specific tutorial or for more information using the comments area below or via Twitter (@techiediaries).

Adding User Authentication with SaaS/PaaS Services

Wikipedia defines SaaS as:

Software as a service is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted. It is sometimes referred to as "on-demand software", and was formerly referred to as "software plus services" by Microsoft. SaaS is typically accessed by users using a thin client via a web browser. SaaS has become a common delivery model for many business applications, including office software, messaging software, payroll processing software, DBMS software, management software etc.

So simply put, a SaaS is a software delivery model i.e a way to deliver software, to users, without downloading it from the Internet or copying it from a USB/CD medium and installing it in the local machine.

Also from Wikipedia, here is the definition of PaaS

Platform as a Service (PaaS) or application platform as a Service (aPaaS) is a category of cloud computing services that provides a platform allowing customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure typically associated with developing and launching an app. PaaS can be delivered in two ways: as a public cloud service from a provider, where the consumer controls software deployment with minimal configuration options, and the provider provides the networks, servers, storage, operating system (OS), middleware (e.g. Java runtime, .NET runtime, integration, etc.), database and other services to host the consumer's application; or as a private service (software or appliance) inside the firewall, or as software deployed on a public infrastructure as a service.

A PaaS is different than SaaS because it provides developers with a whole platform to develop and run their software.

Angular Authentication with Firebase (PaaS)

Firebase is a PaaS that provides developers with a back-end which has many features for building mobile applications. Firebase provides many essential services:

  • hosting and a real-time database: all your users get updates (like a broadcasting system) when the database is updated (with Create, Read, Update and Delete operations)
  • user authentication and push notifications: you can easily add authentication to your app without building any server side logic
  • client SDKs: SDKs provide quick and easy integration with different languages and platforms
  • analytics
  • Cloud Firestore: a new and better alternative to Firebase's real-time database
Angular Authentication with Auth0 (SaaS)

Auth0 is a SaaS provider that helps you to:

  • Add authentication with multiple authentication sources, either social like Google, Facebook, Microsoft Account, LinkedIn, GitHub, Twitter, Box, Salesforce, among others, or enterprise identity systems like Windows Azure AD, Google Apps, Active Directory, ADFS or any SAML Identity Provider.
  • Add authentication through more traditional username/password databases.
  • Add support for linking different user accounts with the same user.
  • Support for generating signed Json Web Tokens to call your APIs and flow the user identity securely.
  • Analytics of how, when and where users are logging in.
  • Pull data from other sources and add it to the user profile, through JavaScript rules.

  • Go to http://auth0.com and log in with your credentials

  • Once done with the above step, click on the Clients tab in the left navigation

  • Create a new client with your app name and the client type set to Single Page App

  • If you want to show social login in the login widget, enable the corresponding social connection by clicking on Connections -> Social.

  • With that, we're ready to go ahead.

npm install angular2-jwt auth0-lock --save

angular2-jwt is a small and unopinionated library that is useful for automatically attaching a JSON Web Token (JWT) as an Authorization header when making HTTP requests from an Angular 2 app. It also has a number of helper methods that are useful for doing things like decoding JWTs.

This library does not have any functionality for (or opinion about) implementing user authentication and retrieving JWTs to begin with. Those details will vary depending on your setup, but in most cases, you will use a regular HTTP request to authenticate your users and then save their JWTs in local storage or in a cookie if successful.

Angular Authentication with Okta

Okta provides an API service that allows developers to create, edit, and securely store user accounts and user account data, and connect them with one or multiple applications. We make user account management easier, more secure, and scalable so you can get to production sooner.

The Okta Sign-in Widget provides an embeddable JavaScript sign-in implementation that can be easily customized. The Sign-in Widget carries the same feature set in the standard Okta sign-in page of every tenant – with the added flexibility to change the look-and-feel. Included in the widget is support for password reset, forgotten password and strong authentication – all of which are driven by policies configured in Okta. Developers don’t have to write a single line of code to trigger these functions from within the widget. For consumer facing sites, social providers are also supported in the widget.

Similar or Alternative Services

Passport.js

Self Hosted Server: The Open Source Parse Server

Authentication with SSO Services: Facebook, Google or GitHub
Categories: FLOSS Project Planets

Techiediaries - Django: Adding JWT Authentication to Python and Django REST Framework Using Auth0

Fri, 2018-01-19 19:00

In this tutorial we'll learn how to add JWT authentication to an API built with Django REST framework. Basically we'll use the djangorestframework-jwt package for adding JWT authentication as you would normally do except that we'll change JWT_AUTH to use Auth0.

This tutorial assumes you already have a development machine with Python 3 and pip installed and will cover the following points:

  • We'll see how to create a virtual environment, install Django and the other dependencies (Django REST framework and djangorestframework-jwt)
  • We'll see how to create an Auth0 API
  • We'll see how to integrate Auth0 JWT authentication with Django
  • We'll briefly talk about using Auth0 Rules for detecting signup
  • We'll see how to add some Django views for testing JWT
  • We'll see how to use Postman for testing JWT authentication with Auth0
Creating the Django Project

So head over to your terminal then create a new virtual environment and activate it using the venv module in your current working directory:

python3 -m venv ./myenv
source myenv/bin/activate

Next install Django using pip:

pip install django

Now you'll need to create a new Django project using:

django-admin startproject auth0_django_example

Next create a new application in your project

cd auth0_django_example
python manage.py startapp customers

Add customers to the installed apps in your project' settings.py file:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'customers'
]

Next migrate your database then start the development server

python manage.py migrate
python manage.py runserver

You can visit your app at http://localhost:8000

Create an Auth0 API

Head over to your Auth0 dashboard then create an API

Go to the API section then click on the CREATE API button which will show a form where you need to enter your API details

Integrating Auth0 with Django

Now head back to your terminal then install Django REST framework and djangorestframework-jwt package for handling JWT authentication using pip

pip install djangorestframework
pip install djangorestframework-jwt
pip install cryptography
pip install python-jose

Add rest_framework and rest_framework_jwt to the installed apps in settings.py:

INSTALLED_APPS = [
    # ... the apps added previously
    'rest_framework',
    'rest_framework_jwt'
]

Next you'll need to set up djangorestframework-jwt to use Auth0's central server for JWT authentication by following a few steps.

First add JSONWebTokenAuthentication to DEFAULT_AUTHENTICATION_CLASSES:

REST_FRAMEWORK = {
    'DEFAULT_PERMISSION_CLASSES': (
        'rest_framework.permissions.IsAuthenticated',
    ),
    'DEFAULT_AUTHENTICATION_CLASSES': (
        'rest_framework_jwt.authentication.JSONWebTokenAuthentication',
    ),
}

Secondly, import the following libraries in your settings.py file:

import json

from six.moves.urllib import request
from cryptography.x509 import load_pem_x509_certificate
from cryptography.hazmat.backends import default_backend

Finally add this code to settings.py:

AUTH0_DOMAIN = '<YOUR_AUTH0_DOMAIN>'
API_IDENTIFIER = '<YOUR_API_IDENTIFIER>'
PUBLIC_KEY = None
JWT_ISSUER = None

if AUTH0_DOMAIN:
    jsonurl = request.urlopen('https://' + AUTH0_DOMAIN + '/.well-known/jwks.json')
    jwks = json.loads(jsonurl.read().decode('utf-8'))
    cert = '-----BEGIN CERTIFICATE-----\n' + jwks['keys'][0]['x5c'][0] + '\n-----END CERTIFICATE-----'
    certificate = load_pem_x509_certificate(cert.encode('utf-8'), default_backend())
    PUBLIC_KEY = certificate.public_key()
    JWT_ISSUER = 'https://' + AUTH0_DOMAIN + '/'

def jwt_get_username_from_payload_handler(payload):
    return 'someusername'

JWT_AUTH = {
    'JWT_PAYLOAD_GET_USERNAME_HANDLER': jwt_get_username_from_payload_handler,
    'JWT_PUBLIC_KEY': PUBLIC_KEY,
    'JWT_ALGORITHM': 'RS256',
    'JWT_AUDIENCE': API_IDENTIFIER,
    'JWT_ISSUER': JWT_ISSUER,
    'JWT_AUTH_HEADER_PREFIX': 'Bearer',
}

But of course you need to replace AUTH0_DOMAIN with your own Auth0 domain and API_IDENTIFIER with your own API identifier.

Please note that you need to create a user in your Django database with a someusername username for the JWT authentication to work.
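One way to create that placeholder user is from the Django shell (python manage.py shell); this is just a sketch of one option:

from django.contrib.auth.models import User

# The username must match what jwt_get_username_from_payload_handler returns.
User.objects.create_user(username='someusername')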

The custom jwt_get_username_from_payload_handler that we are using is very simple, it maps your Auth0 users to one user in your Django database.

Because Auth0 already takes care of managing users and profiles for you, most of the time you don't have to store users locally (i.e. in your Django database) unless you need user information in your database for some other reason.

In this case you'll need to create a more advanced implementation. You can use this custom method instead:

def jwt_get_username_from_payload_handler(payload):
    return payload.get('sub').replace('|', '.')

But that's not the end of the story: you need to create a Django user when a user successfully signs up via Auth0.

Using Auth0 Rules for Detecting Signup

For this task you need to use Auth0 Rules

Rules are functions written in JavaScript that are executed in Auth0 as part of the transaction every time a user authenticates to your application. They are executed after the authentication and before the authorization.

Rules allow you to easily customize and extend Auth0's capabilities. They can be chained together for modular coding and can be turned on and off individually. Source

You can also see this example of a signup rule

Adding Django Views

Now let's add the code to test the Auth0 JWT authentication:

In customers/views.py add two view functions

from rest_framework.decorators import api_view
from django.http import HttpResponse

def public(request):
    return HttpResponse("You don't need to be authenticated to see this")

@api_view(['GET'])
def private(request):
    return HttpResponse("You should not see this message if not authenticated!")

In urls.py add:

from django.conf.urls import url
from . import views

urlpatterns = [
    url(r'^api/public/', views.public),
    url(r'^api/private/', views.private)
]

Testing JWT Authentication with Postman

Go to your API dashboard then to the Test tab then get a token you can use to test authentication

Next navigate with your web browser to http://localhost:8000/api/private/. You should get Authentication credentials were not provided.

Now let's use Postman for testing our endpoint: Open Postman then enter the URL for the endpoint then select Authorization tab.

For the TYPE select Bearer Token and in the right area enter the access token you get from Auth0 for testing.

Finally press the Send button, you should get: You should not see this message if not authenticated! as in the screenshot
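If you prefer to test from code rather than Postman, a quick sketch with Python's requests library works too; the token value below is a placeholder you copy from the Auth0 Test tab:

import requests

ACCESS_TOKEN = '<PASTE_THE_TEST_TOKEN_FROM_AUTH0_HERE>'

response = requests.get(
    'http://localhost:8000/api/private/',
    headers={'Authorization': 'Bearer ' + ACCESS_TOKEN},
)
print(response.status_code, response.text)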

Conclusion

In this tutorial we have created a simple Django application that uses Django REST framework and Auth0 for adding JWT authentication.

Categories: FLOSS Project Planets

Techiediaries - Django: QuickTip: Django and AngularJS Conflicting Interpolation Symbols

Fri, 2018-01-19 19:00

When using the Django framework with the AngularJS MVC framework to build modern single page applications (SPAs), one of the issues you will encounter is that both frameworks use the same symbols for template tags, i.e. { { and } }. So in this quick tip we'll see how to change the interpolation symbols in AngularJS to avoid these conflicts.

Luckily for us, AngularJS provides the $interpolateProvider provider, which allows developers to customize the interpolation symbols, which default to { { and } }.

Used for configuring the interpolation markup. Defaults to { { and } }. This feature is sometimes used to mix different markup languages, e.g. to wrap an AngularJS template within a Python Jinja template (or any other template language). Mixing templating languages is very dangerous. The embedding template language will not safely escape AngularJS expressions, so any user-controlled values in the template will cause Cross Site Scripting (XSS) security bugs! -- https://docs.angularjs.org/api/ng/provider/$interpolateProvider

Simple AngularJS Example

Let's see a simple example:

Go ahead and create a base template ng-base.html file in your templates folder then add the following content

{ % load staticfiles % }
<!DOCTYPE html>
<html lang="en" ng-app='demoApp'>
<head>
    <base href="/">
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Integrate Angular 1.6.6 with Django</title>
    <script src='https://ajax.googleapis.com/ajax/libs/angularjs/1.6.6/angular.min.js'></script>
    <script src='{ % static "js/app.js" % }'></script>
</head>
<body>
    <div class='content'>
        { % block content % }{ % endblock content % }
    </div>
    <div ng-controller="TestController as ctrl">
        {$ ctrl.mymodel $}
    </div>
</body>
</html>

Next create js/app.js in your project's static folder and add the following code to create a new AngularJS app and inject $interpolateProvider in the config function.

'use strict';

var app = angular.module('demoApp', []);

app.config(function($locationProvider, $interpolateProvider){
    $locationProvider.html5Mode({
        enabled: true
    });
    $interpolateProvider.startSymbol('{$');
    $interpolateProvider.endSymbol('$}');
});

app.controller('TestController', function() {
    this.mymodel = "I'm using the custom symbols";
});

So we have injected the interpolation provider $interpolateProvider and used the two methods $interpolateProvider.startSymbol('{$'); and $interpolateProvider.endSymbol('$}'); to change the default symbols to custom ones.

Now you can use { { and } } for Django templates and {$ and $} for AngularJS templates.

Categories: FLOSS Project Planets