FLOSS Project Planets

Norbert Preining: Making fun of Trump – thanks France

Planet Debian - Sat, 2017-07-22 08:59

I mean, it is easy to make fun of Trump, he is just too stupid and incapable and uneducated. But what the French president Emmanuel Macron did on Bastille Day, in the presence of the usual Trumpies, was just above the usual level of making fun of Trump. The French made Trump watch a French band playing a medley of Daft Punk. And as we know – Trump seemed to be very unimpressed, most probably because he doesn’t have a clue.

I mean, normally you get this pathetic rubbish, look at the average US (or Chinese or North Korean) parades, and here we have the celebration of an event much older than anything the US can put on the table, and they are playing Daft Punk!

France, thanks. You made my day – actually not only one!

Categories: FLOSS Project Planets

Krita 3.2.0: Second Beta Available

Planet KDE - Sat, 2017-07-22 07:46

We’re releasing the second beta for Krita 3.2.0 today! These beta builds contain the following fixes, compared to the first 3.2.0 beta release. Keep in mind that this is a beta: you’re supposed to help the development team out by testing it, and reporting issues on bugs.kde.org.

  • There are still problems on Windows with the integration with the gmic-qt plugin, but several lockups have been fixed.
  • The smart patch tool merge was botched: this is fixed now.
  • It wasn’t possible anymore to move vector objects with the mouse (finger and tablet worked fine). This is fixed now.
  • Fixed the size and flow sliders
  • Fixes to saving jpg or png images without a transparency channel
Download

The KDE download site has been updated to support https now.

Windows

Note for Windows users: if you encounter crashes, please follow these instructions to use the debug symbols so we can figure out where Krita crashes.

Linux

(If, for some reason, Firefox thinks it needs to load this as text: to download, right-click on the link.)

A snap image will also be available from the Ubuntu App Store. When it is updated, you can also use the Krita Lime PPA to install Krita 3.2.0-beta.2 on Ubuntu and derivatives.

OSX

Source code

md5sums

For all downloads:

Key

The Linux appimage and the source tarball are signed. You can retrieve the public key over https here: 0x58b9596c722ea3bd.asc. The signatures are here.
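For reference, verifying a download with GnuPG follows the usual import-then-verify pattern; the filenames in the second command are placeholders for whichever file and detached signature you downloaded, not the actual artifact names:

gpg --import 0x58b9596c722ea3bd.asc
gpg --verify <downloaded-file>.sig <downloaded-file>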

Support Krita

Krita is a free and open source project. Please consider supporting the project with donations or by buying training videos or the artbook! With your support, we can keep the core team working on Krita full-time.

Categories: FLOSS Project Planets

Catalin George Festila: About py-translate python module.

Planet Python - Sat, 2017-07-22 07:46
This python module is used for translating text in the terminal.
You can read and see examples with this API on this web page.
Features

  • Fast! Translate an entire book in less than 5 seconds.
  • Made for Python 3 but still works on Python 2
  • Fast and easy to install, easy to use
  • Supports translation from any language
  • Highly composable interface, the power of Unix pipes and filters.
  • Simple API and documentation

Installation 
C:\>cd Python27

C:\Python27>cd Scripts

C:\Python27\Scripts>pip install py-translate
Collecting py-translate
Downloading py_translate-1.0.3-py2.py3-none-any.whl (61kB)
100% |################################| 61kB 376kB/s
Installing collected packages: py-translate
Successfully installed py-translate-1.0.3

C:\Python27\Scripts>

Let's test it with a simple example:
>>> import translate
>>> dir(translate)
['TestLanguages', 'TestTranslator', '__author__', '__build__', '__builtins__', '__copyright__', '__doc__', '__file__', '__license__', '__name__', '__package__', '__path__', '__title__', '__version__', 'accumulator', 'coroutine', 'coroutines', 'languages', 'print_table', 'push_url', 'set_task', 'source', 'spool', 'tests', 'translation_table', 'translator', 'write_stream']
>>> from translate import translator
>>> translator('ro', 'en', 'Consider ca dezvoltarea personala este un pas important')
[[[u'I think personal development is an important step', u'Consider ca dezvoltarea personala este un pas important', None, None, 0]], None, u'ro']
>>>
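Going by the signature shown above (source language, target language, text), a call in the other direction should look something like the line below; this extra example is mine, not from the original post:

>>> translator('en', 'de', 'Personal development is an important step')  # returns the same nested-list structure, with the German translation first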
Categories: FLOSS Project Planets

Niels Thykier: Improving bulk performance in debhelper

Planet Debian - Sat, 2017-07-22 07:45

Since debhelper/10.3, there have been a number of performance related changes.  The vast majority primarily improve bulk performance or only have visible effects at larger “input” sizes.

Most visible cases are:
  • dh + dh_* now scales a lot better for large number of binary packages.  Even more so with parallel builds.
  • Most dh_* tools are now a lot faster when creating many directories or installing files.
  • dh_prep and dh_clean now bulk their removals.
  • dh_install can now bulk some installations.  For a concrete corner-case, libssl-doc went from approximately 11 seconds to less than a second.  This optimization is implicitly disabled with --exclude (among others).
  • dh_installman now scales a lot better with many manpages.  Even more so with parallel builds.
  • dh_installman has restored its performance under fakeroot (regression since 10.2.2)

 

For debhelper, this mostly involved:
  • avoiding fork+exec of commands for things doable natively in perl.  Especially when each fork+exec only processes one file or dir.
  • bulking as many files/dirs into the call as possible, where fork+exec is still used.
  • caching / memoizing slow calls (e.g. in parts of pkgfile inside Dh_Lib)
  • adding an internal API for dh to do bulk check for pkgfiles. This is useful for dh when checking if it should optimize out a helper.
  • and, of course, doing things in parallel where trivially possible.

 

How to take advantage of these improvements in tools that use Dh_Lib:
  • If you use install_{file,prog,lib,dir}, then it will come out of the box.  These functions are available in Debian/stable.  On a related note, if you use “doit” to call “install” (or “mkdir”), then please consider migrating to these functions instead.
  • If you need to reset owner+mode (chown 0:0 FILE + chmod MODE FILE), consider using reset_perm_and_owner.  This is also available in Debian/stable.
    • CAVEAT: It is not recursive and YMMV if you do not need the chown call (due to fakeroot).
  • If you have a lot of items to be processed by an external tool, consider using xargs().  Since 10.5.1, it is now possible to insert the items anywhere in the command rather than just at the end.
  • If you need to remove files, consider using the new rm_files function.  It removes files and silently ignores if a file does not exist. It is also available since 10.5.1.
  • If you need to create symlinks, please consider using make_symlink (available in Debian/stable) or make_symlink_raw_target (since 10.5.1).  The former creates policy compliant symlinks (e.g. fixup absolute symlinks that should have been relative).  The latter is closer to a “ln -s” call.
  • If you need to rename a file, please consider using rename_path (since 10.5).  It behaves mostly like “mv -f” but requires dest to be a (non-existing) file.
  • Have a look at whether on_pkgs_in_parallel() / on_items_in_parallel() would be suitable for enabling parallelization in your tool.
    • The emphasis for these functions is on making parallelization easy to add with minimal code changes.  It pre-distributes the items, which can lead to unbalanced workloads where some processes are idle while a few keep working.
Credits:

I would like to thank the following for reporting performance issues, regressions or/and providing patches.  The list is in no particular order:

  • Helmut Grohne
  • Kurt Roeckx
  • Gianfranco Costamagna
  • Iain Lane
  • Sven Joachim
  • Adrian Bunk
  • Michael Stapelberg

Should I have missed your contribution, please do not hesitate to let me know.

 


Filed under: Debhelper, Debian
Categories: FLOSS Project Planets

Catalin George Festila: Make one executable from a python script.

Planet Python - Sat, 2017-07-22 07:12
The official website of this tool tells us:
PyInstaller bundles a Python application and all its dependencies into a single package. The user can run the packaged app without installing a Python interpreter or any modules. PyInstaller supports Python 2.7 and Python 3.3+, and correctly bundles the major Python packages such as numpy, PyQt, Django, wxPython, and others.

PyInstaller is tested against Windows, Mac OS X, and Linux. However, it is not a cross-compiler: to make a Windows app you run PyInstaller in Windows; to make a Linux app you run it in Linux, etc. PyInstaller has been used successfully with AIX, Solaris, and FreeBSD, but is not tested against them.

The manual for this tool can be found here.
C:\Python27>cd Scripts

C:\Python27\Scripts>pip install pyinstaller
Collecting pyinstaller
Downloading PyInstaller-3.2.1.tar.bz2 (2.4MB)
100% |################################| 2.4MB 453kB/s
....
Collecting pypiwin32 (from pyinstaller)
Downloading pypiwin32-219-cp27-none-win32.whl (6.7MB)
100% |################################| 6.7MB 175kB/s
...
Successfully installed pyinstaller-3.2.1 pypiwin32-219

This will also install the PyWin32 python module.
Let's make a test python script and then turn it into an executable.
I used this python script to test it:
from tkinter import Tk, Label, Button  # note: on Python 2.7 this module is named Tkinter

class MyFirstGUI:
    def __init__(self, master):
        self.master = master
        master.title("A simple GUI")

        self.label = Label(master, text="This is our first GUI!")
        self.label.pack()

        self.greet_button = Button(master, text="Greet", command=self.greet)
        self.greet_button.pack()

        self.close_button = Button(master, text="Close", command=master.quit)
        self.close_button.pack()

    def greet(self):
        print("Greetings!")

root = Tk()
my_gui = MyFirstGUI(root)
root.mainloop()

The output of the pyinstaller command:
C:\Python27\Scripts>pyinstaller.exe --onefile --windowed ..\tk_app.py
92 INFO: PyInstaller: 3.2.1
92 INFO: Python: 2.7.13
93 INFO: Platform: Windows-10-10.0.14393
93 INFO: wrote C:\Python27\Scripts\tk_app.spec
95 INFO: UPX is not available.
96 INFO: Extending PYTHONPATH with paths
['C:\\Python27', 'C:\\Python27\\Scripts']
96 INFO: checking Analysis
135 INFO: checking PYZ
151 INFO: checking PKG
151 INFO: Building because toc changed
151 INFO: Building PKG (CArchive) out00-PKG.pkg
213 INFO: Redirecting Microsoft.VC90.CRT version (9, 0, 21022, 8) -> (9, 0, 30729, 9247)
2120 INFO: Building PKG (CArchive) out00-PKG.pkg completed successfully.
2251 INFO: Bootloader c:\python27\lib\site-packages\PyInstaller\bootloader\Windows-32bit\runw.exe
2251 INFO: checking EXE
2251 INFO: Rebuilding out00-EXE.toc because tk_app.exe missing
2251 INFO: Building EXE from out00-EXE.toc
2267 INFO: Appending archive to EXE C:\Python27\Scripts\dist\tk_app.exe
2267 INFO: Building EXE from out00-EXE.toc completed successfully.

Then I ran the resulting executable:
C:\Python27\Scripts>C:\Python27\Scripts\dist\tk_app.exe

C:\Python27\Scripts>

...and it works well.

The output file comes with this icon:

You can also make changes, such as using your own icon or setting the file's version information, according to the VS_FIXEDFILEINFO structure.
You need an icon file and/or a version.txt file for the VS_FIXEDFILEINFO structure.
Let's see the version.txt file:
# UTF-8
#
# For more details about fixed file info 'ffi' see:
# http://msdn.microsoft.com/en-us/library/ms646997.aspx
VSVersionInfo(
  ffi=FixedFileInfo(
    # filevers and prodvers should be always a tuple with four items: (1, 2, 3, 4)
    # Set not needed items to zero 0.
    filevers=(2017, 1, 1, 1),
    prodvers=(1, 1, 1, 1),
    # Contains a bitmask that specifies the valid bits 'flags'
    mask=0x3f,
    # Contains a bitmask that specifies the Boolean attributes of the file.
    flags=0x0,
    # The operating system for which this file was designed.
    # 0x4 - NT and there is no need to change it.
    OS=0x4,
    # The general type of file.
    # 0x1 - the file is an application.
    fileType=0x1,
    # The function of the file.
    # 0x0 - the function is not defined for this fileType
    subtype=0x0,
    # Creation date and time stamp.
    date=(0, 0)
    ),
  kids=[
    StringFileInfo(
      [
      StringTable(
        u'040904b0',
        [StringStruct(u'CompanyName', u'python-catalin'),
         StringStruct(u'ProductName', u'test'),
         StringStruct(u'ProductVersion', u'1, 1, 1, 1'),
         StringStruct(u'InternalName', u'tk_app'),
         StringStruct(u'OriginalFilename', u'tk_app.exe'),
         StringStruct(u'FileVersion', u'2017, 1, 1, 1'),
         StringStruct(u'FileDescription', u'test tk'),
         StringStruct(u'LegalCopyright', u'Copyright 2017 free-tutorials.org.'),
         StringStruct(u'LegalTrademarks', u'tk_app is a registered trademark of catafest.'),])
      ]),
    VarFileInfo([VarStruct(u'Translation', [0x409, 1200])])
  ]
)

Now you can use this command for the tk_app.py and version.txt files from the C:\Python27 folder:
pyinstaller.exe --onefile --windowed --version-file=..\version.txt ..\tk_app.py

Let's see this info in the executable file:

If you want to change the icon, then you need to add --icon=tk_app.ico, where tk_app.ico is the new icon of the executable.
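Putting the options together, a complete build command might look like the line below; whether the .ico file needs the ..\ prefix depends on where you saved it relative to the Scripts folder (that path is my assumption, not from the original post):

pyinstaller.exe --onefile --windowed --icon=..\tk_app.ico --version-file=..\version.txt ..\tk_app.py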



Categories: FLOSS Project Planets

Catalin George Festila: Python Qt4 - part 001.

Planet Python - Sat, 2017-07-22 07:11
Today I started with PyQt4 and this python version:
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)] on win32
To install PyQt4 I used this link to get the executable named PyQt4-4.11.4-gpl-Py2.7-Qt4.8.7-x32.exe.
The name of this executable tells us it can be used with python 2.7.x versions and comes with Qt 4.8.7 for our 32 bit python.
I started with a default Example class to make a calculator interface with PyQt4.
This is my example:
#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
from PyQt4 import QtGui

"""
Qt.Gui calculator example
"""

class Example(QtGui.QWidget):

    def __init__(self):
        super(Example, self).__init__()

        self.initUI()

    def initUI(self):
        title = QtGui.QLabel('Title')
        titleEdit = QtGui.QLineEdit()
        grid = QtGui.QGridLayout()
        grid.setSpacing(10)

        grid.addWidget(title, 0, 0)

        grid.addWidget(titleEdit, 0, 1, 1, 4)

        self.setLayout(grid)

        names = ['Cls', 'Bck', 'OFF',
                 '/', '.', '7', '8',
                 '9', '*', 'SQR', '3',
                 '4', '5', '-', '=',
                 '0', '1', '2', '+']

        positions = [(i, j) for i in range(1, 5) for j in range(0, 5)]

        for position, name in zip(positions, names):

            if name == '':
                continue
            button = QtGui.QPushButton(name)
            grid.addWidget(button, *position)

        self.move(300, 250)
        self.setWindowTitle('Calculator')
        self.show()

def main():
    app = QtGui.QApplication(sys.argv)
    ex = Example()
    sys.exit(app.exec_())

if __name__ == '__main__':
    main()

The example is simple.
First you need a QGridLayout - this makes a matrix (the grid).
I used labels, a line edit and buttons, all from QtGui: QLabel, QLineEdit and QPushButton.
The first items placed into this matrix - named grid - are the Title label and the edit area named titleEdit.
These two are added to the grid matrix with addWidget.
The next step is to put all the buttons into one array.
This array is added to the grid matrix with a for loop.
To add the items from the array to the matrix I used the zip function.
The zip function makes an iterator that aggregates elements from each of the iterables.
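As a minimal standalone illustration (my own, not part of the calculator example), this is how zip pairs each grid position with a button name:

positions = [(1, 0), (1, 1), (1, 2)]
names = ['Cls', 'Bck', 'OFF']
for position, name in zip(positions, names):
    # prints: (1, 0) Cls, then (1, 1) Bck, then (1, 2) OFF
    print("{} {}".format(position, name))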
I also set the window title to Calculator with setWindowTitle.
I have not implemented the event handling and the calculation part.
The main function starts the interface by using QApplication.
The goal of this tutorial was the realization of the graphical interface with PyQt4.
This is the result of my example:

Categories: FLOSS Project Planets

Catalin George Festila: The pyquery python module.

Planet Python - Sat, 2017-07-22 07:11
This tutorial is about the pyquery python module, used here with python version 2.7.13.
First I used the pip command to install it.
C:\Python27>cd Scripts

C:\Python27\Scripts>pip install pyquery
Collecting pyquery
Downloading pyquery-1.2.17-py2.py3-none-any.whl
Requirement already satisfied: lxml>=2.1 in c:\python27\lib\site-packages (from pyquery)
Requirement already satisfied: cssselect>0.7.9 in c:\python27\lib\site-packages (from pyquery)
Installing collected packages: pyquery
Successfully installed pyquery-1.2.17

I tried to install it with pip for python 3.4 as well, but I got errors.
The development team tells us about this python module:
pyquery allows you to make jquery queries on xml documents. The API is as much as possible the similar to jquery. pyquery uses lxml for fast xml and html manipulation.
Let's try a simple example with this python module.
The basis of this example is finding links by html tag.
from pyquery import PyQuery

seeds = [
    'https://twitter.com',
    'http://google.com'
]

crawl_frontiers = []

def start_crawler():
    crawl_frontiers = crawler_seeds()

    print(crawl_frontiers)

def crawler_seeds():
    frontiers = []
    for index, seed in enumerate(seeds):
        frontier = {index: read_links(seed)}
        frontiers.append(frontier)

    return frontiers

def read_links(seed):
    crawler = PyQuery(seed)
    return [crawler(tag_a).attr("href") for tag_a in crawler("a")]

start_crawler()

The read_links function takes the links from each seed in the seeds array.
To do that, I need to read the links and put them into another array, crawl_frontiers.
The frontiers array is used just for the crawling process.
This simple example also allows you to better understand how the arrays are used.
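As an extra, self-contained illustration of the same PyQuery calls on an in-memory HTML snippet (this snippet is mine, not from the original post):

from pyquery import PyQuery

html = '<div><a href="https://example.com">example</a><a href="https://python.org">python</a></div>'
doc = PyQuery(html)
# same pattern as read_links above: select every <a> tag and read its href attribute
print([doc(a).attr("href") for a in doc("a")])
# ['https://example.com', 'https://python.org']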
You can read more about this python module here.
Categories: FLOSS Project Planets

Guest Post: Retired From KDE, by Paul Adams

Planet KDE - Sat, 2017-07-22 05:33

Hey Community!

Long time no see, huh? Yes, I neglected my blog and as such didn't post anything since Akademy 2014... Interestingly, that is also the last Akademy where my dear Paul Adams gave a famous talk, the very talk he refers to in his latest piece. Since his blog aggregation to Planet KDE is broken, I thought it would be a good idea to relay it on my own blog to give it more exposure. It is reproduced below; if you want to read it in its original form, click through the title.

Paul, the mic is yours now!

Retired From KDE, by Paul Adams

Many of you reading this are probably already aware that the long-time maintainer of glibc, Roland McGrath, has recently retired from maintaining that project. Inspired by his words, I wanted to say a few things about why I no longer contribute to KDE; a project I “retired” from some time ago now.

Recently two very good friends of mine, both long-term KDE contributors, inquired if I was going to be attending this year’s Akademy (the annual KDE conference). Neither were particularly surprised when I said I wasn’t.

I was surprised that they asked.

Getting Into KDE

My first experiences of KDE were many moons ago; sometime in the very early 00s I guess. I had installed Linux on an old machine and was not particularly enjoying the desktop experience.

There wasn’t a desktop.

I cannot remember which distro this was. It had come off some magazine’s cover CD. There was X. And the tiling window manager which allowed me to fill my screen with x-term. This, for a long time, was just how I got stuff done. Emacs, Mutt, Lynx and some weird terminal-based MP3 player were my jam.

Some time later I was reading another magazine (Linux Format?) and it had a review of a recent beta of KDE 2. The sources were included on the cover CD. KDE looked kinda nice. Less boxy and purple than the only other *nix desktop I had seen, CDE. Until I finished my undergraduate degree KDE was my go-to desktop.

Getting Deeper

For a while I had reverted to using a tiling window manager and a screenful of x-term. This was just a convenient way for me to get through my PhD and my day job.

During my PhD I was studying Free Software community productivity metrics. I was also working on research into software quality funded by the European Commission. KDE eV (the governance body [1] for KDE) was also taking part in that project. At this time KDE was almost ready to release KDE 4. It was an exciting time to get involved.

So I installed whatever the Debian stable KDE desktop (3.1021933933923932) of the time and really enjoyed the experience. Having rediscovered my love for KDE and having met some of the active community, I dived in deeper.

KDE became high on my list of projects to study during my PhD. The community was going through major changes: not only was KDE 4 on its way in, but KDE SVN was on its way out.

Gitting Ready For Change

Around 2006 I discovered Ade de Groot’s tool for visualising contributions to SVN; it was part of the English Breakfast Network [3]. His version of this tool utilised Python’s SVN bindings to read the repo data. Git instinct told me this tool would work faster if it parsed SVN logs rather than read the repo data through a library. I turned out to be right and this was a formative moment in my career.

I created a generic SVN log parser for use by this visualisation tool and used the same parser for other purposes; mostly other visualisations and data plotting. The ultimate aim was to expose to the KDE community what we could learn about social interactions within the community from, arguably, its most important communication tool: the version control system.

KDE SVN was [4] truly enormous. It was pretty much the largest SVN repo in the wild. One very large central repo which represented the entire body of KDE code/artefacts. Around this time the strains of using such a repo with such a huge (and growing) community were prompting discussion about distributed VCS.

These were remarkably mature and structured discussions. Git was, by no means, a foregone conclusion. Other distributed VCS were given headroom and this was the first (and, basically, last) time I played with Mercurial and Bazaar. The discussions were, for the most part, very technical. I raised my voice to talk about the potential social impacts of switching from SVN to distributed VCS. Any distributed VCS.

Joining KDE eV

I spoke at Akademy and other KDE events (including the KDE 4 launch at the Googleplex) about the research I was doing; either my PhD or the EC-funded stuff. I blogged. I dented [5]. My work was positively received and gearheads would actively reach out to me for more-detailed analysis of their corner of KDE.

I was encouraged to join KDE eV and I did. Given that I had made precisely 0 code contributions [6] to KDE this, to me, felt like an achievement.

Since day one of my involvement, KDE eV had somewhat of an identity crisis. It was really not 100% clear what it did… but anyone who had been involved with KDE for more than 6 months was highly encouraged to join. Before long it had become bloated; lots of members contributing almost nothing and the few people wanting to do something not getting enough support to do it.

KDE had switched to Git and the social changes were a-happenin’. The KDE project was starting to lose its social cohesion. Post KDE 4.0-release blues, the switch to Git and a lack of care from KDE eV all contributed here. Other things, too. No one thing started the KDE community’s cohesion degradation. But we felt it. We even went through a rebranding… KDE was not a desktop project, it had become a suite of projects and the desktop was just one of them.

KDE had evolved and I had not.

Cohesion Degradation?

One of the metrics I worked on during my PhD was a simple use of graph theory to measure how well-connected a community is. The contribution I made here was intriguing: as projects get bigger they become less cohesive, but through careful community management, luck and clever structure, KDE avoided this.

The last time I properly attended Akademy (the KDE community conference) was back in 2014. I’d been frustrated for some time with my inability to drive home the message that the switch to Git had to be managed properly. I’d been frustrated that nobody seemed to have noticed that my warnings were coming true.

So I gave a talk that year.

Deep down, I knew this was my last public outing on behalf of KDE. It was. After my talk a lot of people came up to discuss the mic I had just dropped. But as the days and weeks passed after the event, the message disappeared. And so did I.

So Why Are You Telling Us This Now?

This year’s KDE conference starts tomorrow. Two of my all-time best buddy KDE community members reached out to see if I was turning up.

They knew I wasn’t.

While we briefly reminisced by email, one of them pointed out that my talk from 2014 had recently come up in conversation on a KDE mailing list. That, 3 years later, the talk was being used as part of a great discussion about change in the project.

I’m really not sure what my emotion about that was. But, I did not feel compelled to join the discussion. I did not feel a need to remind people about what I was trying to achieve all that time ago. Nope. Instead, I went and pushed some changes to a core plan I had been working on for Habitat, the new home for my free time.

The Thanks

To all my friends in KDE:

Enjoy Akademy. Enjoy the opportunity to do some navel gazing. Enjoy the food, the drinks, the sun. Hack. Break shit and put it back together again. Remind yourselves of why KDE is special. Remind yourselves of why it is important. Very important.

I thank you all for the time we spent together.

We were all part of the solution.

Footnotes
  1. Countdown to flamewar… 3… 2… 1… I know many will object to me calling KDE eV a “governance body” but, no matter how you cut it, that is what it is. At least it should be, imo. 

  2. There were approximately this number of KDE 3 releases. 

  3. Is the EBN still a thing? 

  4. Is? 

  5. Is identi.ca still a thing? 

  6. Thanks to the EBN, I did actually fix a spelling error in a comment in a .h file for Marble [7]

  7. This makes me a true C++ h4xX0r, right? 


Categories: FLOSS Project Planets

Junichi Uekawa: asterisk fails to start on my raspberry pi.

Planet Debian - Sat, 2017-07-22 04:02
asterisk fails to start on my raspberry pi. I don't quite understand what the error message is but systemctl tells me there was a timeout. Don't know which timeout it hits.

Categories: FLOSS Project Planets

Full Stack Python: How to Make Phone Calls in Python

Planet Python - Sat, 2017-07-22 00:00

Good old-fashioned phone calls remain one of the best forms of communication despite the slew of new smartphone apps that have popped up over the past several years. With just a few lines of Python code plus a web application programming interface we can make and receive phone calls from any application.

Our example calls will say a snippet of text and put all incoming callers into a recorded conference call. You can modify the instructions using Twilio's TwiML verbs when you perform different actions in your own application's phone calls.

Our Tools

You should have either Python 2 or 3 installed to build this application. Throughout the post we will also use:

You can snag all the open source code for this tutorial in the python-twilio-example-apps GitHub repository under the no-framework/phone-calls directory. Use and copy the code for your own applications. Everything in that repository and in this blog post are open source under the MIT license.

Install App Dependencies

Our application will use the Twilio Python helper library to create an HTTP POST request to Twilio's API. The Twilio helper library is installable from PyPI into a virtual environment. Open your terminal and use the virtualenv command to create a new virtualenv:

virtualenv phoneapp

Invoke the activate script within the virtualenv bin/ directory to make this virtualenv the active Python executable. Note that you will need to perform this step in every terminal window that you want the virtualenv to be active.

source phoneapp/bin/activate

The command prompt will change after activating the virtualenv to something like (phoneapp) $.

Next use the pip command to install the Twilio Python package into the virtualenv.

pip install twilio==5.7.0

We will have the required dependency ready for our project as soon as the installation script finishes. Now we can write and execute Python code to dial phone numbers.

Our Python Script

Create a new file named phone_calls.py and copy or type in the following lines of code.

from twilio.rest import TwilioRestClient

# Twilio phone number goes here. Grab one at https://twilio.com/try-twilio
# and use the E.164 format, for example: "+12025551234"
TWILIO_PHONE_NUMBER = ""

# list of one or more phone numbers to dial, in "+19732644210" format
DIAL_NUMBERS = ["",]

# URL location of TwiML instructions for how to handle the phone call
TWIML_INSTRUCTIONS_URL = \
  "http://static.fullstackpython.com/phone-calls-python.xml"

# replace the placeholder values with your Account SID and Auth Token
# found on the Twilio Console: https://www.twilio.com/console
client = TwilioRestClient("ACxxxxxxxxxx", "yyyyyyyyyy")


def dial_numbers(numbers_list):
    """Dials one or more phone numbers from a Twilio phone number."""
    for number in numbers_list:
        print("Dialing " + number)
        # set the method to "GET" from default POST because Amazon S3 only
        # serves GET requests on files. Typically POST would be used for apps
        client.calls.create(to=number, from_=TWILIO_PHONE_NUMBER,
                            url=TWIML_INSTRUCTIONS_URL, method="GET")


if __name__ == "__main__":
    dial_numbers(DIAL_NUMBERS)

There are a few lines that you need to modify in this application before it will run. First, insert one or more phone numbers you wish to dial into the DIAL_NUMBERS list. Each one should be a string, separated by a comma. For example, DIAL_NUMBERS = ["+12025551234", "+14155559876", "+19735551234"].

Next, TWILIO_PHONE_NUMBER and the Account SID and Authentication Token, found on the client = TwilioRestClient("ACxxxxxxxxxx", "yyyyyyyyyy") line, need to be set. We can get these values from the Twilio Console.

In your web browser go to the Twilio website and sign up for a free account or sign into your existing Twilio account.

Copy the Account SID and Auth Token from the Twilio Console and paste them into your application's code:

The Twilio trial account allows you to dial and receive phone calls to your own validated phone number. To handle calls from any phone number, you need to upgrade your account (hit the upgrade button on the top navigation bar).

Once you are signed into your Twilio account, go to the manage phone numbers screen. On this screen you can buy one or more phone numbers or click on an existing phone number in your account to configure it.

After clicking on a number you will reach the phone number configuration screen. Paste in the URL with TwiML instructions and change the dropdown from "HTTP POST" to "HTTP GET". In this post we'll use http://static.fullstackpython.com/phone-calls-python.xml, but that URL can be more than just a static XML file.

The power of Twilio really comes in when that URL is handled by your web application so it can dynamically respond with TwiML instructions based on the incoming caller number or other properties stored in your database.
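As a rough sketch of that idea (my own, not from the original post), a tiny Flask app could serve the same kind of TwiML dynamically; the route name, port, and message text below are illustrative assumptions:

from flask import Flask, Response

app = Flask(__name__)

TWIML = """<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="alice">Thank you for calling!</Say>
  <Dial>
    <Conference record="record-from-start">python-phone-calls</Conference>
  </Dial>
</Response>"""

@app.route("/twiml", methods=["GET", "POST"])
def twiml():
    # Twilio requests this URL when a call comes in or is created through the REST API
    return Response(TWIML, mimetype="text/xml")

if __name__ == "__main__":
    app.run(port=5000)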

Under the Voice webhook, paste in http://static.fullstackpython.com/phone-calls-python.xml and change the drop-down to the right from "HTTP POST" to "HTTP GET". Click the "Save" button at the bottom of the screen.

Now try calling your phone number. You should hear the snippet of text read by the Alice voice and then you will be placed into a conference call. If no one else calls the number then hold music should be playing.

Making Phone Calls

We just handled inbound phone calls to our phone number. Now it's time to dial outbound phone calls. Make sure your phone_calls.py file is saved and that your virtualenv is still activated and then execute the script:

python phone_calls.py

In a moment all the phone numbers you write in the DIAL_NUMBERS list should light up with calls. Anyone that answers will hear our message read by the "Alice" voice and then they'll be placed together into a recorded conference call, just like when someone dials into the number.

Here is my inbound phone call:

Not bad for just a few lines of Python code!

Next Steps

Now that we know how to make and receive phone calls from a Twilio number that follows programmatic instructions we can do a whole lot more in our applications. Next you can use one of these tutorials to do more with your phone number:

Questions? Contact me via Twitter @fullstackpython or @mattmakai. I'm also on GitHub as mattmakai.

See something wrong in this post? Fork this page's source on GitHub and submit a pull request.

Categories: FLOSS Project Planets

Justin Mason: Links for 2017-07-21

Planet Apache - Fri, 2017-07-21 19:58
  • awslabs/aws-ec2rescue-linux

    Amazon Web Services Elastic Compute Cloud (EC2) Rescue for Linux is a python-based tool that allows for the automatic diagnosis of common problems found on EC2 Linux instances. Most of the modules appear to be log-greppers looking for common kernel issues.

    (tags: ec2 aws kernel linux ec2rl ops)

Categories: FLOSS Project Planets

A small Update

Planet KDE - Fri, 2017-07-21 17:00

I planned on writing about the Present extension this week, but I'll postpone this since I'm currently deeply absorbed in finding the last rough edges of a first patch I can show off. I then hope to get some feedback on it from other developers on the xorg-devel mailing list.

Another reason is that I have stalled my work on the Present extension for now and am trying to get my Xwayland code working first. My mentor Daniel recommended that to me, since the approach I pursued in my work on Present might be more difficult than I first assessed. According to Daniel, it is at least similar to something that other, way more experienced developers tried in the past and weren't able to get done. My idea was to make Present flip per CRTC only, but this would clash with Pixmaps being linked to the whole screen only. There are no Pixmaps only for CRTCs in X.

On the other hand, when accepting the restriction of only being able to flip one window at a time, my code already works quite well. The flipping is smooth and, at least in a short test, it also improved the frame rate. But the main problem I had, and still to some degree have, is that stopping the flipping can fail. The reason seems to be that the Present extension always sets the Screen Pixmap on flips. But when I test my work with KWin, it drives Xwayland in rootless mode, i.e. without a Screen Pixmap and only the Window Pixmaps. I'm currently looking into how to circumvent this in Xwayland. I think it's possible, but I need to look very carefully at how to change the process in order not to forget necessary cleanups on the flipped Pixmaps. I hope, though, that I'm able to solve these issues this weekend and then get some feedback on the xorg-devel mailing list.

As always you can find my latest work on my working branch on GitHub.

Categories: FLOSS Project Planets

Drupal Commerce: See what’s new in Drupal Commerce 2.0-rc1

Planet Drupal - Fri, 2017-07-21 16:34

Eight months ago we launched the first beta version of Commerce 2.x for Drupal 8. Since then we’ve made 304 code commits by 58 contributors, and we've seen dozens of attractive, high-performing sites go live. We entered the release candidate phase this month with the packaging of Commerce 2.0-rc1 (release notes), the final part of our long and fruitful journey to a full 2.0.

Introducing a new Promotions UI:

Some of the most exciting updates this Summer center around our promotions system. This work represents a huge leap forward from Commerce 1.x, as we've made promotions first class citizens in core. They power a variety of discount types and coupons, and now that they are in core we can ensure the systems are designed to look and work well on both the front end and back end.

Read on to learn more about what's new in promotions, payment, taxes, and more...

Categories: FLOSS Project Planets

Sandipan Dey: SIR Epidemic model for influenza A (H1N1): Modeling the outbreak of the pandemic in Kolkata, West Bengal, India in 2010 (Simulation in Python & R)

Planet Python - Fri, 2017-07-21 13:47
This appeared as a project in the edX course DelftX: MathMod1x Mathematical Modelling Basics and the project report can be found here. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Summary: In this report, the spread of the pandemic influenza A (H1N1), which had an outbreak in Kolkata, West Bengal, India in 2010, is going to be simulated. … Continue reading SIR Epidemic model for influenza A (H1N1): Modeling the outbreak of the pandemic in Kolkata, West Bengal, India in 2010 (Simulation in Python & R)
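The report itself carries the fitted model; as a generic, minimal sketch of the underlying SIR equations (dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I) with made-up parameters rather than the values estimated for Kolkata, a simulation in Python boils down to:

import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma, N):
    # classic SIR right-hand side
    S, I, R = y
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dR = gamma * I
    return [dS, dI, dR]

N = 1000000.0                  # population size (illustrative)
beta, gamma = 0.5, 0.2         # assumed infection and recovery rates
y0 = [N - 1.0, 1.0, 0.0]       # start with a single infective
t = np.linspace(0, 160, 161)   # days

S, I, R = odeint(sir, y0, t, args=(beta, gamma, N)).T
print("peak infectives: %.0f on day %d" % (I.max(), t[I.argmax()]))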
Categories: FLOSS Project Planets

Glassdimly tech Blog: Parse HTML in Drupal 8 to Get Attributes

Planet Drupal - Fri, 2017-07-21 12:50

Often one finds oneself needing to parse HTML. Back in the day, we used regexes, and smoked inside. We didn't even know about caveman coders back then. Later, we'd use SimpleHtmlDom and mostly just swore when things didn't quite work as expected. Now, we use PHP's DomDocument, and in Drupal we create them using Drupal's HTML utility.

Categories: FLOSS Project Planets

Continuum Analytics News: Galvanize Capstone Series: Geolocation of Twitter Users

Planet Python - Fri, 2017-07-21 12:45
Developer Blog | Monday, July 24, 2017 | Shawn Terryah, Guest Blogger

This post is part of our Galvanize Capstone featured projects. This post was written by Shawn Terryah and posted here with his permission. 

In June of this year, I completed the Data Science Immersive program at Galvanize in Austin, TX. The final few weeks of the program were dedicated to individual capstone projects of our choosing. I have a background in infectious disease epidemiology and, when I was in graduate school, there was a lot of interest in using things like Google search queries, Facebook posts, and tweets to try to track the spread of infectious diseases in real-time. One of the limitations to using Twitter is that only about 1% of tweets are geotagged with the tweet's location, which can make much of this work very difficult. For my capstone project, I chose to train a model using the 1% of tweets that are geotagged to predict the US city-level location of Twitter users who do not geotag their tweets. This is how I did it:

Streaming Training Tweets Using Tweepy

Tweepy is a Python wrapper for the Twitter API that allowed me to easily collect tweets in real-time and store them in MongoDB. The script below was run on an Amazon Web Services EC2 instance with 200 GiB of storage for roughly two weeks using tmux. By filtering based on location, I only received geotagged tweets with a known location to use for training the model.

import tweepy
import json
from pymongo import MongoClient


class StreamListener(tweepy.StreamListener):
    """tweepy.StreamListener is a class provided by tweepy used to access
    the Twitter Streaming API to collect tweets in real-time.
    """

    def on_connect(self):
        """Called when the connection is made"""
        print("You're connected to the streaming server.")

    def on_error(self, status_code):
        """This is called when an error occurs"""
        print('Error: ' + repr(status_code))
        return False

    def on_data(self, data):
        """This will be called each time we receive stream data"""
        client = MongoClient()

        # I stored the tweet data in a database called 'training_tweets' in MongoDB, if
        # 'training_tweets' does not exist it will be created for you.
        db = client.training_tweets

        # Decode JSON
        datajson = json.loads(data)

        # I'm only storing tweets in English. I stored the data for these tweets in a collection
        # called 'training_tweets_collection' of the 'training_tweets' database. If
        # 'training_tweets_collection' does not exist it will be created for you.
        if "lang" in datajson and datajson["lang"] == "en":
            db.training_tweets_collection.insert_one(datajson)


if __name__ == "__main__":
    # These are provided to you through the Twitter API after you create a account
    consumer_key = "your_consumer_key"
    consumer_secret = "your_consumer_secret"
    access_token = "your_access_token"
    access_token_secret = "your_access_token_secret"

    auth1 = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth1.set_access_token(access_token, access_token_secret)

    # LOCATIONS are the longitude, latitude coordinate corners for a box that restricts the
    # geographic area from which you will stream tweets. The first two define the southwest
    # corner of the box and the second two define the northeast corner of the box.
    LOCATIONS = [-124.7771694, 24.520833, -66.947028, 49.384472,  # Contiguous US
                 -164.639405, 58.806859, -144.152365, 71.76871,   # Alaska
                 -160.161542, 18.776344, -154.641396, 22.878623]  # Hawaii

    stream_listener = StreamListener(api=tweepy.API(wait_on_rate_limit=True))
    stream = tweepy.Stream(auth=auth1, listener=stream_listener)
    stream.filter(locations=LOCATIONS)

Feature Selection, Feature Engineering, and Data Cleaning

Feature Selection

At the end of two weeks, I had collected data from over 21 million tweets from over 15,500 cities. In addition to the tweet itself, the API provides a number of other fields. These are the fields I used to build the model:

  • 'text' (String): The actual UTF-8 text of the tweet
  • 'country_code' (String): Country code representing the country that tweet was sent from
  • 'full_name' (String): Full representation of the place the tweet was sent from. For the US, often in the form of 'City, State,' but not always.
  • 'coordinates' (Array of Array of Array of Float): A series of longitude and latitude points that define a bounding box from where the tweet was sent
  • 'screen_name' (String): The screen name chosen by the user
  • 'favourites_count' (Int): The number of tweets this user has liked in the account’s lifetime
  • 'followers_count' (Int): The number of followers the user currently has
  • 'statuses_count' (Int): The number of tweets (including retweets) issued by the user
  • 'friends_count' (Int): The number of users the user is following (AKA their “followings”)
  • 'listed_count' (Int): The number of public lists the user is a member of
  • 'location' (String): The user-defined location for the account’s profile, which is not necessarily a geographic location (e.g., 'the library,' 'watching a movie,' 'in my own head,' 'The University of Texas') (Nullable)
  • 'created_at' (String): The UTC datetime of when the tweet was issued
  • 'utc_offset' (Int): The offset from GMT/UTC in seconds based on the Time Zone that the user selects for their profile (Nullable)
To pull these fields I first exported the data from MongoDB as a json file:

$ mongoexport --db training_tweets --collection training_tweets_collection --out training_tweets.json

I then converted training_tweets.json to a csv file and pulled only the fields from the table above:

import json
import unicodecsv as csv  # unicodecsv ensures that emojis are preserved


def tweets_json_to_csv(file_list, csv_output_file):
    '''
    INPUT: list of JSON files
    OUTPUT: single CSV file

    This function takes a list of JSON files containing tweet data and reads
    each file line by line, parsing the relevant fields, and writing it to a CSV file.
    '''
    count = 0
    f = csv.writer(open(csv_output_file, "wb+"))

    # Column names
    f.writerow(['tweet',                    # relabeled: the API calls this 'text'
                'country_code',
                'geo_location',             # relabeled: the API calls this 'full_name'
                'bounding_box',
                'screen_name',
                'favourites_count',
                'followers_count',
                'statuses_count',
                'friends_count',
                'listed_count',
                'user_described_location',  # relabeled: the API calls this 'location'
                'created_at',
                'utc_offset'])

    for file_ in file_list:
        with open(file_, "r") as r:
            for line in r:
                try:
                    tweet = json.loads(line)
                except:
                    continue
                if tweet and tweet['place'] != None:
                    f.writerow([tweet['text'],
                                tweet['place']['country_code'],
                                tweet['place']['full_name'],
                                tweet['place']['bounding_box']['coordinates'],
                                tweet['user']['screen_name'],
                                tweet['user']['favourites_count'],
                                tweet['user']['followers_count'],
                                tweet['user']['statuses_count'],
                                tweet['user']['friends_count'],
                                tweet['user']['listed_count'],
                                tweet['user']['location'],
                                tweet['created_at'],
                                tweet['user']['utc_offset']])
                    count += 1

                    # Status update
                    if count % 100000 == 0:
                        print 'Just stored tweet #{}'.format(count)


if __name__ == "__main__":
    tweets_json_to_csv(['training_tweets.json'], 'training_tweets.csv')

From this point forward, I was able to read and manipulate the csv file as a pandas DataFrame:

import pandas as pd

df = pd.read_csv('training_tweets.csv', encoding='utf-8')  # 'utf-8' ensures that emojis are preserved

Feature Engineering

'centroid'

Instead of providing the exact latitude and longitude of the tweet, the Twitter API provides a polygonal bounding box of coordinates that encloses the place where the tweet was sent. To plot the tweets on a map and perform other functions, I found the centroid of each bounding box:

def find_centroid(row):
    '''
    Helper function to return the centroid of a polygonal bounding box of longitude, latitude coordinates
    '''
    try:
        row_ = eval(row)
        lst_of_coords = [item for sublist in row_ for item in sublist]
        longitude = [p[0] for p in lst_of_coords]
        latitude = [p[1] for p in lst_of_coords]
        return (sum(latitude) / float(len(latitude)), sum(longitude) / float(len(longitude)))
    except:
        return None


# Create a new column called 'centroid'
df['centroid'] = map(lambda row: find_centroid(row), df['bounding_box'])

Using the centroids, I was able to plot the training tweets on a map using the Matplotlib Basemap Toolkit. Below is the code for generating a plot of the tweets that originated in or around the contiguous US. The same was also done for Alaska and Hawaii.

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt


def plot_contiguous_US_tweets(lon, lat, file_path):
    '''
    INPUT: List of longitudes (lon), list of latitudes (lat), file path to save the plot (file_path)
    OUTPUT: Plot of tweets in the contiguous US.
    '''
    map = Basemap(projection='merc',
                  resolution='h',
                  area_thresh=10000,
                  llcrnrlon=-140.25,  # lower left corner longitude of contiguous US
                  llcrnrlat=5.0,      # lower left corner latitude of contiguous US
                  urcrnrlon=-56.25,   # upper right corner longitude of contiguous US
                  urcrnrlat=54.75)    # upper right corner latitude of contiguous US

    x, y = map(lon, lat)
    map.plot(x, y, 'bo', markersize=2, alpha=.3)
    map.drawcoastlines()
    map.drawstates()
    map.drawcountries()
    map.fillcontinents(color='#DAF7A6', lake_color='#a7cdf2')
    map.drawmapboundary(fill_color='#a7cdf2')
    plt.gcf().set_size_inches(15, 15)
    plt.savefig(file_path, format='png', dpi=1000)

The resulting plots for the contiguous US, Alaska, and Hawaii were joined in Photoshop and are shown on the left. The plot on the right is from the Vulcan Project at Purdue University and shows carbon footprints in the contiguous US. As you can see, the plots are very similar, providing an indication that streaming tweets in this way provides a representative sample of the US population in terms of geographic location.

training_tweets.png

'tweet_time_secs'

The field 'created_at' is the UTC datetime of when the tweet was issued. Here is an example:

u'created_at': u'Sun Apr 30 01:23:27 +0000 2017'

I was interested in the UTC time, rather than the date, that a tweet was sent, because there are likely geographic differences in these values. I therefore parsed this information from the time stamp and reported this value in seconds.

from dateutil import parser


def get_seconds(row):
    '''
    Helper function to parse time from a datetime stamp and return the time in seconds
    '''
    time_str = parser.parse(row).strftime('%H:%M:%S')
    h, m, s = time_str.split(':')
    return int(h) * 3600 + int(m) * 60 + int(s)


# Create a new column called 'tweet_time_secs'
df['tweet_time_secs'] = map(lambda row: get_seconds(row), df['created_at'])

Data Cleaning

Missing Data

Both 'user_described_location' (note: the API calls this field 'location') and 'utc_offset' are nullable fields that frequently contain missing values. When this was the case, I filled them in with indicator values:

df['user_described_location'].fillna('xxxMISSINGxxx', inplace=True)
df['utc_offset'].fillna(999999, inplace=True)

Additionally, a small percentage of tweets contained missing values for 'country_code.' When this or other information was missing, I chose to drop the entire row:

df.dropna(axis=0, inplace=True)

Tweets Outside the US

The bounding box I used to stream the tweets included areas outside the contiguous US. Since the goal for this project was to predict the US city-level location of Twitter users, I relabeled tweets that originated from outside the US. For these tweets 'country_code' was relabeled to 'NOT_US' and 'geo_location' (note: the API calls this field 'full_name') was relabeled to 'NOT_IN_US, NONE':

def relabel_geo_locations(row):
    '''
    Helper function to relabel the geo_locations from tweets outside the US
    to 'NOT_IN_US, NONE'
    '''
    if row['country_code'] == 'US':
        return row['geo_location']
    else:
        return 'NOT_IN_US, NONE'


# Relabel 'country_code' for tweets outside the US to 'NOT_US'
df['country_code'] = map(lambda cc: cc if cc == 'US' else 'NOT_US', df['country_code'])

# Relabel 'geo_location' for tweets outside the US to 'NOT_IN_US, NONE'
df['geo_location'] = df.apply(lambda row: relabel_geo_locations(row), axis=1)

Tweets Lacking a 'City, State' Location Label

Most tweets that originated in the US had a 'geo_location' in the form of 'City, State' (e.g., 'Austin, TX'). For some tweets, however, the label was less specific and in the form of 'State, Country' (e.g., 'Texas, USA') or, even worse, in the form of a totally unique value (e.g., 'Tropical Tan'). Since this data was going to be used to train the model, I wanted to have as granular of a label as possible for each tweet. Therefore, I only kept tweets that were in the form of 'City, State' and dropped all others:

def geo_state(row):
    '''
    Helper function to parse the state code for 'geo_location' labels
    in the form of 'City, State'
    '''
    try:
        return row['geo_location'].split(', ')[1]
    except:
        return None


# Create a new column called 'geo_state'
df['geo_state'] = df.apply(lambda row: geo_state(row), axis=1)

# The 'geo_state' column will contain null values for any row where 'geo_location' was not
# comma separated (e.g., 'Tropical Tan'). We drop those rows here:
df.dropna(axis=0, inplace=True)

# list of valid geo_state labels. "NONE" is the label I created for tweets outside the US
states = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", "ID",
          "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO",
          "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA",
          "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY", "NONE"]

# Keep only rows with a valid geo_state, among others this will drop all rows that had
# a 'geo_location' in the form of 'State, Country' (e.g., 'Texas, USA')
df = df[df['geo_state'].isin(states)]

Aggregating the Tweets by User

During the two week collection period, many users tweeted more than once. To prevent potential leakage, I grouped the tweets by user ('screen_name'), then aggregated the remaining fields.

import numpy as np
from collections import Counter

# aggregation functions
agg_funcs = {'tweet': lambda x: ' '.join(x),
             'geo_location': lambda x: Counter(x).most_common(1)[0][0],
             'geo_state': lambda x: Counter(x).most_common(1)[0][0],
             'user_described_location': lambda x: Counter(x).most_common(1)[0][0],
             'utc_offset': lambda x: Counter(x).most_common(1)[0][0],
             'geo_country_code': lambda x: Counter(x).most_common(1)[0][0],
             'tweet_time_secs': np.median,
             'statuses_count': np.max,
             'friends_count': np.mean,
             'favourites_count': np.mean,
             'listed_count': np.mean,
             'followers_count': np.mean}

# Groupby 'screen_name' and then apply the aggregation functions in agg_funcs
df = df.groupby(['screen_name']).agg(agg_funcs).reset_index()

Remapping the Training Tweets to the Closest Major City

Since the training tweets came from over 15,500 cities, and I didn't want to do a 15,500-wise classification problem, I used the centroids to remap all the training tweets to their closest major city from a list of 378 major US cities based on population (plus the single label for tweets outside the US, which used Toronto's coordinates). This left me with a 379-wise classification problem. Here is a plot of those major cities and the code to remap all the US training tweets to their closest major US city:

379_cities.png

import numpy as np
import pickle


def load_US_coord_dict():
    '''
    Input: n/a
    Output: A dictionary whose keys are the location names ('City, State') of the
    378 US classification locations and the values are the centroids for those locations
    (latitude, longitude)
    '''
    pkl_file = open("US_coord_dict.pkl", 'rb')
    US_coord_dict = pickle.load(pkl_file)
    pkl_file.close()
    return US_coord_dict


def find_dist_between(tup1, tup2):
    '''
    INPUT: Two tuples of latitude, longitude coordinates pairs for two cities
    OUTPUT: The distance between the cities
    '''
    return np.sqrt((tup1[0] - tup2[0])**2 + (tup1[1] - tup2[1])**2)


def closest_major_city(tup):
    '''
    INPUT: A tuple of the centroid coordinates for the tweet to remap to the closest major city
    OUTPUT: String, 'City, State', of the city in the dictionary 'coord_dict' that is closest to the input city
    '''
    d = {}
    for key, value in US_coord_dict.iteritems():
        dist = find_dist_between(tup, value)
        if key not in d:
            d[key] = dist
    return min(d, key=d.get)


def get_closest_major_city_for_US(row):
    ''' Helper function to return the closest major city for US users only. For users
    outside the US it returns 'NOT_IN_US, NONE'
    '''
    if row['geo_location'] == 'NOT_IN_US, NONE':
        return 'NOT_IN_US, NONE'
    else:
        return closest_major_city(row['centroid'])


if __name__ == "__main__":
    # Load US_coord_dict
    US_coord_dict = load_US_coord_dict()

    # Create a new column called 'closest_major_city'
    df['closest_major_city'] = df.apply(lambda row: get_closest_major_city_for_US(row), axis=1)

Building the Predictive Model

The steps below were run on an Amazon Web Services r3.8xlarge EC2 instance with 244 GiB of memory. Here is a high-level overview of the final model:

High-level Overview of the Stacked Model

model_overview (1).png

Step 1: Load dependencies and prepare the cleaned data for model fitting import pandas as pd import numpy as np import nltk from nltk.tokenize import TweetTokenizer from sklearn.feature_extraction.text import TfidfVectorizer from sklearn.svm import LinearSVC from sklearn.ensemble import RandomForestClassifier from sklearn.externals import joblib # Tokenizer to use for text vectorization def tokenize(tweet): tknzr = TweetTokenizer(strip_handles=True, reduce_len=True, preserve_case=False) return tknzr.tokenize(tweet) # Read cleaned training tweets file into pandas and randomize it df = pd.read_pickle('cleaned_training_tweets.pkl') randomized_df = df.sample(frac=1, random_state=111) # Split randomized_df into two disjoint sets half_randomized_df = randomized_df.shape[0] / 2 base_df = randomized_df.iloc[:half_randomized_df, :] # used to train the base classifiers meta_df = randomized_df.iloc[half_randomized_df:, :] # used to train the meta classifier # Create variables for the known the geotagged locations from each set base_y = base_df['closest_major_city'].values meta_y = meta_df['closest_major_city'].values Step 2: Train a base-level Linear SVC classifier on the user described locations # Raw text of user described locations base_location_doc = base_df['user_described_location'].values meta_location_doc = meta_df['user_described_location'].values # fit_transform a tf-idf vectorizer using base_location_doc and use it to transform meta_location_doc location_vectorizer = TfidfVectorizer(stop_words='english', tokenizer=tokenize, ngram_range=(1,2)) base_location_X = location_vect.fit_transform(base_location_doc.ravel()) meta_location_X = location_vect.transform(meta_location_doc) # Fit a Linear SVC Model with 'base_location_X' and 'base_y'. Note: it is important to use # balanced class weights otherwise the model will overwhelmingly favor the majority class. location_SVC = LinearSVC(class_weight='balanced') location_SVC.fit(base_location_X, base_y) # We can now pass meta_location_X into the fitted model and save the decision # function, which will be used in Step 4 when we train the meta random forest location_SVC_decsfunc = location_SVC.decision_function(meta_location_X) # Pickle the location vectorizer and the linear SVC model for future use joblib.dump(location_vectorizer, 'USER_LOCATION_VECTORIZER.pkl') joblib.dump(location_SVC, 'USER_LOCATION_SVC.pkl') Step 3: Train a base-level Linear SVC classifier on the tweets # Raw text of tweets base_tweet_doc = base_df['tweet'].values meta_tweet_doc = meta_df['tweet'].values # fit_transform a tf-idf vectorizer using base_tweet_doc and use it to transform meta_tweet_doc tweet_vectorizer = TfidfVectorizer(stop_words='english', tokenizer=tokenize) base_tweet_X = tweet_vectorizer.fit_transform(base_tweet_doc.ravel()) meta_tweet_X = tweet_vectorizer.transform(meta_tweet_doc) # Fit a Linear SVC Model with 'base_tweet_X' and 'base_tweet_y'. Note: it is important to use # balanced class weights otherwise the model will overwhelmingly favor the majority class. 
tweet_SVC = LinearSVC(class_weight='balanced')
tweet_SVC.fit(base_tweet_X, base_y)

# We can now pass meta_tweet_X into the fitted model and save the decision
# function, which will be used in Step 4 when we train the meta random forest
tweet_SVC_decsfunc = tweet_SVC.decision_function(meta_tweet_X)

# Pickle the tweet vectorizer and the linear SVC model for future use
joblib.dump(tweet_vectorizer, 'TWEET_VECTORIZER.pkl')
joblib.dump(tweet_SVC, 'TWEET_SVC.pkl')

Step 4: Train a meta-level Random Forest classifier

# additional features from meta_df to pull into the final model
friends_count = meta_df['friends_count'].values.reshape(meta_df.shape[0], 1)
utc_offset = meta_df['utc_offset'].values.reshape(meta_df.shape[0], 1)
tweet_time_secs = meta_df['tweet_time_secs'].values.reshape(meta_df.shape[0], 1)
statuses_count = meta_df['statuses_count'].values.reshape(meta_df.shape[0], 1)
favourites_count = meta_df['favourites_count'].values.reshape(meta_df.shape[0], 1)
followers_count = meta_df['followers_count'].values.reshape(meta_df.shape[0], 1)
listed_count = meta_df['listed_count'].values.reshape(meta_df.shape[0], 1)

# np.hstack these additional features together
add_features = np.hstack((friends_count,
                          utc_offset,
                          tweet_time_secs,
                          statuses_count,
                          favourites_count,
                          followers_count,
                          listed_count))

# np.hstack the two decision function variables from steps 2 & 3 with add_features
meta_X = np.hstack((location_SVC_decsfunc,  # from Step 2 above
                    tweet_SVC_decsfunc,     # from Step 3 above
                    add_features))

# Fit Random Forest with 'meta_X' and 'meta_y'
meta_RF = RandomForestClassifier(n_estimators=60, n_jobs=-1)
meta_RF.fit(meta_X, meta_y)

# Pickle the meta Random Forest for future use
joblib.dump(meta_RF, 'META_RF.pkl')

Testing the Model

Collecting and Preparing a Fresh Data Set

A week after I collected the training data set, I collected a fresh data set to use to evaluate the model. For this, I followed the same data collection and preparation procedures as above with a few exceptions: 1) I only ran the Tweepy script for 48 hours, 2) I removed any users from the evaluation data set that were in the training data set, and 3) I went back to the Twitter API and pulled the 200 most recent tweets for each user that remained in the data set. Remember, the goal for the model is to predict the US city-level location of Twitter users, not individual tweets; therefore, by giving the model a larger corpus of tweets for each user, I hoped to increase the model's accuracy. Here is the script for pulling the 200 most recent tweets for each user:

import tweepy
import pandas as pd

# these are provided to you through the Twitter API after you create an account
consumer_key = "your_consumer_key"
consumer_secret = "your_consumer_secret"
access_token = "your_access_token"
access_token_secret = "your_access_token_secret"

count = 0

def get_200_tweets(screen_name):
    '''
    Helper function to return a list of a Twitter user's 200 most recent tweets
    '''
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

    # Initialize a list to hold the user's 200 most recent tweets
    tweets_data = []

    global count

    try:
        # make request for most recent tweets (200 is the maximum allowed per distinct request)
        recent_tweets = api.user_timeline(screen_name=screen_name, count=200)

        # save data from most recent tweets
        tweets_data.extend(recent_tweets)

        count += 1

        # Status update
        if count % 1000 == 0:
            print 'Just stored tweets for user #{}'.format(count)

    except:
        count += 1
        pass

    # pull only the tweets and encode them in utf-8 to preserve emojis
    list_of_recent_tweets = [''.join(tweet.text.encode("utf-8")) for tweet in tweets_data]

    return list_of_recent_tweets

# Create a new column in evaluation_df called '200_tweets'
evaluation_df['200_tweets'] = map(lambda x: get_200_tweets(x), evaluation_df['screen_name'])

Making Predictions on the Fresh Data Set

To make predictions on evaluation_df, the script below was run on the same Amazon Web Services r3.8xlarge EC2 instance that was used to build the model:

import pandas as pd
import numpy as np
from sklearn.externals import joblib
import nltk
from nltk.tokenize import TweetTokenizer

def tokenize(tweet):
    tknzr = TweetTokenizer(strip_handles=True, reduce_len=True, preserve_case=False)
    return tknzr.tokenize(tweet)

class UserLocationClassifier:

    def __init__(self):
        '''
        Load the stacked classifier's pickled vectorizers, base classifiers, and meta classifier
        '''
        self.location_vectorizer = joblib.load('USER_LOCATION_VECTORIZER.pkl')
        self.location_SVC = joblib.load('USER_LOCATION_SVC.pkl')
        self.tweet_vectorizer = joblib.load('TWEET_VECTORIZER.pkl')
        self.tweet_SVC = joblib.load('TWEET_SVC.pkl')
        self.meta_RF = joblib.load('META_RF.pkl')

    def predict(self, df):
        '''
        INPUT: Cleaned and properly formatted dataframe to make predictions for
        OUTPUT: Array of predictions
        '''
        # Get text from 'user_described_location' column of DataFrame
        location_doc = df['user_described_location'].values

        # Convert the '200_tweets' column from a list to just a string of all tweets
        df.loc[:, '200_tweets'] = map(lambda x: ''.join(x), df['200_tweets'])

        # Get text from '200_tweets' column of DataFrame
        tweet_doc = df['200_tweets'].values

        # Vectorize 'location_doc' and 'tweet_doc'
        location_X = self.location_vectorizer.transform(location_doc.ravel())
        tweet_X = self.tweet_vectorizer.transform(tweet_doc.ravel())

        # Store decision functions for 'location_X' and 'tweet_X'
        location_decision_function = self.location_SVC.decision_function(location_X)
        tweet_decision_function = self.tweet_SVC.decision_function(tweet_X)

        # Get additional features to pull into the Random Forest
        friends_count = df['friends_count'].values.reshape(df.shape[0], 1)
        utc_offset = df['utc_offset'].values.reshape(df.shape[0], 1)
        tweet_time_secs = df['tweet_time_secs'].values.reshape(df.shape[0], 1)
        statuses_count = df['statuses_count'].values.reshape(df.shape[0], 1)
        favourites_count = df['favourites_count'].values.reshape(df.shape[0], 1)
        followers_count = df['followers_count'].values.reshape(df.shape[0], 1)
        listed_count = df['listed_count'].values.reshape(df.shape[0], 1)

        # np.hstack additional features together
        add_features = np.hstack((friends_count,
                                  utc_offset,
                                  tweet_time_secs,
                                  statuses_count,
                                  favourites_count,
                                  followers_count,
                                  listed_count))

        # np.hstack the two decision function variables with add_features
        meta_X = np.hstack((location_decision_function,
                            tweet_decision_function,
                            add_features))

        # Feed meta_X into Random Forest and make predictions
        return self.meta_RF.predict(meta_X)

if __name__ == "__main__":
    # Load evaluation_df into pandas DataFrame
    evaluation_df = pd.read_pickle('evaluation_df.pkl')

    # Load UserLocationClassifier
    clf = UserLocationClassifier()

    # Get predicted locations
    predictions = clf.predict(evaluation_df)

    # Create a new column called 'predicted_location'
    evaluation_df.loc[:, 'predicted_location'] = predictions

    # Pickle the resulting DataFrame with the location predictions
    evaluation_df.to_pickle('evaluation_df_with_predictions.pkl')

Plotting the Locations of Twitter Users on a Map Using Bokeh

Here are some examples of how the model performed on a few selected cities. For each of the maps shown below, the dots indicate the users' true locations, while the title of the map indicates where the model predicted them to be. As you can see, for each city there is a tight cluster in and around the correct location, with only a handful of extreme misses. Here is the code for generating these plots (note: the final plots shown here were constructed in Photoshop after first using the 'pan' and 'wheel_zoom' tools in Bokeh to capture screenshots of the contiguous US, Alaska, and Hawaii):

import pandas as pd
import pickle
from bokeh.plotting import figure, output_notebook, output_file, show
from bokeh.tile_providers import STAMEN_TERRAIN
output_notebook()
from functools import partial
from shapely.geometry import Point
from shapely.ops import transform
import pyproj

# Web mercator bounding box for the US
US = ((-13884029, -7453304), (2698291, 6455972))

x_range, y_range = US

plot_width = int(900)
plot_height = int(plot_width*7.0/12)

def base_plot(tools='pan,wheel_zoom,reset', plot_width=plot_width, plot_height=plot_height, **plot_args):
    p = figure(tools=tools, plot_width=plot_width, plot_height=plot_height,
               x_range=x_range, y_range=y_range, outline_line_color=None,
               min_border=0, min_border_left=0, min_border_right=0,
               min_border_top=0, min_border_bottom=0, **plot_args)
    p.axis.visible = False
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None
    return p

def plot_predictions_for_a_city(df, name_of_predictions_col, city):
    '''
    INPUT: DataFrame with location predictions; name of column in DataFrame that
    contains the predictions; city ('City, State') to plot predictions for
    OUTPUT: Bokeh map that shows the actual location of all the users predicted
    to be in the selected city
    '''
    df_ = df[df[name_of_predictions_col] == city]

    # Initialize two lists to hold all the latitudes and longitudes
    all_lats = []
    all_longs = []

    # Pull all latitudes in 'centroid' column and append to all_lats
    for i in df_['centroid']:
        all_lats.append(i[0])

    # Pull all longitudes in 'centroid' column and append to all_longs
    for i in df_['centroid']:
        all_longs.append(i[1])

    # Initialize two lists to hold all the latitudes and longitudes
    # converted to web mercator
    all_x = []
    all_y = []

    # Convert latitudes and longitudes to web mercator x and y format
    for i in xrange(len(all_lats)):
        pnt = transform(
            partial(
                pyproj.transform,
                pyproj.Proj(init='EPSG:4326'),
                pyproj.Proj(init='EPSG:3857')),
            Point(all_longs[i], all_lats[i]))
        all_x.append(pnt.x)
        all_y.append(pnt.y)

    p = base_plot()
    p.add_tile(STAMEN_TERRAIN)
    p.circle(x=all_x, y=all_y, line_color=None, fill_color='#380474', size=15, alpha=.5)
    output_file("stamen_toner_plot.html")
    show(p)

if __name__ == "__main__":
    # Load pickled evaluation_df with location predictions
    evaluation_df_with_predictions = pd.read_pickle('evaluation_df_with_predictions.pkl')

    # Plot actual locations for users predicted to be in Eugene, OR
    plot_predictions_for_a_city(evaluation_df_with_predictions, 'predicted_location', 'Eugene, OR')

Example 1: Eugene, OR

Eugene_OR.png

Example 2: Durham, NC

Durham_NC.png

Example 3: Shreveport, LA

Shreveport_LA.png

Tweet Term Importances for these Cities

To get an idea of what tweet terms were important for predicting these cities, I went through and calculated mean tf-idf values for each of them (a sketch of this computation follows the plot below). Here are some of the more interesting terms for each city. To generate these plots, I followed an excellent guide written by Thomas Buhrmann.

city_tweet_term_importances.png
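
For readers who want to reproduce this step, here is a minimal sketch of how per-city mean tf-idf values could be computed. It assumes the fitted tweet_vectorizer, the meta_tweet_X matrix, and the meta_y labels from the training script above; the helper name and the top_n cutoff are my own additions.

import numpy as np

def mean_tfidf_for_city(tfidf_matrix, labels, vectorizer, city, top_n=15):
    '''
    INPUT: sparse tf-idf matrix, array of city labels (one label per row),
    the fitted vectorizer, and the city to inspect
    OUTPUT: list of (term, mean tf-idf weight) tuples for the top_n terms
    '''
    # Select only the rows belonging to the requested city
    city_rows = tfidf_matrix[np.where(labels == city)[0]]

    # Average each term's tf-idf weight over those rows
    mean_weights = np.asarray(city_rows.mean(axis=0)).ravel()

    # Map column indices back to vocabulary terms
    terms = np.array(vectorizer.get_feature_names())

    top_idx = mean_weights.argsort()[::-1][:top_n]
    return zip(terms[top_idx], mean_weights[top_idx])

# Example: the highest-weighted tweet terms for 'Eugene, OR'
for term, weight in mean_tfidf_for_city(meta_tweet_X, meta_y, tweet_vectorizer, 'Eugene, OR'):
    print term, weight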

Emoji Skin Tone Modifications

Screen Shot 2017-07-21 at 12.43.13 PM.png

One of the more interesting things to fall out of the model was the colored boxes shown above. These represent the skin tone modifications you can add to certain emojis. For most emojis there was not a strong geographic signal; however, for the skin tone modifications there was. As you can see in the term importances plots, Twitter users in Eugene, OR, tended to use lighter colored skin tone modifications while users in Durham, NC, and Shreveport, LA, tended to use darker skin tone modifications.

Scoring the Model

Median Error: 49.6 miles

To score the model I chose to use median error, which came out to be 49.6 miles. This was calculated by using the centroids to find the great-circle distance between the predicted city and the true location. Here is how it was calculated (note: if the user was predicted to be in the correct city, the error was scored as 0.0 miles, regardless of the actual distance between the centroids):

from math import sin, cos, sqrt, atan2, radians
import pickle

def load_coord_dict():
    '''
    Input: n/a
    Output: A dictionary whose keys are the location names ('City, State') of the
    379 classification labels and the values are the centroids for those locations
    (latitude, longitude)
    '''
    pkl_file = open("coord_dict.pkl", 'rb')
    coord_dict = pickle.load(pkl_file)
    pkl_file.close()
    return coord_dict

def compute_error_in_miles(zipped_predictions):
    '''
    INPUT: Tuple in the form of (predicted city, centroid of true location)
    OUTPUT: Float of the great-circle error distance between the predicted city
    and the true location.
    '''
    radius = 3959  # approximate radius of earth in miles
    predicted_city = zipped_predictions[0]
    actual_centroid = zipped_predictions[1]

    lat1 = radians(coord_dict[predicted_city][0])
    lon1 = radians(coord_dict[predicted_city][1])
    lat2 = radians(actual_centroid[0])
    lon2 = radians(actual_centroid[1])

    delta_lon = lon2 - lon1
    delta_lat = lat2 - lat1

    a = sin(delta_lat / 2)**2 + cos(lat1) * cos(lat2) * sin(delta_lon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    error_distance = radius * c
    return error_distance

def correct_outside_the_us_errors(row):
    '''
    Helper function to correct the errors to 0.0 for the users that were correctly
    predicted. This is especially important for users outside the US since they were
    all given the same label ('NOT_IN_US, NONE') even though their centroids were
    all different.
    '''
    if row['predicted_location'] == row['geo_location']:
        error = 0.0
    else:
        error = row['error_in_miles']
    return error

if __name__ == "__main__":
    # Load coord_dict
    coord_dict = load_coord_dict()

    centroid_of_true_location = evaluation_df['centroid'].values
    zipped_predictions = zip(predictions, centroid_of_true_location)

    # Create a new column with the error value for each prediction
    evaluation_df['error_in_miles'] = map(lambda x: compute_error_in_miles(x), zipped_predictions)

    # Change the error of correct predictions to 0.0 miles
    evaluation_df['error_in_miles'] = evaluation_df.apply(lambda x: correct_outside_the_us_errors(x), axis=1)

    median_error = evaluation_df['error_in_miles'].median()

Histogram of Error Distances

histogram.png

Influence of Tweet Number on the Model's Accuracy

Recall that for each user I wanted to make a prediction on, I went back to the API and pulled 200 of their most recent tweets. The plot below was generated using the same procedure as above with increasing numbers of tweets for each user (a sketch of this experiment follows the plot). I originally chose 200 because this is the maximum number the API allows you to pull per distinct request. However, as you can see in the plot below, after about 100 tweets there is negligible improvement in the model's accuracy, meaning that for future use it might not be necessary to pull so many tweets for each user.

error_by_tweet_number.png
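
Here is a minimal sketch of how that experiment could be run. It assumes the evaluation_df (with '200_tweets' still stored as lists of tweets), the UserLocationClassifier, and the coord_dict / compute_error_in_miles helpers defined earlier; the tweet-count grid is illustrative, and the correction that zeroes out exactly matched cities is omitted for brevity.

import pandas as pd

tweet_counts = [10, 25, 50, 100, 150, 200]
median_errors = []

for n in tweet_counts:
    # Work on a copy, keeping only the first n tweets for each user
    df_n = evaluation_df.copy()
    df_n['200_tweets'] = df_n['200_tweets'].map(lambda tweets: tweets[:n])

    # Re-run the stacked classifier on the truncated tweet histories
    predictions_n = clf.predict(df_n)

    # Score with the same great-circle error used above
    zipped_predictions = zip(predictions_n, df_n['centroid'].values)
    errors = [compute_error_in_miles(pair) for pair in zipped_predictions]
    median_errors.append(pd.Series(errors).median())

for n, err in zip(tweet_counts, median_errors):
    print '{} tweets -> median error of {:.1f} miles'.format(n, err)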

Final Notes

While a median error of 49.6 miles is pretty good, there is still plenty of room for improvement. Running the Tweepy streaming script for a longer period of time and having a larger collection of training data would likely give an immediate improvement. Additionally, with more training data, you could also include more than 379 classification labels, which would also help to decrease the median error of the model. That said, given the time constraints of the project, I'm satisfied with the current model's accuracy and think it could be a valuable resource to many projects where having an extremely granular estimate of a Twitter user's location is not required.

Tags: Bokeh, Open Source, Visualization
Categories: FLOSS Project Planets

Hook 42: Hook 42 goes to Washington!

Planet Drupal - Fri, 2017-07-21 12:38

Hook 42 is expanding our enterprise Drupal services to the public sector. It’s only logical that our next trek is to Drupal GovCon!

We are bringing some of our colorful San Francisco Bay Area love to DC. We will be sharing our knowledge about planning and managing migrations, as well as core site building layout technologies. The most exciting part of the conference will be meeting up with our east coast Drupal community and government friends in person.

Categories: FLOSS Project Planets

Jaysinh Shukla: PyDelhi Conf 2017: A beautiful conference happened in New Delhi, India

Planet Python - Fri, 2017-07-21 08:44

TL;DR

PyDelhi conf 2017 was a two-day conference which featured workshops, dev sprints, and both full-length and lightning talks. The workshop sessions did not require any extra charge. Delhiites should not miss the chance to attend this conference in the future. I conducted a workshop titled "Tango with Django" to help beginners understand the Django web framework.

Detailed Review

About the PyDelhi community

PyDelhi conf 2017 volunteers

The PyDelhi community was known as the NCR Python Users Group until a few years ago. The community plays the role of an umbrella organization for other FLOSS communities across New Delhi, India. They actively arrange monthly meetups on interesting topics. The last PyCon India, the national-level conference for the Python programming language, was impressively organized by this community, and this year, too, they have taken on the responsibility of managing it. I am very thankful to this community for their immense contribution to society. If you are around New Delhi, India, you should not miss the chance to attend their meetups. This community has great people who are always happy to mentor.

PyDelhi conf 2017

Conference T-shirt

PyDelhi conf is a regional-level conference for the Python programming language organized by the PyDelhi community. This is their second year organizing the conference. Last year it was held at JNU University; this year it took place at the IIM Lucknow campus in Noida, New Delhi, India. I enjoyed various talks, which I mention later, the workshop section (because I was conducting one), and some panel discussions, because the people involved had a good level of experience. About 80% of the schedule was divided equally between 30-minute talks and 2-hour workshop sessions, 10% was given to panel discussions, and 10% was reserved for lightning talks. The dev sprints happened in parallel with the conference. The early slots on both days were given to workshops. One large conference hall was located on the 2nd floor of the building and two halls on the ground floor. Food and beverages were served on the ground floor.

Panel Discussion

Registration desk

Tea break

Keynote speakers

  • Mr. Ricardo Rocha: Mr. Rocha is a software engineer at CERN. I got some time to talk with him after the conference. We discussed his responsibilities at CERN, and I was impressed when he explained how he and his team manage the infrastructure. When I inquired about opportunities available at CERN, he mentioned that the organization is always looking for talented developers. New grads can keep an eye on the various Summer Internship Programs, which are very similar to the Google Summer of Code program.

  • Mr. Chris Stucchio: Mr. Stucchio is the director of Data Science at Wingify/VWO. I found him physically fit compared to other software developers (mostly from India). I didn't get much time to have a word with him.
Interesting Talks

Because I took the wrong metro train, I was late for the inaugural ceremony. I also missed a keynote given by Mr. Rocha. The talks below were impressively presented at the conference.

I love talking with people rather than sitting in on sessions. For that very reason, I always miss some important talks presented at the conference. I do not forget to watch them once they are publicly available. This year I missed the following talks.

Volunteer Party

I got a warm invitation from the organizers to join the volunteer party, but I was a little tense about my session happening the next day. So, I decided to go home and improve the slides. I heard from friends that the party was awesome!

My workshop session

Me conducting workshop

I conducted a workshop on the Django web framework. "Tango with Django" was chosen as the title in the hope of attracting beginners. I believe this title is already the name of a famous book serving the same purpose.

Dev sprints

Me hacking at dev sprints section

The dev sprints were happening in parallel with the conference. Mr. Pillai was representing Junction. I decided to test a few issues of CPython but didn't do much. There were a bunch of people hacking, but I didn't find anything interesting. The quality of the chairs was so impressive that I have decided to buy the same ones for my home office.

Why attend this conference?
  • Free Workshops: The conference has a great slate of talks and workshops. Workshops are conducted by field experts without any extra fees. This is one of the great advantages you can leverage from this conference.

  • Student discounts: If you are a student then you will receive a discount on the conference ticket.

  • Beginner-friendly platform: If you are a novice speaker, you will get mentorship from this community. You can conduct a session for beginners.

  • Networking: You will find senior employees of tech giants, owners of innovative start-ups, and professors from well-known universities participating in this conference. It can be a good opportunity for you to network with them.

What was missing?
  • Lecture hall arrangement: It was difficult to travel frequently between the second floor and the ground floor. I found that most people were spending their time on the ground floor rather than attending the talks going on upstairs.

  • No corporate stalls: Despite having corporate sponsors like Microsoft, I didn't find a stall from any company.

  • The venue for dev sprints: The rooms were designed for teleconferences, with circularly arranged wooden tables. This did not create a collaborative environment. The projects involved were not promoted frequently during the conference.

Thank you PyDelhi community!

I would like to thank all the known and unknown volunteers who did their best in arranging this conference. I encourage the PyDelhi community to keep organizing such an affable conference.

Proofreaders: Mr. Daniel Foerster, Mr. Dhavan Vaidya, Mr. Sayan Chowdhury, Mr. Trent Buck
Categories: FLOSS Project Planets

Anubavam Blog: Dependency injection and Service Containers

Planet Drupal - Fri, 2017-07-21 07:43

"A dependency is an object that can be used (a service). An injection is the passing of a dependency to a dependent object (a client) that would use it. The service is made part of the client's state. Passing the service to the client, rather than allowing a client to build or find the service, is the fundamental requirement of the pattern." Dependency injection is an advanced software design pattern, and applying it will increase flexibility. Once you wrap your head around this pattern, you will be unstoppable.

A practical example of accessing services in objects using dependency injection

For the following example, let's assume we are creating a method that needs service A, which depends on B, which in turn depends on C. Rather than building or finding these services ourselves, we let the container resolve the whole chain and inject whichever services we require (a minimal sketch follows the list below):

  • Application needs A so:
  • Application gets A from the Container, so:
  • Container creates C
  • Container creates B and gives it C
  • Container creates A and gives it B
  • Application calls A
  • A calls B
  • B does something

Types of Dependency Injection

There are different types of Dependency Injection:

  • Constructor injection
  • Method injection
  • Setter and property injection
  • PHP callable injection

Constructor Injection

The DI container supports constructor injection with the help of type hints (type hinting lets us specify the expected data type) for constructor parameters. The type hints tell the container which classes or interfaces are dependencies when it is used to create a new object. The container will try to get instances of the dependent classes or interfaces and then inject them into the new object through the constructor.
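
A minimal sketch of constructor injection, again in Python for brevity; a PHP/Drupal service would declare the dependency as a type-hinted constructor argument, but the shape is the same. The MailService and UserRegistration names are illustrative.

class MailService(object):
    def send(self, to, body):
        print 'sending "{}" to {}'.format(body, to)

class UserRegistration(object):
    def __init__(self, mailer):
        # The dependency is supplied once, through the constructor,
        # and is used for the object's entire lifetime.
        self.mailer = mailer

    def register(self, email):
        # ... create the user record ...
        self.mailer.send(email, 'Welcome!')

# The caller (or the container) injects the concrete mailer.
registration = UserRegistration(MailService())
registration.register('someone@example.com')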

Method Injection 

With constructor injection we saw that the dependent class will use the same concrete class for its entire lifetime. If we instead need to pass a different concrete class on each invocation of a method, we pass the dependency to that method only.
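
A minimal sketch of method injection, with illustrative names: the dependency arrives as an argument on each call instead of being stored on the object, so each invocation can use a different concrete class.

import json

class CsvFormatter(object):
    def render(self, data):
        return ','.join(data)

class JsonFormatter(object):
    def render(self, data):
        return json.dumps(data)

class ReportGenerator(object):
    def generate(self, data, formatter):
        # The formatter is injected per call, not kept as state.
        return formatter.render(data)

generator = ReportGenerator()
print generator.generate(['a', 'b'], CsvFormatter())   # one invocation, one formatter
print generator.generate(['a', 'b'], JsonFormatter())  # another invocation, another formatter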

Setter & Property Injection

So far we have discussed two scenarios: with constructor injection we knew that the dependent class would use one concrete class for its entire lifetime, and with method injection we could pass the concrete class object into the action method itself. But what if the responsibility for selecting the concrete class and the invocation of the method sit in separate places? In such cases we need setter or property injection.
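
A minimal sketch of setter injection, with illustrative names: the concrete class is selected in one place (the wiring code) while the method that uses it is invoked somewhere else.

class Logger(object):
    def log(self, message):
        print message

class PaymentProcessor(object):
    def __init__(self):
        self.logger = None  # the dependency is optional and chosen elsewhere

    def set_logger(self, logger):
        # Setter injection: selection of the concrete class happens here...
        self.logger = logger

    def charge(self, amount):
        # ...while the invocation happens here, possibly far away in the code.
        if self.logger is not None:
            self.logger.log('charging {}'.format(amount))

processor = PaymentProcessor()
processor.set_logger(Logger())  # wiring code picks the concrete logger
processor.charge(42)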

PHP Callable Injection

The container will use a registered PHP callable to build new instances of a class. Each time yii\di\Container::get() is called, the corresponding callable will be invoked. The callable is responsible for resolving the dependencies and injecting them appropriately into the newly created objects.
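
The same idea shown with a toy container that stores callables; this is a generic illustration in Python, not the actual yii\di\Container API, and all names are made up.

class DatabaseConnection(object):
    def __init__(self, dsn):
        self.dsn = dsn

class UserRepository(object):
    def __init__(self, connection):
        self.connection = connection

class CallableContainer(object):
    '''Toy container: each service is registered as a callable (a factory).'''
    def __init__(self):
        self.factories = {}

    def register(self, name, factory):
        self.factories[name] = factory

    def get(self, name):
        # Every get() call invokes the registered callable, which is
        # responsible for resolving and injecting dependencies itself.
        return self.factories[name](self)

container = CallableContainer()
container.register('connection', lambda c: DatabaseConnection('mysql://localhost/app'))
container.register('repository', lambda c: UserRepository(c.get('connection')))

repo = container.get('repository')  # the callable builds the repository with its connection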

Dependency Injection: Advantages & Disadvantages

Advantages

Reduces the dependencies of objects on one another in the application.
Makes unit testing easier.
Encourages loose coupling.
Promotes re-usability of code or objects in different applications.
Promotes logical abstraction of components.

Disadvantages

DI increases complexity, usually by increasing the number of classes since responsibilities are separated more, which is not always beneficial.
Code will be coupled to the dependency injection framework.
It takes time to learn.
If misunderstood, it can lead to more harm than good.

Summary

Dependency injection is a very simple concept: it decouples your code and makes it easier to read. By injecting dependencies into objects we can isolate their purpose and easily swap them with others.

The service container is basically there to manage classes. It keeps track of what a certain service needs before it gets instantiated, does the instantiation for you, and all you have to do is access the container to request that service. Using it the right way will save time and frustration, while Drupal developers will make it even easier for the layman.

 

Categories: FLOSS Project Planets

Michal Čihař: Making Weblate more secure and robust

Planet Debian - Fri, 2017-07-21 06:00

Having a publicly running web application always brings challenges in terms of security and, generally, in handling untrusted data. Security-wise Weblate has always been quite good (mostly thanks to using Django, which comes with built-in protection against many vulnerabilities), but there were always things to improve in input validation or possible information leaks.

When Weblate joined HackerOne (see our first month experience with it), I was hoping to get some security-driven code review, but apparently most people there are focused on black-box testing. I can certainly understand that - it's easier to conduct and you need much less knowledge of the tested website to perform it.

One big area where reports against Weblate came in was authentication. Originally we were mostly relying on the default authentication pipeline that comes with Python Social Auth, but that showed some possible security implications, and we ended up with a heavily customized authentication pipeline to avoid several risks. Some patches were submitted back and some issues reported, but we've still diverged quite a lot in this area.

A second area where scanning was apparently performed, but almost no reports came in, was input validation. Thanks to the excellent XSS protection in Django nothing was really found. On the other hand, this triggered several internal server errors on our side. At this point I was really happy to have Rollbar configured to track all errors happening in production. Thanks to having all such errors properly recorded and grouped, it was really easy to go through them and fix them in our codebase.

Most of the related fixes have landed in Weblate 2.14 and 2.15, but obviously this is an ongoing effort to make Weblate better with every release.

Filed under: Debian English SUSE Weblate

Categories: FLOSS Project Planets
Syndicate content