# Feeds

### Drupal Association blog: 2017 Drupal Association at-large election winner announced

Planet Drupal - Mon, 2017-03-27 11:51

*/

The staff and board of the Drupal Association would like to congratulate our newest board member:

Ryan Szrama.

Thank you, Ryan, for stepping forward to serve the Drupal community. On behalf of the community I also want to thank the 13 candidates who put themselves out there in service of Drupal and nominated themselves. We are grateful that our community has so many brave and generous people willing to contribute this way.

Ryan's election to the board represents the sixth year of elections to a community-at-large seat on the Drupal Association board. Each year we've focused on improving the elections process, and this year was no different. We focused on two goals:

1. Improve the user experience of participating in the elections process.
• We added more in-line help materials throughout the elections process.
• For candidates, we added information about the responsibilities of a board member to the nomination form, as well as a video message from the Executive Director.
• For voters we improved the elections navigation, and provided more educational materials about the IRV voting process.
• We implemented a drag and drop ballot, to make it easier for voters to rank candidates.
2. Make it easier to get to know the candidates.
• We updated the candidate profile form, to ask more detailed questions to help voters get to know the candidates.
• Based on feedback from previous years, we eliminated the three virtual meet-the-candidates sessions, in favor of giving each candidate the option to post a statement-of-candidacy video.  In conjunction with the question and answer section on each candidate profile, we felt this gave the electorate the opportunity to get to know their candidates at their own pace and on their own terms.

Our next steps will be to reach out to the candidates for their evaluation of the elections experience.

We also want to hear from the voters. Please tell us about your experience with the elections process in the comments below. Your feedback is important to us so that we can make the 2018 elections process even better.

About the Elections Methodology: Instant Run-off Voting(IRV)

Elections for the Community-at-large positions on the Drupal Association board are conducted through Instant Run-off Voting. This means that voters can rank candidates according to their preference. When tabulating ballots, the voters' top-ranked choices are considered first. If no candidate has more than 50% of the vote, the candidate with the lowest votes is eliminated. Then the ballots are tabulated again, with all the ballots that had the eliminated candidate as their first rank now recalculated with their second rank choices. This process is repeated until only two candidates remain and a clear winner can be determined. This voting method helps to ensure that the candidate who is most preferred by the most number of voters is ultimately elected. You can learn more about IRV (also known as Alternative Vote) in this video.

Voting Results

There were 13 candidates in contention for the single vacancy among the two community-at-large seats on the Board. 1,240 voters cast their ballots out of a pool of 94,499 eligible voters (1.3%). Voters ranked an average of 3.6 candidates on their ballots.

The bar charts below show the vote counts for each candidate in each round.
Place the mouse over a bar to see the number of votes.

• Yellow — Votes carried over from the previous round.
• Green — Votes received in this round.
• Red — Votes transferred away in this round.

A candidate's votes in a round is the sum of the yellow and green bars.
Since the green and red bars represent votes being transferred, the sum of the
green and red bars is the same.

The exhausted bar represents votes where the voter did not indicate a next
preference and thus there were no candidates to transfer the vote to.

Round 1

(next)

Count of first choices.

Round 2

(prev)(next)

Count after eliminating gurubryan and transferring votes.

Round 3

(prev)(next)

Count after eliminating mehuls and transferring votes.

Round 4

(prev)(next)

Count after eliminating zet and transferring votes.

Round 5

(prev)(next)

Count after eliminating Rahul Seth and transferring votes.

Round 6

(prev)(next)

Count after eliminating redacted and transferring votes.

Round 7

(prev)(next)

Count after eliminating MatthewS and transferring votes.

Round 8

(prev)(next)

Count after eliminating Riaan Burger and transferring votes.

Round 9

(prev)(next)

Count after eliminating johnkennedy and transferring votes.

Round 10

(prev)(next)

Count after eliminating jackniu and transferring votes.

Round 11

(prev)(next)

Count after eliminating ok_lyndsey and transferring votes.

Round 12

(prev)

Count after eliminating Prasad Shir and transferring votes. Candidate rszrama is elected.

Winners

Winner is rszrama.

Categories: FLOSS Project Planets

### Virtuoso Performance: Thoughts on the Drupal community

Planet Drupal - Mon, 2017-03-27 11:47
Thoughts on the Drupal community

I did make one comment on Dries’ blog in the immediate aftermath of learning about the situation which is roiling the Drupal community, but since then have taken some time to listen and ponder. The community is in deep pain now, and many of us are reacting to that pain with anger. Trust is in short supply. Healing seems nearly impossible.

We need to start from compassion for all involved. The pain is deepest for those in the middle. Larry has already expressed his pain eloquently - I know losing the Drupal community would cut me deeply, and pragmatically this is a major blow to his career as well. But, let’s also consider Drupal leadership - Dries, the Drupal Association, the Community Working Group. Regardless of whether we agree with their decision, I see no reason to believe it was done arbitrarily or with malice. Reaching such a decision against someone who has given so much to the community over the years must have been extraordinarily difficult, and the fact that this decision seems to have eroded much of the community’s trust in them is surely agonizing.

We need to recognize and address the asymmetries in this situation. The power in the relationship is unbalanced - Drupal leadership has an ability to affect Larry’s life profoundly that is not reflected. On the other hand, the information is also unbalanced - Larry is able to say what he chooses publicly, but the Drupal leadership has a responsibility to maintain confidentiality. Yes, “confidentiality” can be used as a smokescreen - but there really is a legitimate need to respect it - to protect those who gave evidence to the CWG, and to protect Larry himself from public accusations without public evidence. Transparency and confidentiality are at odds, and it is exceedingly difficult to find a “perfect” balance between them.

That all being said, and recognizing that the information I have is incomplete, my main thoughts on the three parties involved:

1. The impression Larry’s blog post leaves is that his dismissal was primarily due to BDSM/Gorean practices in individual personal relationships (that certainly appears to be the main takeaway in much of the criticism online). On the other hand, statements from the other side suggest it may have had more to do with broader statements of belief (and commitment to living that belief totally) which seem in conflict with the Drupal community’s values (although it’s difficult for me to be sure of whether they were meant to be taken literally in the real world, or as a form of cosplay - as portrayal of a Gorean character). Just to be clear - although I strongly disagree with some statements I saw, as long as they were not reflected in Larry’s behavior within the Drupal community I don’t see standing to dismiss him (except, perhaps, from representation to the PHP community if it seemed like it might diminish his effectiveness in that role). But, if this was indeed the main issue presented to Larry, I would have liked to see him address it head-on. He does deal with it somewhat in the section “Larry is a proponent for the enslavement of women!”, but the section title itself looks like an exaggeration of the actual accusation, and it is down at the bottom of the accusations he addresses, de-emphasizing it.

2. I think Drupal leadership needs to tilt the balance at least a little more towards transparency. The community does need to better understand broadly why Larry was dismissed. Dries’ post stated “I did this because it came to my attention that he holds views that are in opposition with the values of the Drupal project”. This suggests that the primary reason for the dismissal was those statements outside of Drupal - I (and many others) feel that what happens outside of the Drupal community, should stay outside of the Drupal community. Then, the DA stated “We want to be clear that the decision to remove Larry's DrupalCon session and track chair role was not because of his private life or personal beliefs... Our decision was based on confidential information conveyed in private by many sources.” This contradicts Dries’ original statement, which is concerning. It also fails to address the central concern many people have - did Larry do or say anything within the Drupal community?

3. I don’t think the Drupal community has acquitted itself well here. The immediate outpouring has been based on one point of view - admittedly, there is little hard information otherwise, but we should all be slower to react when we know we don’t have all the facts, and lead off with questions rather than diatribes. One thing to be concerned about is that the one-sided onslaught is likely to discourage expressions that disagree with the crowd - anyone who might agree with the Drupal leadership’s decision, or who might know of some concrete reasons they may have made that decision, has reason to fear speaking up. I’m thinking here of GamerGate. No, I’m not saying the people criticizing the decision are like the GamerGaters - but what I am saying is that, given that the center of this controversy is around beliefs and statements that look an awful lot like misogyny, and that much of the rhetoric has carried a whiff of torches and pitchforks, I would not be at all surprised if women (and feminists of all gender identities) felt good reason to fear a GamerGate-like backlash if they did speak up. We need to leave more room for all voices and not flood the space unilaterally.

So, where do we go from here? In reverse order,

1. The Drupal community is certainly Internet-savvy - we’ve all seen so many cases where based on one piece of information the flamers descend without waiting for fact-checking, the other side of the story, etc. We need to jerk our knees a little more slowly. We need to recognize we don’t have (and will never have) all the information, and the fact that we won’t have it is not in and of itself proof that the decisions of Dries/DA/CWG were wrong.

2. Drupal leadership does need to tell us more, and I think it can be done without violating confidentiality. Simply put - did Larry’s dismissal involve anything he did within the Drupal community? If they say it did, I for one, am willing to accept it and move on - I’m not in a position to know the specifics (nor should I be), but I recognize this as a legitimate exercise of authority, even if I don’t know if I would have voted the same way if I had seen the evidence. If, on the other hand, this is based purely on statements and actions outside of the Drupal community, it’s important for all of us in the Drupal community to understand that - we all need to know if the DA and CWG will hold us accountable for our online presence outside of Drupal. On that score, I’ve created a separate Twitter account for my Drupal and professional communications under my DBA, Virtuoso Performance. To be fair, I’ve considered doing this anyway just in terms of work/life balance, but now it seems all the more important to keep things separate.

3. I know it’s a lot to ask - actually, I know that it’s too much to ask and don’t actually expect it, but in the interests of symmetry I’m putting it out there… It would be great if Larry could share (without violating the confidentiality of anyone involved - redacting names and details) precisely what he was told were the specific charges that led to his dismissal. We’re not going to get any specifics from the Drupal leadership side - he’s the only person who can provide us with any hard information. Again, I don’t expect this - Larry has suffered more than anyone here, and has already really put himself out there. Edit: Larry has now responded with more information, which I will take some time to review before making further comment.

Ultimately, while this is a painful episode in Drupal’s history, I hope we can find a way to get through it and come out the other side with a better understanding of each other, and rebuild trust within the community.

mikeryan Monday, March 27, 2017 - 10:47am Tags Planet Drupal Drupal Add new comment
Categories: FLOSS Project Planets

### Sooper Drupal Themes: New: Customize Glazed Builder With Glazed Theme 2.6.4 and Glazed Builder 1.1.3!

Planet Drupal - Mon, 2017-03-27 11:40

Today I'm excited to announce a new Glazed Builder and Theme release... Imagine having a meeting with your client tomorrow. You've promised your client the ability to update landing pages without needing any help. The client imagines he'll just be changing images and simple text blocks. Then you show him the Glazed Builder Sidebar and it's filled with your custom designed icon boxes, testimonial block, and even custom branded sliders. All accessible and editable without needing even basic HTML skills.

Custom HTML Drag and Drop Elements in the Glazed Builder Sidebar!

Adding elements to the sidebar is now extremely easy, you don't need to have a custom module or even any PHP code. You just drop a folder with your custom elements in your theme or subtheme folder and your custom elements will magically appear as editable drag and drop elements. You just to add a class or two to indicate editable portions of your HTML elements and that's it. Of course, you can find all the details and an example zip file in the sidebar elements documentation

Various Fixes

We made various improvements to the Glazed Builder and Theme user experience, details of which you can read in the Glazed Builder Changelog and Glazed Theme changelog. We're ironing out as many little issues as possible while working on the Drupal 8 theme releases!

Need any help with sidebar elements? Just create a ticket in the support forum and we'll try to help you out and simultaneously improve our product to match any expectations you have that we did not think of.

Categories: FLOSS Project Planets

### DataCamp: DataChats: An Interview with Andreas Müller

Planet Python - Mon, 2017-03-27 11:25

Hi pythonistas! We just released episode 15 of our DataChats video series.

In this episode, Hugo interviews Andreas Müller. Andy is a lecturer at the Data Science Institute at Columbia University and author of the O'Reilly book "Introduction to machine learning with Python", describing a practical approach to machine learning with python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and he has been co-maintaining it for several years. He's also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. His mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science and democratize the access to high-quality machine learning algorithms. You can take his course, Supervised Learning with scikit-learn, here.

Andy answers Hugo's questions about his work at Columbia, gives advice to people starting with data science and answers what the most difficult part of his job is.

&amp;amp;amp;amp;amp;amp;amp;amp;lt;/center&amp;amp;amp;amp;amp;amp;amp;amp;gt;

We hope that you enjoy watching this series and make sure not to miss any of our upcoming episodes by subscribing to DataCamp's YouTube channel!

Categories: FLOSS Project Planets

### Kushal Das: Building IoT enabled power-strip with MicroPython and NodeMCU

Planet Python - Mon, 2017-03-27 11:20

This was on my TODO list for a long time. But, never managed to time to start working on it, I was also kind of scared of doing the AC wiring without adult supervision :).

Items used
• Power-strip
• USB power plug (any standard mobile phone charger)
• wires
• NodeMCU Amica
• Relay board
• MicroPython
• Mosquitto server on my home network

I ordered double relay boards (this one was marked for Arduino) from Amazon, and they were laying in the boxes in the Pune Hackerspace for a long time.

Yesterday, we had a Raspberry Pi workshop in the hackerspace as part of the Python Pune monthly meetup. Nikhil was present in the meetup, and I asked for help from him as he is a real hardware expert.

We took one of the existing power-strip from the hackerspace, and also a mobile phone charger. After taking out 2 of the power sockets we had enough space to plug-in the rest of the system inside of it. Of course, Nikhil did all the hard work of soldering the wires in the proper manner.

The relay board is connected to a NodeMCU Amica running MicroPython. It has a code like the following example:

import time from machine import Pin from umqtt.simple import MQTTClient # Received messages from subscriptions will be delivered to this callback def sub_cb(topic, msg): led1 = Pin(14,Pin.OUT) if msg == b"on_msg": led1.low() elif msg == b"off_msg": led1.high() def main(server="SERVER_IP"): c = MQTTClient("umqtt_client", server) c.set_callback(sub_cb) c.connect() c.subscribe(b"your_topic") while True: c.wait_msg() c.disconnect() if __name__ == "__main__": try: time.sleep(10) main() except: pass

I will have to cover up the holes with something, and also push the code to a proper repository. Meanwhile this was the first usable thing I made with help from friends in the Hackerspace Pune. Come and join us to have more fun and build new things.

Btw, remember to have a password protected mosquitto server :)

Categories: FLOSS Project Planets

### Xeno Media: Xeno Media's Michael Porter to present "The Butler Did It: Putting Jenkins To Work For You" at Drupal MidCamp Saturday, April 1

Planet Drupal - Mon, 2017-03-27 11:12

Xeno Media Lead Developer Michael Porter was selected to present The Butler Did It: Putting Jenkins To Work For You at Drupal MidCamp on April 1 in Chicago.

Michael's presentation shows how use the power of Continuous Integration (CI) servers for offloading some of the repetitive tasks developers and software maintainers need to do on a daily basis.

The session will demonstrate how to use Jenkins, the leading open source automation server to:

• Run Drupal core and module updates
• Run and report on behat tests
• Run and report on Coding Standards
• Trigger Offsite backup of production sites
• Use Jenkins Pipeline workflows to build branch/feature based servers.
• Triggering jobs with webhooks
• Report progress, and results to Slack

MidCamp participation is part of Xeno Media's strategic dedication to Drupal and the Open Source community. We have been a MidCamp sponsor for two years and  Web Strategist Jim Birch is an active organizer.

Categories: FLOSS Project Planets

### Virtuoso Performance: The Drupal (migration) expert is in at MidCamp

Planet Drupal - Mon, 2017-03-27 10:55
The Drupal (migration) expert is in at MidCamp

MidCamp is imminent, and I'm proud to announce that Virtuoso Performance (i.e., me!) is sponsoring a "Drupal Expert Is In" session Saturday April 1 at 1pm. I'll be in the main room (120) to answer your Drupal 8 migration questions, help you get through any tricky plugin issues, and demonstrate how to approach migration problems. The plan is to make this an open session - to allow anyone interested in Drupal 8 migration to sit around the campfire and learn from each other's issues.

Migrate all the things!

mikeryan Monday, March 27, 2017 - 09:55am Tags Planet Drupal Drupal
Categories: FLOSS Project Planets

### FeatherCast: Shawn McKinney, ApacheCon North America 2017, and Java Security

Planet Apache - Mon, 2017-03-27 10:35

At ApacheCon Miami, Shawn McKinney will give a talk on the anatomy of web application security.

In this interview, he talks about what he’ll be presenting, and who should attend.

https://feathercastapache.files.wordpress.com/2017/03/shawn_mckinney_acna2017.mp3

Register today for ApacheCon, and save $200 on your admission cost. Categories: FLOSS Project Planets ### OSI Welcomes the Journal of Open Source Software as Affiliate Member Open Source Initiative - Mon, 2017-03-27 09:24 Open Source Initiative Extends Support for Computational Science and Engineering Research and Researchers. PALO ALTO, Calif. - March 28, 2017 -- The Open Source Initiative® (OSI), a global non-profit organization formed to educate about and advocate for the benefits of open source software and communities, announced that the Journal Of Open Source Software (JOSS), a peer-reviewed journal for open source research software packages, is now an OSI affiliate member. JOSS is a response to the growing demand among academics to directly publish papers on research software. Typically, academic journals and papers related to software focus on lengthy descriptions of software (features and functionality), or original research results generated using the software. JOSS provides a means for researchers to directly publish their open source software as source code: the complete set of software with build, installation, test, and usage instructions. The primary purpose of a JOSS paper is to enable authors to receive citation credit for research software, while requiring minimal new writing. Within academia, it is critical for researchers to be able to measure the impact of their work. While there are many tools and metrics for tracking research outputs, JOSS fills the gap for software source code, which doesn't look like traditional academic research papers, allowing it to be treated as another form of research output. JOSS has a rigorous peer-review process designed to improve the quality of the software product, and a first-class editorial board experienced at building (and reviewing) high-quality research software. Review criteria assess the research statement of need, installation and build instructions, user documentation and contributing guidelines. The software is also required to have an OSI-approved license and a code of conduct. Authors must include examples of use, tests, and suitable API documentation. Reviewers and authors participate in an open and constructive review process focused on improving the quality of the software—and this interaction itself is subject to the JOSS Code of Conduct. OSI Board Director Stefano Zacchiroli noted, "JOSS is a clever hack. It addresses the idiosyncrasy of traditional academic publishing that still forces researchers to write bogus papers in order to get credit for the impactful research software they write. By requiring that published software be released under an OSI-approved license, and that a third party archive the software and associate the archive with a DOI, JOSS ensures that published research software enters the software commons and that it be always available for anyone to use, modify, and share." "On the face of it, writing papers about software is a strange thing to do, especially if there's a public software repository, documentation and perhaps even a website for users of the software. But writing a paper is currently the most recognized method for academics to gain career credit, as it creates a citable entity that can be referenced by other authors." said Arfon Smith, JOSS Editor-in-Chief. "The papers JOSS publishes are conventional papers, other than their short length: the journal is registered with the Library of Congress and has an ISSN. Every JOSS paper is automatically assigned a Crossref DOI and is associated with the ORCID profiles of the authors. If software papers are currently the best solution for gaining career credit for software, then shouldn't we make it as easy as possible to create a software paper?" In recent years, the OSI has made significant investments in higher education: extending the OSI Affiliate Member Program to institutions of higher education, creating educational materials, sponsoring curriculum development, and even developing a complete online course. Supporting academic research enabled by openly licensed software is a natural progression of the OSI's work in higher education. The OSI Affiliate Member Program is available at no-cost to non-profit or educational institutions and government agencies—independent groups with a commitment to open source—that support OSI's mission to raise awareness and adoption of open source software and to build bridges among different constituencies in the open source community. About The Journal Of Open Source Software Founded in 2016, the Journal of Open Source Software (JOSS) is a developer-friendly peer-reviewed academic journal for research software packages, designed to improve the quality of the submitted software and to make software citable as a research product. JOSS is an open-access journal committed to running at minimal cost, with zero publication fees or subscription fees. With volunteer effort from the editorial board and community reviewers, donations, and minimal infrastructure cost, JOSS can remain a free community service. Learn more about JOSS at: http://joss.theoj.org. About the Open Source Initiative Founded in 1998, the Open Source Initiative protects and promotes open source by providing a foundation for community success. It champions open source in society through education, infrastructure and collaboration. The (OSI) is a California public benefit corporation, with 501(c)(3) tax-exempt status. For more information about the OSI, or to learn how to become an affiliate, please visit: http://opensource.org. Media Contact Ed Schauweker Categories: FLOSS Research ### Doug Hellmann: locale — Cultural Localization API — PyMOTW 3 Planet Python - Mon, 2017-03-27 09:00 The locale module is part of Python’s internationalization and localization support library. It provides a standard way to handle operations that may depend on the language or location of a user. For example, it handles formatting numbers as currency, comparing strings for sorting, and working with dates. It does not cover translation (see the gettext … Continue reading locale — Cultural Localization API — PyMOTW 3 Categories: FLOSS Project Planets ### Mike Driscoll: PyDev of the Week: Roman Sirokov Planet Python - Mon, 2017-03-27 08:30 This week we welcome Roman Sirokov as our PyDev of the Week! He is the author of pywebview, which is a cross-platform lightweight native wrapper around a web view component. You can basically create a desktop user interface using web technologies and frameworks. He is quite active on Github where you can see all the projects he is involved with. Let’s take a few moments to learn more about our fellow Pythonista! Can you tell us a little about yourself (hobbies, education, etc): I am a software engineer from Helsinki, Finland. I currently work for Siili Solutions as a full-stack developer doing various client projects. I have two master degrees, one in computer science from Aalto University and the second one in bioinformatics from University of Helsinki. The first degree was about graduating and the second one about actually wanting to learn something. I have traveled quite a bit and the longest I have spent on the road was nine months. On one occasion I cycled about 3000km around Baltic Sea during a very rainy summer. I am an avid cross-country skier and try to get as much as skiing as possible with very little snow we get nowadays. I practice ashtanga yoga and vipassana meditation too and try to attend a vipassana retreat once a year. Other than that I dj mostly cosmic music and try to keep my cats entertained. Some of my mixes can be found here. Why did you start using Python? I got into Python around 2004 during my university days. In school they taught a combination of Java, Scheme and C, which were an important learning experience, but not very fun or practical for my own needs. I heard about Python from a friend and it blew me away on how easy and straightforward it was. While Java forced you to perform some arbitrate voodoo to achieve trivial things, Python got straight to the point with as little code as possible. Sometime later I had this aha moment that I could actually solve my own problems by programming, instead of relying on ready-made software. Python was an integral part of this realisation. What other programming languages do you know and which is your favorite? I have mostly worked with Python, Javascript, Java and C#. Python is my favorite language for its straightforwardness and simplicity. Javascript is a combination of love, frustration and general confusion. I like C# and have a sweet spot for WPF (hands down, the best GUI library I have ever worked with), but haven’t done anything with either for a while. I am happy to see the recent cross-platform developments in .NET. Shame that WPF is still Windows only. What projects are you working on now? Currently I devote my free time to these projects • pywebview – a simple GUI library that lets you use a HMTL/JS/CSS stack as your GUI without a browser. • Latukartta – a cross-country ski trail map for Finland with the real-time trail status of ski trails. • Next for Traktor – An app for Traktor DJ software that helps you to choose a next track to play and to keep track of good transitions • Traktor Librarian – Another app for Traktor for cleaning up and exporting music library. All of these projects are done with Python as back-end and web stack as front-end. What I like about this approach is that I can re-use code and employ the same set of tools, no matter if it is a web or desktop project. Which Python libraries are your favorite (core or 3rd party)? I guess Flask deserves a mention, but I don’t really have any favorites. What I love about Python that there’s a library for everything and it is usually dead-easy to take one into use. Where do you see Python going as a programming language? I am glad to see that Python 3 is finally getting more widely adopted. Hopefully in five years the Python 2/3 mess will be history. I would like to see Python would make a bigger impact in the mobile world. A recent announcement about Sailfish making Python a first-class citizen was a welcoming news, I wish Android would follow the suit. Finally I hope Python would bundle tools for producing an executable out of a Python script out of the box and simplify the whole building process. Thanks so much for doing the interview! Categories: FLOSS Project Planets ### Andrew Dalke: ChEMBL target sets association network Planet Python - Mon, 2017-03-27 08:00 This is part 3 of a three-part series in generating fingerprint-based set similarities using chemfp. Read part 1 to see some of the ways to compare two fingerprint sets, and part 2 where I figure out how to use the ChEMBL bioactivity data. I usually work with entity-based similarities. I have a molecule X and I want to find other molecules which are similar to it. Set-based similarities are a bit different. I think of them as comparing two objects by the clouds around them. Instead of comparing two proteins based on a more intrinsic property like their sequence or 3D structure, I might want to compare two proteins by the types of molecules which bind to them or affect them. This might reveal if two proteins have similar binding pockets or are involved in the same chemical pathway. Before jumping into the nitty-gritty, I thought I would try a non-molecular example. Suppose you read Neal Stephenson's Cryptonomicon and enjoy it so much that you want to read more like it. "Like it" can mean many things: books that talk about technology, books which combine two different time periods, books with historical fiction taking place during the Second World War, books with many vignettes, and so on. For some, Foucault's Pendulum is like Cryptonomicon. And of course "like" can mean other books by the same author, or even by the same publishing house or imprint. The "entity" in this case is a book. While there are many possible scoring functions, the end result is a book or list of books, likely ranked in preference order. Suppose however you have read many books by Stephenson and want to find another author like him. Here too there are many ways to make a comparison. One is to use the book similarity function. For each author under consideration, compare all of that author's books to all of Stephenson's books, and come up with some aggregate scoring function to give the set similarity. Use that to figure out the most similar author. If you repeat this many time you can create a network of authors, associated by similarity based on their books. IC50 activity sets from ChEMBL Back into the world of molecules. I want to compare target proteins in human assays based on the set of molecules with an IC50 of at least 1 micromolar for each molecule. I'll use the ChEMBL 21 bioactivity data to generate a data. The following SQL query is based on an example Iain Wallace sent me, adapted to use the SQLite console. I'll first turn on the timer and have it save the output to the file "chembl_sets.tsv", as tab-separated fields. The query looks for "single protein" targets in humans (tax_id=9606), where the assay activity is an IC50 with better than 1000nM. For each of those assays, get the target name and the ChEMBL id for the compound used in the assay. % sqlite3 chembl_21.db SQLite version 3.8.5 2014-08-15 22:37:57 Enter ".help" for usage hints. sqlite> ...> ...> ...> ...> ...> ...> ...> ...> ...> select distinct target_dictionary.pref_name, molecule_dictionary.chembl_id from target_dictionary, assays, activities, molecule_dictionary where target_dictionary.tax_id = 9606 and target_dictionary.target_type = "SINGLE PROTEIN" and target_dictionary.tid = assays.tid and assays.assay_id = activities.assay_id and activities.published_type = "IC50" and activities.standard_units = "nM" and activities.standard_value < 1000 and activities.molregno = molecule_dictionary.molregno; Run Time: real 25.771 user 4.440337 sys 2.698867 sqlite> quit Most of the 26 seconds was likely spent in reading from the hard disk. However, do note that this was after I did an ANALYZE on some of the tables in the database. Without the ANALYZE, I suspect the query will take a lot longer. The above console commands produce the file "chembl_sets.tsv" where the first few line for me look like: Beta-1 adrenergic receptor CHEMBL305153 Dopamine D4 receptor CHEMBL303519 Endothelin-converting enzyme 1 CHEMBL415967 Neprilysin CHEMBL415967 FK506-binding protein 1A CHEMBL140442 Coagulation factor X CHEMBL117716 Trypsin I CHEMBL117716 Retinoid X receptor alpha CHEMBL111217 Epidermal growth factor receptor erbB1 CHEMBL68920 Receptor protein-tyrosine kinase erbB-2 CHEMBL68920 Epidermal growth factor receptor erbB1 CHEMBL69960 Receptor protein-tyrosine kinase erbB-2 CHEMBL69960 Proteinase-activated receptor 1 CHEMBL330643 Tyrosine-protein kinase LCK CHEMBL69638 Neuropeptide Y receptor type 5 CHEMBL75193 Gonadotropin-releasing hormone receptor CHEMBL65614 ... (That's a copy&paste of the terminal output, which doesn't preserve spaces.) Compare two real sets In an earlier essay I came up with an ad hoc comparison function I called "sss()": from chemfp import search def sss(arena1, arena2, threshold=0.8): results = search.threshold_tanimoto_search_arena( arena1, arena2, threshold=threshold) # The scaling factor of 300 was chosen so that random ChEMBL # subsets have a score of about 0.01. It doesn't matter for # ranking as it's used as an arbitrary scaling factor. similarity = 300 * results.cumulative_score_all() / (len(arena1) * len(arena2)) return similarity What score does it give to two sets which are similar? Load set data The file "chembl_sets.tsv" which I just created in SQLite contains the set names and the compounds ids which are in the set. The file "chembl_21.rdkit2048.fpb" created in part 1 contains compound ids and fingerprints. I can combine the two to get the fingerprints for each compound id. The first step is to read the set information, which I do with the following function: import collections # Load a file with lines of the form <set_name> <tab> <compound_id> # Example: "DNA-dependent protein kinase\tCHEMBL104450\n" # Return a dictionary mapping set name to a list of all of its ids def load_set_members(filename): set_members_table = collections.defaultdict(list) with open(filename) as infile: for line in infile: set_name, chembl_id = line.rstrip("\n").split("\t") set_members_table[set_name].append(chembl_id) # Turn the defaultdict into a dict so that a lookup of # a name which doesn't exist raises an exception instead # of creating and returning an empty list. return dict(set_members_table) I'll use that function to read the set file. I also looked through the list of target names and guessed that "Estrogen receptor alpha" and "Estrogen receptor beta" might be similar, so I'll use that as my initial test case: >>> set_members_table = load_set_members("chembl_sets.tsv") >>> len(set_members_table["Estrogen receptor alpha"]) 912 >>> len(set_members_table["Estrogen receptor beta"]) 1066 >>> len(set(set_members_table["Estrogen receptor alpha"]) ... & set(set_members_table["Estrogen receptor beta"])) 697 Load set fingerprints Next I'll extract the fingerprints from the FPB file. The easiest way is to find the index for each of the compound ids and pass the list of indices to the copy() method. >>> import chemfp >>> chembl_21 = chemfp.load_fingerprints("chembl_21.rdkit2048.fpb") >>> target_ids1 = set_members_table["Estrogen receptor alpha"] >>> indices1 = [chembl_21.get_index_by_id(target_id) for target_id in target_ids1] >>> target1 = chembl_21.copy(indices1) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "chemfp/arena.py", line 576, in copy new_indices.append(range_check[i]) TypeError: sequence index must be integer, not 'NoneType' Well, that was unexpected. What happened? It looks like there are 8 None elements in the list of indices: >>> indices1.count(None) 8 How could I have an activity for a compound, but not have the compound? There can be a few reasons. Perhaps ChEMBL didn't include the structure data. Except they do. Perhaps RDKit couldn't parse the record. Except they could. The real clue came in beause Iain Watson also sent me the dataset he generated with his sample SQL. There are 76 additions in my file which aren't in his, including 8 estrogen receptor alpha records: % diff iain_sorted.tsv chembl_sets_sorted.tsv 5697a5698 > Acetylcholinesterase CHEMBL2448138 7776a7778 > Adenosine A3 receptor CHEMBL1386 9350a9353 > Alpha-2a adrenergic receptor CHEMBL1366 9410a9414 > Alpha-2a adrenergic receptor CHEMBL508338 9455a9460 > Alpha-2b adrenergic receptor CHEMBL1366 9479a9485 ... 54058a54091 > Epidermal growth factor receptor erbB1 CHEMBL1909064 58244a58278,58279 > Estrogen receptor alpha CHEMBL219003 > Estrogen receptor alpha CHEMBL219004 58246a58282 > Estrogen receptor alpha CHEMBL219390 58247a58284 > Estrogen receptor alpha CHEMBL219763 58506a58544 > Estrogen receptor alpha CHEMBL373625 58555a58594 > Estrogen receptor alpha CHEMBL385993 58556a58596,58597 > Estrogen receptor alpha CHEMBL386948 > Estrogen receptor alpha CHEMBL387335 65855a65897 > Glutathione reductase CHEMBL2068507 65874a65917 ... He generated his data from one of the dump files for a server-based database, like MySQL. I generated my data from the SQLite dump file. My guess is that the SQLite file was generated slightly later, and includes a few records which were added during that time delta. The reason my fingerprint file doesn't contain the entries is that the chembl_21.sdf file I used was also generated from the first snapshot, so doesn't include those new structures. At least, that's my working theory. It's also purely coincidence that I happened to start with one of the few set names where this was a problem. I'll write a function to skip ids when the id can't be found in the fingerprint arena: def create_subset(arena, ids): indices = [] for id in ids: idx = arena.get_index_by_id(id) if idx is not None: indices.append(idx) return arena.copy(indices=indices) >>> target1 = create_subset(chembl_21, set_members_table["Estrogen receptor alpha"]) >>> target2 = create_subset(chembl_21, set_members_table["Estrogen receptor beta"]) Compare two fingerprint sets With the two data sets in hand, it's a simple matter of calling the scoring function: >>> sss(target1, target2) 4.986669863305914 I'll make a little helper function to compare two classes by name: def compare(name1, name2): target1 = create_subset(chembl_21, set_members_table[name1]) target2 = create_subset(chembl_21, set_members_table[name1]) return sss(target1, target2) and use it to compare a few other sets, judiciously chosen after I implemented the next section: >>> compare("Estrogen receptor alpha", "Estrogen sulfotransferase") 6.107895939764545 >>> compare("Estrogen receptor alpha", "Vitamin D receptor") 6.107895939764545 >>> compare("Dopamine D5 receptor", "Histamine H1 receptor") 110.6279251170047 >>> compare("Dopamine D5 receptor", "Histamine H2 receptor") 110.6279251170047 >>> compare("Histamine H1 receptor", "Histamine H2 receptor") 24.037402406896796 >>> compare("NEDD8-activating enzyme E1 regulatory subunit", "RNase L") 100.0 A real chemist or biologist would need to tell me if these make sense. Compare all sets I can use the code from the previous section to generate the Nx(N-1)/2 comparison of every set to every other set. The general algorithm is to load each of the sets into its own object, which I'll call a "Subset" instance. A Subset has a "name" and an "arena", and I'll ignore empty subsets (like "Transient receptor potential cation channel subfamily M member 6"): class Subset(object): def __init__(self, name, arena): self.name = name self.arena = arena def load_subsets(arena, set_members_table): subsets = [] for i, (name, ids) in enumerate(sorted(set_members_table.items())): set_arena = create_subset(arena, ids) if not set_arena: sys.stderr.write("No members: %s\n" % (name,)) continue subsets.append(Subset(name, set_arena)) return subsets This returns N subsets. I'll iterate over the upper-triangle of the comparison matrix to generate all pair scores. If the score is at least 1.0, I'll print the score and the two set ids. Otherwise I won't report anything. I also include some progress and run-time information to stderr, which makes the code a bit more complicated to read, but helps soothe the nerves of at least this observer. Here's the full code, which I saved into the file "set_compare.py": # set_compare.py from __future__ import print_function, division import sys import collections import chemfp from chemfp import search import time # Load a file with lines of the form <set_name> <tab> <compound_id> # Example: "DNA-dependent protein kinase\tCHEMBL104450\n" # Return a dictionary mapping set name to a list of all of its ids def load_set_members(filename): set_members_table = collections.defaultdict(list) with open(filename) as infile: for line in infile: set_name, chembl_id = line.rstrip("\n").split("\t") set_members_table[set_name].append(chembl_id) # Turn the defaultdict into a dict so that a lookup of # a name which doesn't exist raises an exception instead # of creating and returning an empty list. return dict(set_members_table) # Ad hoc scoring function using the sum of scores. # Please don't use this for real work unless you've validated it. def sss(arena1, arena2, threshold=0.8): results = search.threshold_tanimoto_search_arena( arena1, arena2, threshold=threshold) # The scaling factor of 300 was chosen so that random ChEMBL # subsets have a score of about 0.01. It doesn't matter for # ranking as it's used as an arbitrary scaling factor. similarity = 300 * results.cumulative_score_all() / (len(arena1) * len(arena2)) return similarity # Make a subset arena of the given arena using the given ids def create_subset(arena, ids): indices = [] for id in ids: idx = arena.get_index_by_id(id) if idx is not None: indices.append(idx) return arena.copy(indices=indices) class Subset(object): def __init__(self, name, arena): self.name = name self.arena = arena def load_subsets(arena, set_members_table): subsets = [] for i, (name, ids) in enumerate(sorted(set_members_table.items())): set_arena = create_subset(arena, ids) if not set_arena: sys.stderr.write("No members: %s\n" % (name,)) continue subsets.append(Subset(name, set_arena)) return subsets def main(): chembl_21 = chemfp.load_fingerprints("chembl_21.rdkit2048.fpb") set_members_table = load_set_members("chembl_sets.tsv") start_time = time.time() subsets = load_subsets(chembl_21, set_members_table) load_time = time.time() N = len(subsets) for i in range(N-1): sys.stderr.write("\rProcessing %d/%d" % (i, N-1)) subset1 = subsets[i] for j in range(i+1, N): subset2 = subsets[j] score = sss(subset1.arena, subset2.arena) if score > 1.0: sys.stderr.write("\r \r") print("%.2f\t%s\t%s" % (score, subset1.name, subset2.name)) sys.stderr.write("\rProcessing %d/%d" % (i, N-1)) sys.stderr.write("\r \r") compare_time = time.time() print("load time:", load_time-start_time, file=sys.stderr) print("compare time:", compare_time-load_time, file=sys.stderr) if __name__ == "__main__": main() I ran it like this, which ran with 4 OpenMP threads: % python set_compare.py > set_compare_output.txt No members: Transient receptor potential cation channel subfamily M member 6 No members: Transient receptor potential cation channel subfamily V member 2 No members: Transient receptor potential cation channel subfamily V member 5 No members: Voltage-gated P/Q-type calcium channel alpha-1A subunit load time: 4.57191491127 compare time: 491.902451038 That command found 5410 set comparisons. I'll show the 5 smallest, 5 largest, and values near the median and quartiles: % wc -l set_compare_output.txt 5410 set_compare_output.txt % sort -n set_compare_output.txt | head -5 1.00 Apoptosis regulator Bcl-2 Membrane-associated guanylate kinase-related 3 1.00 Cyclin-dependent kinase 2 Testis-specific serine/threonine-protein kinase 2 1.00 Cytochrome P450 1B1 Tubulin beta-1 chain 1.00 Fructose-1,6-bisphosphatase Receptor tyrosine-protein kinase erbB-3 1.00 Glucagon receptor Serine/threonine-protein kinase GAK % awk 'NR==int(5410*1/4)' set_compare_output.txt # first quartile 1.55 Cyclin-dependent kinase 4 Cyclin-dependent kinase-like 1 % awk 'NR==5410/2' set_compare_output.txt # median 1.24 Glyceraldehyde-3-phosphate dehydrogenase liver Histone-lysine N-methyltransferase, H3 lysine-9 specific 3 % awk 'NR==int(5410*3/4)' set_compare_output.txt # third quartile 7.62 Neuropeptide Y receptor type 1 Neuropeptide Y receptor type 2 % sort -n set_compare_output.txt | tail -5 300.00 Serine palmitoyltransferase 1 Serine palmitoyltransferase 2 300.00 Serine/threonine-protein kinase 24 Serine/threonine-protein kinase MST1 300.00 Sodium channel protein type I alpha subunit Sodium channel protein type XI alpha subunit 300.00 Ubiquitin carboxyl-terminal hydrolase 10 Ubiquitin carboxyl-terminal hydrolase 13 300.00 Xaa-Pro aminopeptidase 2 Xaa-Pro dipeptidase The largest possible score in my similarity metric is 300.0. These last few lines indicate perfect matches. Pre-compile sets (advanced topic) It took 4-5 seconds to load the dataset. This was less than 1% of the overall run-time, so optimizing it is usually not worthwhile. However, suppose you want to write a set similarity web service. You'll often end up reloading the server during development, and the 4-5 second wait each time will become annoying. One possibility is to create an FPB file for each of the subsets. The problem is there are over 1,400 sets. By design the FPB file uses a memory map for each file. Each memory map uses a file descriptor, and many OSes limit the number of file descriptors that a process may use. On my Mac, the default resource limit ("rlimit") is 256, though that can be increased. The way I usually solve this in chemfp is to store all of the subsets sequentially in a single FPB file, and have a ".range" file which specifies the start/end range for each set. By default the FPB file reorders the fingerprints by popcount, so the fingerprints with 0 on-bits comes first, then those with 1 on-bit, etc. When they are ordered that way then I can create the sublinear search index. I can disable reordering, so that the fingerprints are stored in the same order they were added to the FPB file. If I know that set A is between indices start and end then I can use arena[start:end] to get the subset. Create an FPB file with the sets in input order The following program reads the chembl_21.rdkit2048.fpb and chembl_sets.tsv file to compile a single FPB file named chembl_sets.fpb with the fingerprint sets, in order, and an range file named "chembl_sets.range" with the start/end indices of each set and its range. from __future__ import print_function, division import collections import chemfp # Load a file with lines of the form <set_name> <tab> <compound_id> # Example: "DNA-dependent protein kinase\tCHEMBL104450\n" # Return a dictionary mapping set name to a list of all of its ids def load_set_members(filename): set_members_table = collections.defaultdict(list) with open(filename) as infile: for line in infile: set_name, chembl_id = line.rstrip("\n").split("\t") set_members_table[set_name].append(chembl_id) # Turn the defaultdict into a dict so that a lookup of # a name which doesn't exist raises an exception instead # of creating and returning an empty list. return dict(set_members_table) # return a list of (id, fingerprint) pairs for the given ids in the arena def get_indices(arena, ids): indices = [] for id in ids: idx = arena.get_index_by_id(id) if idx is None: continue indices.append(idx) return indices def main(): chembl_21 = chemfp.load_fingerprints("chembl_21.rdkit2048.fpb") set_members_table = load_set_members("chembl_sets.tsv") with open("chembl_sets.ranges", "w") as range_file: with chemfp.open_fingerprint_writer("chembl_sets.fpb", reorder=False, metadata=chembl_21.metadata) as writer: start_index = 0 for name, chembl_ids in sorted(set_members_table.items()): indices = get_indices(chembl_21, chembl_ids) if not indices: continue # no ids found end_index = start_index + len(indices) range_file.write("%d\t%d\t%s\n" % (start_index, end_index, name)) writer.write_fingerprints(chembl_21.copy(indices=indices)) start_index = end_index if __name__ == "__main__": main() I ran it and generated the two files. The "chembl_sets.ranges" file starts with the following: 0 77 1-acylglycerol-3-phosphate O-acyltransferase beta 77 1834 11-beta-hydroxysteroid dehydrogenase 1 1834 1881 11-beta-hydroxysteroid dehydrogenase 2 1881 1885 14-3-3 protein gamma 1885 1992 15-hydroxyprostaglandin dehydrogenase [NAD+] 1992 2016 2-acylglycerol O-acyltransferase 2 2016 2019 25-hydroxyvitamin D-1 alpha hydroxylase, mitochondrial 2019 2025 26S proteasome non-ATPase regulatory subunit 14 2025 2027 3-beta-hydroxysteroid dehydrogenase/delta 5--<4-isomerase type I 2027 2030 3-keto-steroid reductase If you've been following along then nothing here should be new here in the code, except this line: writer.write_fingerprints(chembl_21.copy(indices=indices)) I use the indices to make a new arena containing just those fingerprints. (By default these fingerprints will be in popcount order, which comes in useful in a bit.) I then pass the new arena to write_fingerprints(). This function takes an iterator of (id, fingerprint) pairs, which is what an arena returns if you try to iterate over it. The end result is to save the selected ids and fingerprints to a file, in popcount order. Search using pre-compiled fingerprints It's tempting to load the FPB file, get the set arena using the start/end from the ranges file, and do the comparison. This will work, but it will be slow. The FPB file is not in popcount order and has no popcount index. This means that chemfp's sublinear search optimization will not be used. If the arena is not indexed by popcount then its slice, like unordered_arena[start:end] will also not be indexed by popcount. Making a simple copy() doesn't help because the copy() by default preserves the ordering. That is, the copy will be ordered if and only if the original arena is ordered. Instead, I need to tell the copy() to always reorder, with: unordered_arena[start:end].copy(reorder=True) As a further optimization, if the copy(reorder=True) notices that the input is already in popcount order then it will skip the step to sort() the fingerprints by popcount. That's the case for us since the compiled chembl_sets.fpb file was created by passing ordered subset arenas to write_fingerprints(), and iterating through ordered arenas returns the (id, fingerprints) in increasing popcount order. I changed the previous "set_compare.py" program to use the compiled sets file. I call this new program "set_compare2.py". The new code is the function "load_subsets()", which reads the .ranges file, gets subsets from the compiled FPB file, and makes an ordered copy of it. # set_compare2.py from __future__ import print_function, division import sys import collections import chemfp from chemfp import search import time # Ad hoc scoring function using the sum of scores. # Please don't use this for real work unless you've validated it. def sss(arena1, arena2, threshold=0.8): results = search.threshold_tanimoto_search_arena( arena1, arena2, threshold=threshold) # The scaling factor of 300 was chosen so that random ChEMBL # subsets have a score of about 0.01. It doesn't matter for # ranking as it's used as an arbitrary scaling factor. similarity = 300 * results.cumulative_score_all() / (len(arena1) * len(arena2)) return similarity class Subset(object): def __init__(self, name, arena): self.name = name self.arena = arena def load_subsets(arena, filename): subsets = [] with open(filename) as range_file: for line in range_file: start, end, name = line.rstrip("\n").split("\t") start = int(start) end = int(end) subset = Subset(name, arena[start:end].copy(reorder=True)) subsets.append(subset) return subsets def main(): chembl_sets_arena = chemfp.load_fingerprints("chembl_sets.fpb") start_time = time.time() subsets = load_subsets(chembl_sets_arena, "chembl_sets.ranges") load_time = time.time() N = len(subsets) for i in range(N-1): sys.stderr.write("\rProcessing %d/%d" % (i, N-1)) subset1 = subsets[i] for j in range(i+1, N): subset2 = subsets[j] score = sss(subset1.arena, subset2.arena) if score > 1.0: sys.stderr.write("\r \r") print("%.2f\t%s\t%s" % (score, subset1.name, subset2.name)) sys.stderr.write("\rProcessing %d/%d" % (i, N-1)) sys.stderr.write("\r \r") compare_time = time.time() print("load time:", load_time-start_time, file=sys.stderr) print("compare time:", compare_time-load_time, file=sys.stderr) if __name__ == "__main__": main() I ran "set_compare2.py" and compared the results to "set_compare.py". The output files were exactly the same, as expected. The biggest difference was the load time: # times from set_compare.py load time: 4.57191491127 compare time: 491.902451038 # times from set_compare2.py load time: 0.801107883453 compare time: 451.236635923 That saves nearly 4 seconds of load time. The run time also looks faster, but I think that's due to the variability of my desktop, which also has a few Firefox windows and more than a few tabs open. Z-score Up until now I've been using an ad hoc scoring function with an arbitrary scaling factor to make it so the background score is 0.01 across the range of input set sizes. It has a few flaws: 1) it has a maximum score of 300, which is unusual, 2) no one will understand how to interpret it, without specific experience with it, and 3) the standard deviation is a function of the input set sizes. One common technique to remove those flaws is to transform the score in a "z-score" or "standard score": zscore = (score - background_score) / background_standard_deviation The "background score" is about 0.01 for my scoring function, but the "background standard deviation" varies based on the input set sizes. I can determine it by generating enough (let's say 25) pairs of subsets containing randomly selecting fingerprints, and with the same sizes as the pair I'm interested in, then computing the standard deviation of all of those comparisons. With this approach my "300*" scaling factor is irrelevant as the numerator and denominator are equally scaled. The division by the product of the sizes also disappears, for the same reason. You likely see the downside of this scoring function - each Z-score requires 25 additional Tanimoto searches! There are a few ways to mitigate this. First, while I have 1,400+ sets, there are only 365 different set sizes. I can reuse values for multiple comparisons of the same pair of sizes, which will reduce the number of tests I need to do by about a factor of 15. Second, random sets rarely have matches with a 0.8 similarity. The sublinear index should quickly reject those obvious mismatches. Third, I could cache the values to the file system, database, or other permanent storage and re-use them for future searches. It won't help the first time, but I end up with mistakes in my code, or have something I want to tweak, and will often re-run the code many times before it finally works. Fourth, and not something I'll do in this essay (if at all), I could do some curve fitting and come up with an equation which does a reasonable job of interpolating or predicting values. (Oh, my. I was certainly optimistic by using 25 samples. After over an hour it had only processed 229 of 1408 elements. I killed the code, changed the number of samples to 10, and restarted it. I'm glad I put that cache in! If you do this for real, you might vary the number of samples as a function of the sizes. It's much harder getting a non-zero standard deviation for a 1x1 comparison than for a 1000x1000 comparison.) The biggest change to this code is a new "ZScorer" class, which replaces the "sss" function. The main entry point is "compute_zscore()". That in turn calls "compute_raw_score()" to compute the cumulative score between two arenas, and "compute_background_values()", which computes the mean and standard deviation for the requested set sizes. The latter function also caches to an internal dictionary. The ZScorer class also has a way to load and save the cache to a file. Before computing the set similarities I ask it to "load()" from the file named ""zscore.cache". I then wrapped the main code with a try/finally block so I can save() the cache no matter if the code went to completion or if it was interrupted by a ^C or coding error. I also modified the scoring threshold so the Z-score had to be at least 10 (that is, the difference from the average background must be at least 10 standard deviations). To make the data a bit more useful, I also included information about the number of elements in each set. These are node properties easily determined by getting the length of the corresponding arena. I also added the number of ids common to both sets (this is a new edge property). For this I turned to Python's native 'set' type. Each Subset instance make a set of arena ids: class Subset(object): def __init__(self, name, arena): self.name = name self.arena = arena self.id_set = set(arena.ids) # used to find intersection counts so the number of elements in common is: num_in_common = len(subset1.id_set & subset2.id_set) As a final note before presenting the code, I called this program "set_zcompare.py". It's based on the original "set_compare.py" and does not use the pre-compiled FPB file of set_compare2.py". # I call this "set_zcompare.py" from __future__ import print_function, division import sys import random import collections import time import numpy as np import chemfp from chemfp import search # Support both Python 2 and Python 3 try: xrange # Check if this is Python 2 except NameError: xrange = range # Use 'range' for Python 3 # Load a file with lines of the form <set_name> <tab> <compound_id> # Example: "DNA-dependent protein kinase\tCHEMBL104450\n" # Return a dictionary mapping set name to a list of all of its ids def load_set_members(filename): set_members_table = collections.defaultdict(list) with open(filename) as infile: for line in infile: set_name, chembl_id = line.rstrip("\n").split("\t") set_members_table[set_name].append(chembl_id) # Turn the defaultdict into a dict so that a lookup of # a name which doesn't exist raises an exception instead # of creating and returning an empty list. return dict(set_members_table) # Z-score based on the sum of the similar scores. Please don't use # this scoring function for real work unless you've validated it. def make_random_subarena(arena, n): indices = random.sample(xrange(len(arena)), n) return arena.copy(indices=indices) class ZScorer(object): def __init__(self, arena, threshold=0.8, num_samples=25): self.arena = arena self.threshold = threshold self.num_samples = num_samples self.cached_values = {} def compute_zscore(self, arena1, arena2): # The main entry point score = self.compute_raw_score(arena1, arena2) mean, std = self.compute_background_values(len(arena1), len(arena2)) if std == 0.0: return 0.0 return (score - mean) / std def compute_raw_score(self, arena1, arena2): return search.threshold_tanimoto_search_arena( arena1, arena2, threshold=self.threshold).cumulative_score_all() def compute_background_values(self, i, j): # The scoring function is symmetric so normalize so i <= j if i > j: i, j = j, i # Check if it exists in the cache key = (i, j) try: return self.cached_values[key] except KeyError: pass # Does not already exist, so compute the mean and standard deviation scores = [] for _ in range(self.num_samples): subarena1 = make_random_subarena(self.arena, i) subarena2 = make_random_subarena(self.arena, j) scores.append(self.compute_raw_score(subarena1, subarena2)) mean, std = np.mean(scores), np.std(scores) values = (mean, std) self.cached_values[key] = values # cache the result and return it return values def load(self, filename): # Load values from filename into self.cached_values try: infile = open(filename) except IOError: sys.stderr.write("Warning: cache file %r does not exist\n" % (filename,)) with infile: for line in infile: terms = line.split() i, j, mean, std = int(terms[0]), int(terms[1]), float(terms[2]), float(terms[3]) self.cached_values[(i, j)] = (mean, std) def save(self, filename): # Save values from self.cached_values into the named file with open(filename, "w") as outfile: for key, value in sorted(self.cached_values.items()): i, j = key mean, std = value outfile.write("%s %s %s %s\n" % (i, j, mean, std)) # Make a subset arena of the given arena using the given ids def create_subset(arena, ids): indices = [] for id in ids: idx = arena.get_index_by_id(id) if idx is not None: indices.append(idx) return arena.copy(indices=indices) class Subset(object): def __init__(self, name, arena): self.name = name self.arena = arena self.id_set = set(arena.ids) # used to find intersection counts def load_subsets(arena, set_members_table): subsets = [] for i, (name, ids) in enumerate(sorted(set_members_table.items())): set_arena = create_subset(arena, ids) if not set_arena: sys.stderr.write("No members: %s\n" % (name,)) continue subsets.append(Subset(name, set_arena)) return subsets def main(): chembl_21 = chemfp.load_fingerprints("chembl_21.rdkit2048.fpb") set_members_table = load_set_members("chembl_sets.tsv") zscorer = ZScorer(chembl_21, threshold=0.8, num_samples=10) # 25 took too much time zscorer.load("zscore.cache") try: start_time = time.time() subsets = load_subsets(chembl_21, set_members_table) load_time = time.time() print("name1\tsize1\tname2\tsize2\t#in_common\tzscore") # write a header N = len(subsets) for i in range(N-1): sys.stderr.write("\rProcessing %d/%d" % (i, N-1)) subset1 = subsets[i] for j in range(i+1, N): subset2 = subsets[j] zscore = zscorer.compute_zscore(subset1.arena, subset2.arena) if zscore > 10.0: # Use 10 standard deviations as my threshold of importance sys.stderr.write("\r \r") num_in_common = len(subset1.id_set & subset2.id_set) print("%s\t%d\t%s\t%d\t%d\t%.2f" % (subset1.name, len(subset1.arena), subset2.name, len(subset2.arena), num_in_common, zscore)) sys.stderr.write("\rProcessing %d/%d" % (i, N-1)) sys.stderr.write("\r \r") compare_time = time.time() finally: zscorer.save("zscore.cache") print("load time:", load_time-start_time, file=sys.stderr) print("compare time:", compare_time-load_time, file=sys.stderr) if __name__ == "__main__": main() I ran this and saved the 6351 lines of output to "chembl_target_network_zscore_10.tsv. The first few lines look like: name1 size1 name2 size2 #in_common zscore 1-acylglycerol-3-phosphate O-acyltransferase beta 77 TRAF2- and NCK-interacting kinase 16 0 23.56 11-beta-hydroxysteroid dehydrogenase 1 1757 11-beta-hydroxysteroid dehydrogenase 2 47 35 92.93 14-3-3 protein gamma 4 Androgen Receptor 847 0 13.05 14-3-3 protein gamma 4 Histone deacetylase 1 1598 0 12.41 14-3-3 protein gamma 4 Histone deacetylase 6 799 0 20.43 14-3-3 protein gamma 4 Histone deacetylase 8 396 0 33.06 15-hydroxyprostaglandin dehydrogenase [NAD+] 107 Aldehyde dehydrogenase 1A1 16 1 91.58 15-hydroxyprostaglandin dehydrogenase [NAD+] 107 Serine/threonine-protein kinase PIM1 540 0 17.87 15-hydroxyprostaglandin dehydrogenase [NAD+] 107 Serine/threonine-protein kinase PIM2 346 0 17.48 Network Visualization Code, code, everywhere, and not an image to see! How do I know that the above code works as I epxected, much less gives useful information which a biochemist might be interested in? Association network visualization is widely used in bioinformatics. With some pointers from Iain, I downloaded the Java-based Cytoscape 3.4 and used it to visualize the Z-score results from the previous section. Actually, that network proved too complicated to visualize. I used awk to reduce it to 1450 node pairs with a Z-score of at least 100, in the file named chembl_target_network_zscore_100.tsv. I loaded the file into Cytoscape (File->Import->Network->File...), selected chembl_target_network_zscore_100.tsv, made sure the 'name1' column was the query, 'size1' a query attribute, 'name2' was the target, 'size2' a target attribute, '#in_common' an integer node attribute, and 'zscore' a floating point node attribute. I tried different layouts, but the "preferred layout" seemed best. Here's a screenshot of one part: It looks reasonable, in that I know dopamine and serotonin are related. I'm neither a chemist nor biologist, nor do I play one on the internet. My goal was to show how you might use chemfp to do this sort of analysis. This image shows I'm in the right neighborhood, so it's good enough for me to say I'm done. Categories: FLOSS Project Planets ### Semaphore Community: Dockerizing a Python Django Web Application Planet Python - Mon, 2017-03-27 07:45 This article is brought with ❤ to you by Semaphore. Introduction This article will cover building a simple 'Hello World'-style web application written in Django and running it in the much talked about and discussed Docker. Docker takes all the great aspects of a traditional virtual machine, e.g. a self contained system isolated from your development machine, and removes many of the drawbacks such as system resource drain, setup time, and maintenance. When building web applications, you have probably reached a point where you want to run your application in a fashion that is closer to your production environment. Docker allows you to set up your application runtime in such a way that it runs in exactly the same manner as it will in production, on the same operating system, with the same environment variables, and any other configuration and setup you require. By the end of the article you'll be able to: • Understand what Docker is and how it is used, • Build a simple Python Django application, and • Create a simple Dockerfile to build a container running a Django web application server. What is Docker, Anyway? Docker's homepage describes Docker as follows: "Docker is an open platform for building, shipping and running distributed applications. It gives programmers, development teams, and operations engineers the common toolbox they need to take advantage of the distributed and networked nature of modern applications." Put simply, Docker gives you the ability to run your applications within a controlled environment, known as a container, built according to the instructions you define. A container leverages your machines resources much like a traditional virtual machine (VM). However, containers differ greatly from traditional virtual machines in terms of system resources. Traditional virtual machines operate using Hypervisors, which manage the virtualization of the underlying hardware to the VM. This means they are large in terms of system requirements. Containers operate on a shared Linux operating system base and add simple instructions on top to execute and run your application or process. The difference being that Docker doesn't require the often time-consuming process of installing an entire OS to a virtual machine such as VirtualBox or VMWare. Once Docker is installed, you create a container with a few commands and then execute your applications on it via the Dockerfile. Docker manages the majority of the operating system virtualization for you, so you can get on with writing applications and shipping them as you require in the container you have built. Furthermore, Dockerfiles can be shared for others to build containers and extend the instructions within them by basing their container image on top of an existing one. The containers are also highly portable and will run in the same manner regardless of the host OS they are executed on. Portability is a massive plus side of Docker. Prerequisites Before you begin this tutorial, ensure the following is installed to your system: Setting Up a Django web application Starting a Django application is easy, as the Django dependency provides you with a command line tool for starting a project and generating some of the files and directory structure for you. To start, create a new folder that will house the Django application and move into that directory.$ mkdir project $cd project Once in this folder, you need to add the standard Python project dependencies file which is usually named requirements.txt, and add the Django and Gunicorn dependency to it. Gunicorn is a production standard web server, which will be used later in the article. Once you have created and added the dependencies, the file should look like this:$ cat requirements.txt Django==1.9.4 gunicorn==19.6.0

With the Django dependency added, you can then install Django using the following command:

$pip install -r requirements.txt Once installed, you will find that you now have access to the django-admin command line tool, which you can use to generate the project files and directory structure needed for the simple "Hello, World!" application.$ django-admin startproject helloworld

Let's take a look at the project structure the tool has just created for you:

. ├── helloworld │ ├── helloworld │ │ ├── __init__.py │ │ ├── settings.py │ │ ├── urls.py │ │ └── wsgi.py │ └── manage.py └── requirements.txt

You can read more about the structure of Django on the official website. django-admin tool has created a skeleton application. You control the application for development purposes using the manage.py file, which allows you to start the development test web server for example:

$cd helloworld$ python manage.py runserver

The other key file of note is the urls.py, which specifies what URL's route to which view. Right now, you will only have the default admin URL which we won't be using in this tutorial. Lets add a URL that will route to a view returning the classic phrase "Hello, World!".

First, create a new file called views.py in the same directory as urls.py with the following content:

from django.http import HttpResponse def index(request): return HttpResponse("Hello, world!")

Now, add the following URL url(r'', 'helloworld.views.index') to the urls.py, which will route the base URL of / to our new view. The contents of the urls.py file should now look as follows:

from django.conf.urls import url from django.contrib import admin urlpatterns = [ url(r'^admin/', admin.site.urls), url(r'', 'helloworld.views.index'), ]

Now, when you execute the python manage.py runserver command and visit http://localhost:8000 in your browser, you should see the newly added "Hello, World!" view.

The final part of our project setup is making use of the Gunicorn web server. This web server is robust and built to handle production levels of traffic, whereas the included development server of Django is more for testing purposes on your local machine only. Once you have dockerized the application, you will want to start up the server using Gunicorn. This is much simpler if you write a small startup script for Docker to execute. With that in mind, let's add a start.sh bash script to the root of the project, that will start our application using Gunicorn.

#!/bin/bash # Start Gunicorn processes echo Starting Gunicorn. exec gunicorn helloworld.wsgi:application \ --bind 0.0.0.0:8000 \ --workers 3

The first part of the script writes "Starting Gunicorn" to the command line to show us that it is starting execution. The next part of the script actually launches Gunicorn. You use exec here so that the execution of the command takes over the shell script, meaning that when the Gunicorn process ends so will the script, which is what we want here.

You then pass the gunicorn command with the first argument of helloworld.wsgi:application. This is a reference to the wsgi file Django generated for us and is a Web Server Gateway Interface file which is the Python standard for web applications and servers. Without delving too much into WSGI, the file simply defines the application variable, and Gunicorn knows how to interact with the object to start the web server.

You then pass two flags to the command, bind to attach the running server to port 8000, which you will use to communicate with the running web server via HTTP. Finally, you specify workers which are the number of threads that will handle the requests coming into your application. Gunicorn recommends this value to be set at (2 x $num_cores) + 1. You can read more on configuration of Gunicorn in their documentation. Finally, make the script executable, and then test if it works by changing directory into the project folder helloworld and executing the script as shown here. If everything is working fine, you should see similar output to the one below, be able to visit http://localhost:8000 in your browser, and get the "Hello, World!" response.$ chmod +x start.sh $cd helloworld$ ../start.sh Starting Gunicorn. [2016-06-26 19:43:28 +0100] [82248] [INFO] Starting gunicorn 19.6.0 [2016-06-26 19:43:28 +0100] [82248] [INFO] Listening at: http://0.0.0.0:8000 (82248) [2016-06-26 19:43:28 +0100] [82248] [INFO] Using worker: sync [2016-06-26 19:43:28 +0100] [82251] [INFO] Booting worker with pid: 82251 [2016-06-26 19:43:28 +0100] [82252] [INFO] Booting worker with pid: 82252 [2016-06-26 19:43:29 +0100] [82253] [INFO] Booting worker with pid: 82253 Dockerizing the Application

You now have a simple web application that is ready to be deployed. So far, you have been using the built-in development web server that Django ships with the web framework it provides. It's time to set up the project to run the application in Docker using a more robust web server that is built to handle production levels of traffic.

Installing Docker

One of the key goals of Docker is portability, and as such is able to be installed on a wide variety of operating systems.

For this tutorial, you will look at installing Docker Machine on MacOS. The simplest way to achieve this is via the Homebrew package manager. Instal Homebrew and run the following:

$brew update && brew upgrade --all && brew cleanup && brew prune$ brew install docker-machine

With Docker Machine installed, you can use it to create some virtual machines and run Docker clients. You can run docker-machine from your command line to see what options you have available. You'll notice that the general idea of docker-machine is to give you tools to create and manage Docker clients. This means you can easily spin up a virtual machine and use that to run whatever Docker containers you want or need on it.

You will now create a virtual machine based on VirtualBox that will be used to execute your Dockerfile, which you will create shortly. The machine you create here should try to mimic the machine you intend to run your application on in production. This way, you should not see any differences or quirks in your running application neither locally nor in a deployed environment.

Create your Docker Machine using the following command:

$docker-machine create development --driver virtualbox --virtualbox-disk-size "5000" --virtualbox-cpu-count 2 --virtualbox-memory "4096" This will create your machine and output useful information on completion. The machine will be created with 5GB hard disk, 2 CPU's and 4GB of RAM. To complete the setup, you need to add some environment variables to your terminal session to allow the Docker command to connect the machine you have just created. Handily, docker-machine provides a simple way to generate the environment variables and add them to your session:$ docker-machine env development export DOCKER_TLS_VERIFY="1" export DOCKER_HOST="tcp://123.456.78.910:1112" export DOCKER_CERT_PATH="/Users/me/.docker/machine/machines/development" export DOCKER_MACHINE_NAME="development" # Run this command to configure your shell: # eval "$(docker-machine env development)" Complete the setup by executing the command at the end of the output:$(docker-machine env development)

Execute the following command to ensure everything is working as expected.

$docker images REPOSITORY TAG IMAGE ID CREATED SIZE You can now dockerize your Python application and get it running using the docker-machine. Writing the Dockerfile The next stage is to add a Dockerfile to your project. This will allow Docker to build the image it will execute on the Docker Machine you just created. Writing a Dockerfile is rather straightforward and has many elements that can be reused and/or found on the web. Docker provides a lot of the functions that you will require to build your image. If you need to do something more custom on your project, Dockerfiles are flexible enough for you to do so. The structure of a Dockerfile can be considered a series of instructions on how to build your container/image. For example, the vast majority of Dockerfiles will begin by referencing a base image provided by Docker. Typically, this will be a plain vanilla image of the latest Ubuntu release or other Linux OS of choice. From there, you can set up directory structures, environment variables, download dependencies, and many other standard system tasks before finally executing the process which will run your web application. Start the Dockerfile by creating an empty file named Dockerfile in the root of your project. Then, add the first line to the Dockerfile that instructs which base image to build upon. You can create your own base image and use that for your containers, which can be beneficial in a department with many teams wanting to deploy their applications in the same way. # Dockerfile # FROM directive instructing base image to build upon FROM python:2-onbuild It's worth noting that we are using a base image that has been created specifically to handle Python 2.X applications and a set of instructions that will run automatically before the rest of your Dockerfile. This base image will copy your project to /usr/src/app, copy your requirements.txt and execute pip install against it. With these tasks taken care of for you, your Dockerfile can then prepare to actually run your application. Next, you can copy the start.sh script written earlier to a path that will be available to you in the container to be executed later in the Dockerfile to start your server. # COPY startup script into known file location in container COPY start.sh /start.sh Your server will run on port 8000. Therefore, your container must be set up to allow access to this port so that you can communicate to your running server over HTTP. To do this, use the EXPOSE directive to make the port available: # EXPOSE port 8000 to allow communication to/from server EXPOSE 8000 The final part of your Dockerfile is to execute the start script added earlier, which will leave your web server running on port 8000 waiting to take requests over HTTP. You can execute this script using the CMD directive. # CMD specifcies the command to execute to start the server running. CMD ["/start.sh"] # done! With all this in place, your final Dockerfile should look something like this: # Dockerfile # FROM directive instructing base image to build upon FROM python:2-onbuild # COPY startup script into known file location in container COPY start.sh /start.sh # EXPOSE port 8000 to allow communication to/from server EXPOSE 8000 # CMD specifcies the command to execute to start the server running. CMD ["/start.sh"] # done! You are now ready to build the container image, and then run it to see it all working together. Building and Running the Container Building the container is very straight forward once you have Docker and Docker Machine on your system. The following command will look for your Dockerfile and download all the necessary layers required to get your container image running. Afterwards, it will run the instructions in the Dockerfile and leave you with a container that is ready to start. To build your container, you will use the docker build command and provide a tag or a name for the container, so you can reference it later when you want to run it. The final part of the command tells Docker which directory to build from.$ cd <project root directory> $docker build -t davidsale/dockerizing-python-django-app . Sending build context to Docker daemon 237.6 kB Step 1 : FROM python:2-onbuild # Executing 3 build triggers... Step 1 : COPY requirements.txt /usr/src/app/ ---> Using cache Step 1 : RUN pip install --no-cache-dir -r requirements.txt ---> Using cache Step 1 : COPY . /usr/src/app ---> 68be8680cbc4 Removing intermediate container 75ed646abcb6 Step 2 : COPY start.sh /start.sh ---> 9ef8e82c8897 Removing intermediate container fa73f966fcad Step 3 : EXPOSE 8000 ---> Running in 14c752364595 ---> 967396108654 Removing intermediate container 14c752364595 Step 4 : WORKDIR helloworld ---> Running in 09aabb677b40 ---> 5d714ceea5af Removing intermediate container 09aabb677b40 Step 5 : CMD /start.sh ---> Running in 7f73e5127cbe ---> 420a16e0260f Removing intermediate container 7f73e5127cbe Successfully built 420a16e0260f In the output, you can see Docker processing each one of your commands before outputting that the build of the container is complete. It will give you a unique ID for the container, which can also be used in commands alongside the tag. The final step is to run the container you have just built using Docker:$ docker run -it -p 8000:8000 davidsale/djangoapp1 Starting Gunicorn. [2016-06-26 19:24:11 +0000] [1] [INFO] Starting gunicorn 19.6.0 [2016-06-26 19:24:11 +0000] [1] [INFO] Listening at: http://0.0.0.0:9077 (1) [2016-06-26 19:24:11 +0000] [1] [INFO] Using worker: sync [2016-06-26 19:24:11 +0000] [11] [INFO] Booting worker with pid: 11 [2016-06-26 19:24:11 +0000] [12] [INFO] Booting worker with pid: 12 [2016-06-26 19:24:11 +0000] [17] [INFO] Booting worker with pid: 17

The command tells Docker to run the container and forward the exposed port 8000 to port 8000 on your local machine. After you run this command, you should be able to visit http://localhost:8000 in your browser to see the "Hello, World!" response. If you were running on a Linux machine, that would be the case. However, if running on MacOS, then you will need to forward the ports from VirtualBox, which is the driver we use in this tutorial so that they are accessible on your host machine.

\$ VBoxManage controlvm "development" natpf1 "tcp-port8000,tcp,,8000,,8000";

This command modifies the configuration of the virtual machine created using docker-machine earlier to forward port 8000 to your host machine. You can run this command multiple times changing the values for any other ports you require.

Once you have done this, visit http://localhost:8000 in your browser. You should be able to visit your dockerized Python Django application running on a Gunicorn web server, ready to take thousands of requests a second and ready to be deployed on virtually any OS on planet using Docker.

Next Steps

After manually verifying that the appication is behaving as expected in Docker, the next step is the deployment. You can use Semaphore's Docker platform for automating this process.

Conclusion

In this tutorial, you have learned how to build a simple Python Django web application, wrap it in a production grade web server, and created a Docker container to execute your web server process.

If you enjoyed working through this article, feel free to share it and if you have any questions or comments leave them in the section below. We will do our best to answer them, or point you in the right direction.

Categories: FLOSS Project Planets

### David MacIver: Fully Automated Luxury Boltzmann Sampling for Regular Languages

Planet Python - Mon, 2017-03-27 07:22

Suppose you have some regular language. That is, some language you can define through some mix of single characters, concatenation, repetition and alternation. e.g. the language defined by the regular expression (0|1)*2 which consists of any number of 0s and 1s followed by a single 2.

Suppose further instead of matching you want to generate an instance of such a regular language. This can be quite useful as a building block for building interesting data generators, so I’ve been looking into doing that in Hypothesis. Lucas Wiman’s revex is an example of an existing project doing similarly.

It’s easy to do this naively from our building blocks:

• To generate a single character, just return that character.
• To generate one of x|y, pick one of x or y uniformly at random and then generate from that.
• To generate xy, generate from x and y independently and concatenate the results.
• To generate from x* pick the number of repetitions (e.g. as a geometric distribution) and then generate that many instances and concatenate them.

This almost works but it has a major problem: It’s quite biased.

Consider the regular expression (00)|(1[0-9]). That is, we match either the string 00 or the string 1 followed by any digit. There are 11 strings matched by this language, but under our above model we’ll pick 00 half the time!

Additionally, it’s biased in a way that depends on the structure of our regular expression. We could just have easily have written this as (10)|((00)|(1[1-9)). Now we produce 10 half the time, 00 a quarter of the time, and one of the remaining 9 strings the remaining quarter of the time.

So how do we define a distribution that is independent of our particular choice of regular expression?

One goal that might be reasonable is to be unbiased: To produce every string in the language with equal probability. Unfortunately, this doesn’t make sense if our language is infinite: There is no uniform distribution on an infinite countable set.

We can however hope for a way that has the weaker property of being unbiased only among strings with the same length. So in the above example we’d have got a uniform distribution, but if we had say the language 0*1 of arbitrarily many zeros followed by a single 1, some lengths of string would be more likely than others.

There turns out to be a nice way of doing this!  They’re called Boltzmann Samplers (warning: The linked paper is quite information dense and I only understand about a third of it myself). The Boltzmann samplers for a language define a family of probability distributions over it, controlled by a single real valued parameter, $$x \in [0, 1]$$.

The idea is that you pick each string of length $$n$$ with probability proportional to $$x^n$$ (note: each string. So if there are $$k$$ strings of length $$n$$ then you pick some string of length $$n$$ with probability proportionate to $$k x^n$$).

In general there is not a defined Boltzmann sampler for every possible value of $$x$$. The closer $$x$$ is to $$1$$, the slower the probability of picking a string drops off. If $$x = 1$$ then we have a uniform distribution (so this is only possible when the language is finite). If $$x = 0$$ then we can only generate the empty string. The Boltzmann sampler will be defined as long as the corresponding infinite sum converges, which will definitely happen when $$x < \frac{1}{|A|}$$ (where $$A$$ is the alphabet for our language) but may happen for larger $$x$$ if the language is relatively constrained, and the average size of the strings will increase as $$x$$ increases.

It’s not a priori obvious that simulating a Boltzmann sampler is any easier than our original problem, but it turns out to be. The reason is that for certain primitive languages the Boltzmann sampler is easy to compute (e.g. for a language consisting only of a single fixed string it just returns that string), and for certain natural operations for combining languages (especially for our purposes disjoint unions) we can then simulate the Boltzmann sampler using the simulated Boltzmann samplers for the base languages.

We’ll see how this works in a moment, but first a digression.

An idea that turns out to be intimately connected to Boltzmann samplers is that of the counting generating function. If you have a language $$L$$, you can define $$g_L(x) = \sum\limits_{n=0}^\infty |L_n| x^n$$, where $$L_n$$ is the subset of the language consisting of strings.

Counting generating functions have a number of important properties about how they combine when you combine languages. The one that will be most relevant for us is that if $$L$$ and $$M$$ are two disjoint languages then $$g_{L \cup M}(x) = g_L(x) + g_M(x)$$. This is because $$(L \cup M)_n = L_n \cup L_m$$. Because these are disjoint this means that $$|(L \cup M)_n| = |L_n| + |M_n|$$. (It’s important to note that this doesn’t work when the languages aren’t disjoint: You over-count the intersection).

Counting generating functions will be useful because we can treat them as a sort of “mass function” that tells us how to weight our decisions: If we can go one of two ways, but one of them has a lot more strings below it, we should pick that one more often. This is the basis of the connection between counting generating functions and Boltzmann samplers.

In particular, if we have two disjoint languages $$L$$ and $$M$$ and we can simulate the Boltzmann sampler of parameter $$x$$ for each, we can simulate the Boltzmann sampler with parameter $$x$$ for $$L \cup M$$ as follows: We pick $$L$$ with probability $$\frac{g_L(x)}{g_L(x) + g_M(x)}$$, else we pick $$M$$, then we simulate the Boltzmann sampler for the language we ended up with.

This is almost exactly the same as our simulation from the beginnings, we’ve just changed the weightings! It’s important to remember that this only works if $$L$$ and $$M$$ are disjoint though.

This property, it turns out, is enough to compute a Boltzmann sampler for any regular language.

The key idea is this: If you compile a regular language to a deterministic finite automaton (DFA), you can now look at the languages matched from each state in that DFA. Each of these languages can be represented as a disjoint union of other languages, which gives us a system of equations for the counting generating functions that we can just solve.

If $$L[i]$$ is the language matched when starting from state i, then $$L[i] = \bigcup\limits_{c \in A} c L[\tau(i, c)] \cup E[i]$$, where $$\tau$$ is the transition function and $$E[i]$$ is the language that just matches the empty string if $$i$$ is an accepting state, else is the empty language.

That is, every string matched when starting from $$i$$ is either the empty string, or it is a single character prepended to a string from the language you get starting from the state that character causes a transition to. This is a disjoint union! Given a string in the union it is either empty, or you can look at its first character to identify which one of the branches it belongs to.

This means we can calculate the generating function as $$g_{L[i]}(x) = x \sum\limits_{c \in A} g_{L[\tau(i, c)]} + \epsilon[i]$$, where $$\epsilon[i]$$ is $$1$$ if this is an accepting state or $$0$$ otherwise.

(The extra factor of $$x$$ comes from the fact that we’ve added an extra character to the beginning language, so every string has its length increased by $$1$$).

But this is a linear system of equations in the generating functions! There’s that unknown variable $$x$$, but you can just treat that as meaning it’s a linear system of equations whose parameters live in the field of rational functions.

In particular, and importantly, we can use sympy to solve this rather than having to write our own linear solving routines (we can’t use a standard numpy-esque solver because of the symbolic parameter):

def compute_generating_functions(accepting, transitions): assert len(accepting) == len(transitions) n = len(accepting) z = sympy.Symbol('z', real=True) weights = {} for i in range(n): weights[(i, i)] = 1 for _, j in transitions[i].items(): key = (i, j) weights[key] = weights.get(key, 0) - z matrix = SparseMatrix( n, n, weights ) vector = sympy.Matrix(n, 1, list(map(int, accepting))) return z, matrix.LUsolve(vector)

This simultaneously solves the linear equations for all of the states of the DFA. So a result looks like this:

In []: rd.compute_generating_functions([False, False, True], [{0: 1}, {0: 2}, {0: 2}]) Out[]: (z, Matrix([ [z**2/(-z + 1)], [ z/(-z + 1)], [ 1/(-z + 1)]]))

This calculates the generating functions for the language that matches any string of two or more 0 bytes.

Now that we’ve calculated the generating functions, we can randomly walk the DFA to run the desired Boltzmann sampler! The counting generating function provides the weights for our random walks, and moving to one of the languages in our disjoint union is precisely either stopping the walk or transitioning to another state.

We start at the initial state, and at each stage we choose between one of $$|A| + 1$$ actions: We can either emit a single character and transition to the corresponding new state, or we can stop. Each of these are weighted by the generating function of the language applied to our Boltzmann sampling parameter: The option “halt now” is given weight $$1$$, and the action “emit character c” is given weight $$x g_{L[\tau(i, c)]}(x)$$ – i.e. $$x$$ times the state weight we’ve already calculated for the target state.

This is just a discrete distribution over a finite number of possibilities, so we can use the alias method to build a sampler for it.

Thus our algorithm becomes:

1. Build a DFA for the regular expression
2. Use sympy to calculate the counting generation function for the language matched starting from each state in our DFA
3. Pick a Boltzmann parameter
4. Calculate all the state weights
5. For each state (possibly lazily on demand) initialize an alias sampler for the actions that can be taken at that state
6. Repeatedly draw from the sampler, either drawing a stop action or drawing an emit character action. Where we emit a character, make the corresponding state transition.

I’m not going to claim that this approach is efficient by any stretch. The DFA may be exponential in the size of our starting regular expression, and we’re then solving an $$n \times n$$ linear equation on top of that. However for a lot of languages the resulting DFA won’t be too large, and this approach is fairly reasonable. It’s also nice as an existence proof.

Additionally, we can cache the first 5 steps and reuse them. Only the 6th step needs to be redone for new values of the parameter, and it at least is generally very fast – It’s O(n) in the length of the generated string, with pretty good constant factors as drawing from a pre-built alias sampler is generally very fast. You can also initialise the alias samplers lazily as you walk the graph.

If you want to see all of this fitting together, I created a small project with these concepts in it. It’s far from production ready, and is only lightly tested, but it seems to work pretty well based on initial experimentation and should be enough to understand the ideas.

(And if you want to see more posts like this, why not support my blog on Patreon?)

Categories: FLOSS Project Planets

### Zaki Akhmad: Test Python/Django Script

Planet Python - Mon, 2017-03-27 06:46

Now I am thinking a better approach on testing Python/Django script. I have a Python script which runs in a Django application. For example, this script will send an email with how many new users registered in this week.

So far, I use the manual approach, which requires me to have new user registered within this week. I am not very satisfied with this approach, as the script is getting complex, I need to manually prepare the data.

Any idea?

Categories: FLOSS Project Planets

### ADCI Solutions: Top 10 responsive Drupal themes

Planet Drupal - Mon, 2017-03-27 05:32

We love exploring Drupal themes. They save plenty of time when you need to be ahead of everybody and launch your website as soon as possible. They simplify the whole development process for a novice Drupaller. They are designed for you so that the only thing to worry about is content that you should add to your website.
In this article we present you the themes that cover one of Drupal out-of-box features - responsive design. Click here to continue.

Categories: FLOSS Project Planets

### Interview with Dolly

Planet KDE - Mon, 2017-03-27 05:17

Could you tell us something about yourself?

My nickname is Dolly, I am 11 years old, I live in Cannock, Staffordshire, England. I am at Secondary school, and at the weekends I attend drama, dance and singing lessons, I like drawing and recently started using the Krita app.

How did you find out about Krita?

My dad and my friend told me about it.

Do you draw on paper too, and which is more fun, paper or computer?

I draw on paper, and I like Krita more than paper art as there’s a lot more colours instantly available than when I do paper art.

What kind of pictures do you draw?

I mostly draw my original character (called Phantom), I draw animals, trees and stars too.

What is easy to do with Krita? What is difficult to do?

I think choosing the colour is easy, its really good, I find getting the right brush size a little difficult due to the scrolling needed to select the brush size.

Which thing about Krita is most fun?

The thing most fun for me is colouring in my pictures as there is a great range of colour available, far more than in my pencil case.

Is there anything in Krita that you’d like to be different?

I think Krita is almost perfect the way it is at the moment however if the brush selection expanded automatically instead of having to scroll through it would be better for me.

Can you show us a picture you made with Krita?

I can, I have attached some of my favourites that I have done for my friends.

How did you make it?

I usually start with the a standard base line made up of a circle for the face and the ears, I normally add the hair and the other features (eyes, noses and mouth) and finally colour and shade and include any accessories.

Is there anything else you’d like to tell us?

I really enjoy Krita, I think its one of the best drawing programs there is!

Categories: FLOSS Project Planets

### EuroPython Society: EuroPython 2017: Call for Proposals (CFP) is open

Planet Python - Mon, 2017-03-27 04:36

We’re looking for proposals on every aspect of Python: programming from novice to advanced levels, applications and frameworks, or how you have been involved in introducing Python into your organization. EuroPython is a community conference and we are eager to hear about your experience.

Please also forward this Call for Proposals to anyone that you feel may be interested.

Submissions will be open until Sunday, April 16, 23:59:59 CEST.

Please note that we will not have a second call for proposals as we did in 2016, so if you want to enter a proposal, please consider to do this in the next few days.

Presenting at EuroPython

We will accept a broad range of presentations, from reports on academic and commercial projects to tutorials and case studies. As long as the presentation is interesting and potentially useful to the Python community, it will be considered for inclusion in the program.

Can you show something new and useful? Can you show the attendees how to: use a module? Explore a Python language feature? Package an application? If so, please consider submitting a talk.

There are four different kinds of contributions that you can present at EuroPython:

• Regular Talk / approx. 150 slots

These are standard “talks with slides”, allocated in slots of

• 30 minutes
• 45 minutes
• 60 minutes

The Q&A session, if present, is included in the time slot. 3-5 Minutes for Q&A is a good practice. Please chose a time slot you see fit best to make your presentation in a compact way (So the audience may follow along but is not bored). We will only have a limited number of 60 minute slots available, so please only choose these slots for more in-depth sessions or topics which require more background information.

• Trainings / 20 slots.

Deep-dive into a subject with all details. These sessions are 2.5 - 3.5 hours long. The training attendees will be encouraged to bring a laptop. They should be prepared with less slides and more source code. Room capacity for the two trainings rooms is 70 and 180 seats.

• Panels

A panel is group of three to six experts plus a moderator discussing a matter in depth, an intensive exchange of (maybe opposite) opinions. A panel may be 60-90 minutes long. We have introduced this interactive format for EuroPython 2017 due to the many requests we have received to make the conference more interactive and have more challenging / mind-bending content in place. If you have any questions or if you want to discuss an idea for a panel upfront, please feel free to contact the Program WG to discuss.

• Interactive

This is a completely open 60-minute format. Feel free to make your suggestions. There are only two rules: it must be interactive, real-time human-to-human-interaction and of course compliant with the EuroPython Code of Conduct. If you want to discuss an idea upfront, please feel free to contact the Program WG to discuss.

• Posters / approx. 30 slots

Posters are a graphical way to describe a project or a technology, printed in large formats; posters are exhibited at the conference, can be read at any time by participants, and can be discussed face to face with their authors during the poster session.

• Helpdesk / 10 slots

Helpdesks are a great way to share your experience on a technology, by offering to help people answering their questions and solving their practical problems. You can run a helpdesk by yourself or with colleagues and friends. Each helpdesk will be open for 3 hours in total, 1.5 hours in the morning and 1.5 hours in the afternoon. People looking for help will sign up for a 30 minute slot and talk to you. There is no specific preparation needed; you just need to be proficient in the technology you run the helpdesk for.

Tracks

You may suggest your submission for a track. Tracks are groups of talks, covering the same domain (e.g. Django), all in the same room in a row. You may choose one of these specialized tracks:

• Business Track (running a business, being a freelancer)
• Django Track
• Educational Track
• Hardware/IoT Track
• Science Track
• Web Track
PyData @ EuroPython 2017

There will be a PyData track at this year’s conference. Please submit your papers for the PyData track through the EuroPython form and make sure to select “PyData” as sub community in the form.

Discounts for speakers and trainers

Since EuroPython is a not-for-profit community conference, it is not possible to pay out rewards for talks or trainings. Speakers of regular talks, panels, posters and interactive will instead have a special 25% discount on the conference ticket. Trainings get a 100% discount to compensate for the longer preparation time. Please note that we can not give discounts for helpdesks.

Topics and Goals

Suggested topics for EuroPython presentations include, but are not limited to:

• Core Python
• Alternative Python implementations: e.g. Jython, IronPython, PyPy, and Stackless
• Python libraries and extensions
• Python 2 to 3 migration
• Databases
• Documentation
• GUI Programming
• Game Programming
• Hardware (Sensors, RaspberryPi, Gadgets,…)
• Network Programming
• Open Source Python projects
• Packaging
• Programming Tools
• Project Best Practices
• Embedding and Extending
• Education, Science and Math
• Web-based Systems
• Use Cases
• Failures and Mistakes

Presentation goals are usually some of the following:

• Introduce the audience to a new topic
• Introduce the audience to new developments on a well-known topic
• Show the audience real-world usage scenarios for a specific topic (case study)
• Dig into advanced and relatively-unknown details on a topic
• Compare different solutions available on the market for a topic
Language for Talks & Trainings

Talks and trainings should, in general, be held in English.

Inappropriate Language and Imagery

Please consider that EuroPython is a conference with an audience from a broad geographical area which spans countries and regions with vastly different cultures. What might be considered a “funny, inoffensive joke” in a region might be really offensive (if not even unlawful) in another. If you want to add humor, references and images to your talk, avoid any choice that might be offensive to a group which is different from yours, and pay attention to our EuroPython Code of Conduct.

Community Based Talk Voting

Attendees who have bought a ticket in time for the Talk Voting period gain the right to vote for talks submitted during the Call For Proposals.

The Program WG will also set aside a number of slots which they will then select based on other criteria to e.g. increase diversity or give a chance to less mainstream topics.

Release agreement for submissions

All submissions will be made public during the community talk voting, to allow all registrants to discuss the proposals. After finalizing the schedule, talks that are not accepted will be removed from the public website. Accepted submissions will stay online for the foreseeable future.

We also ask all speakers/trainers to:

• accept the video recording of their presentation

• upload their talk materials to the EuroPython website

• accept the EuroPython Speaker Release Agreement which allows the EPS to make the talk recordings and uploaded materials available under a CC BY-NC-SA license

To simplify the organization, we ask all speakers and trainers to accept the video recording and publishing of their session. All talks will be recorded. Whether trainings will be recorded as well, is not yet clear. Please contact our Program WG Helpdesk for details, if you would rather not like your training to be recorded.

Talk slides will be made available on the EuroPython web site. Talk video recordings will be uploaded to the EuroPython YouTube channel and archived on archive.org.

For more privacy related information, please consult our privacy policy.

Contact

For further questions, feel free to contact our Program WG Helpdesk

Categories: FLOSS Project Planets

### EuroPython: EuroPython 2017: Call for Proposals (CFP) is open

Planet Python - Mon, 2017-03-27 04:17

We’re looking for proposals on every aspect of Python: programming from novice to advanced levels, applications and frameworks, or how you have been involved in introducing Python into your organization. EuroPython is a community conference and we are eager to hear about your experience.

Please also forward this Call for Proposals to anyone that you feel may be interested.

Submissions will be open until Sunday, April 16, 23:59:59 CEST.

Please note that we will not have a second call for proposals as we did in 2016, so if you want to enter a proposal, please consider to do this in the next few days.

Presenting at EuroPython

We will accept a broad range of presentations, from reports on academic and commercial projects to tutorials and case studies. As long as the presentation is interesting and potentially useful to the Python community, it will be considered for inclusion in the program.

Can you show something new and useful? Can you show the attendees how to: use a module? Explore a Python language feature? Package an application? If so, please consider submitting a talk.

There are four different kinds of contributions that you can present at EuroPython:

• Regular Talk / approx. 150 slots

These are standard “talks with slides”, allocated in slots of

• 30 minutes
• 45 minutes
• 60 minutes

The Q&A session, if present, is included in the time slot. 3-5 Minutes for Q&A is a good practice. Please chose a time slot you see fit best to make your presentation in a compact way (So the audience may follow along but is not bored). We will only have a limited number of 60 minute slots available, so please only choose these slots for more in-depth sessions or topics which require more background information.

• Trainings / 20 slots.

Deep-dive into a subject with all details. These sessions are 2.5 - 3.5 hours long. The training attendees will be encouraged to bring a laptop. They should be prepared with less slides and more source code. Room capacity for the two trainings rooms is 70 and 180 seats.

• Panels

A panel is group of three to six experts plus a moderator discussing a matter in depth, an intensive exchange of (maybe opposite) opinions. A panel may be 60-90 minutes long. We have introduced this interactive format for EuroPython 2017 due to the many requests we have received to make the conference more interactive and have more challenging / mind-bending content in place. If you have any questions or if you want to discuss an idea for a panel upfront, please feel free to contact the Program WG to discuss.

• Interactive

This is a completely open 60-minute format. Feel free to make your suggestions. There are only two rules: it must be interactive, real-time human-to-human-interaction and of course compliant with the EuroPython Code of Conduct. If you want to discuss an idea upfront, please feel free to contact the Program WG to discuss.

• Posters / approx. 30 slots

Posters are a graphical way to describe a project or a technology, printed in large formats; posters are exhibited at the conference, can be read at any time by participants, and can be discussed face to face with their authors during the poster session.

• Helpdesk / 10 slots

Helpdesks are a great way to share your experience on a technology, by offering to help people answering their questions and solving their practical problems. You can run a helpdesk by yourself or with colleagues and friends. Each helpdesk will be open for 3 hours in total, 1.5 hours in the morning and 1.5 hours in the afternoon. People looking for help will sign up for a 30 minute slot and talk to you. There is no specific preparation needed; you just need to be proficient in the technology you run the helpdesk for.

Tracks

You may suggest your submission for a track. Tracks are groups of talks, covering the same domain (e.g. Django), all in the same room in a row. You may choose one of these specialized tracks:

• Business Track (running a business, being a freelancer)
• Django Track
• Educational Track
• Hardware/IoT Track
• Science Track
• Web Track
PyData @ EuroPython 2017

There will be a PyData track at this year’s conference. Please submit your papers for the PyData track through the EuroPython form and make sure to select “PyData” as sub community in the form.

Discounts for speakers and trainers

Since EuroPython is a not-for-profit community conference, it is not possible to pay out rewards for talks or trainings. Speakers of regular talks, panels, posters and interactive will instead have a special 25% discount on the conference ticket. Trainings get a 100% discount to compensate for the longer preparation time. Please note that we can not give discounts for helpdesks.

Topics and Goals

Suggested topics for EuroPython presentations include, but are not limited to:

• Core Python
• Alternative Python implementations: e.g. Jython, IronPython, PyPy, and Stackless
• Python libraries and extensions
• Python 2 to 3 migration
• Databases
• Documentation
• GUI Programming
• Game Programming
• Hardware (Sensors, RaspberryPi, Gadgets,…)
• Network Programming
• Open Source Python projects
• Packaging
• Programming Tools
• Project Best Practices
• Embedding and Extending
• Education, Science and Math
• Web-based Systems
• Use Cases
• Failures and Mistakes

Presentation goals are usually some of the following:

• Introduce the audience to a new topic
• Introduce the audience to new developments on a well-known topic
• Show the audience real-world usage scenarios for a specific topic (case study)
• Dig into advanced and relatively-unknown details on a topic
• Compare different solutions available on the market for a topic
Language for Talks & Trainings

Talks and trainings should, in general, be held in English.

Inappropriate Language and Imagery

Please consider that EuroPython is a conference with an audience from a broad geographical area which spans countries and regions with vastly different cultures. What might be considered a “funny, inoffensive joke” in a region might be really offensive (if not even unlawful) in another. If you want to add humor, references and images to your talk, avoid any choice that might be offensive to a group which is different from yours, and pay attention to our EuroPython Code of Conduct.

Community Based Talk Voting

Attendees who have bought a ticket in time for the Talk Voting period gain the right to vote for talks submitted during the Call For Proposals.

The Program WG will also set aside a number of slots which they will then select based on other criteria to e.g. increase diversity or give a chance to less mainstream topics.

Release agreement for submissions

All submissions will be made public during the community talk voting, to allow all registrants to discuss the proposals. After finalizing the schedule, talks that are not accepted will be removed from the public website. Accepted submissions will stay online for the foreseeable future.

We also ask all speakers/trainers to:

• accept the video recording of their presentation

• upload their talk materials to the EuroPython website

• accept the EuroPython Speaker Release Agreement which allows the EPS to make the talk recordings and uploaded materials available under a CC BY-NC-SA license

To simplify the organization, we ask all speakers and trainers to accept the video recording and publishing of their session. All talks will be recorded. Whether trainings will be recorded as well, is not yet clear. Please contact our Program WG Helpdesk for details, if you would rather not like your training to be recorded.

Talk slides will be made available on the EuroPython web site. Talk video recordings will be uploaded to the EuroPython YouTube channel and archived on archive.org.

For more privacy related information, please consult our privacy policy.

Contact

For further questions, feel free to contact our Program WG Helpdesk

Categories: FLOSS Project Planets

### Talk Python to Me: #105 A Pythonic Database Tour

Planet Python - Mon, 2017-03-27 04:00
There are many reasons it's a great time to be a developer. One of them is because there are so many choices around data access and databases. So this week we take tour with our guest Jim Fulton of some databases you may not have heard of or given a try. <br/> <br/> You'll hear about the pure Python database ZODB. There's Zero DB, an end-to-end encrypted database in which the database server knows nothing about the data it is storing, and NewtDb spanning the world of ZODB and JSON friendly Postgres. <br/> <br/> Links from the show: <br/> <div style="font-size: .85em;"> <br/> <b>Jim on Twitter</b>: <a href='https://twitter.com/j1mfulton' target='_blank'>@j1mfulton</a> <br/> <b>ZODB</b>: <a href='http://www.zodb.org/en/latest/' target='_blank'>zodb.org</a> <br/> <b>ZODB Book</b>: <a href='https://zodb.readthedocs.io/en/latest/' target='_blank'>zodb.readthedocs.io</a> <br/> <b>ZeroDB</b>: <a href='https://opensource.zerodb.com/' target='_blank'>opensource.zerodb.com</a> <br/> <b>NewtDb</b>: <a href='http://www.newtdb.org/en/latest/' target='_blank'>newtdb.org</a> <br/> <b>Buildout</b>: <a href='http://docs.buildout.org/en/latest/' target='_blank'>docs.buildout.org</a> <br/> <b>Two-tiered Kanban</b>: <a href='https://github.com/feature-flow/twotieredkanban' target='_blank'>github.com/feature-flow/twotieredkanban</a> <br/> <b>Jim's Webcast: Why Postgres Should be your Document Database</b>: <a href='https://blog.jetbrains.com/pycharm/2017/03/why-postgres-should-be-your-document-database-webinar-recording/' target='_blank'>blog.jetbrains.com/pycharm/2017/03/why-postgres-should-be-your-document-database-webinar-recording</a> <br/> <br/> <strong>Sponsored items</strong> <br/> <b>GetStream Feed API</b>: <a href='https://getstream.io/try-the-api/?utm_source=talkpython&utm_campaign=talkpython105&utm_medium=talkpython' target='_blank'>talkpython.fm/getstream</a> <br/> <b>Our courses</b>: <a href='https://training.talkpython.fm/' target='_blank'>training.talkpython.fm</a> <br/> <b>Podcast's Patreon</b>: <a href='https://www.patreon.com/mkennedy' target='_blank'>patreon.com/mkennedy</a> <br/> </div>
Categories: FLOSS Project Planets