Russ Allbery: Spring haul

Planet Debian - Sat, 2017-03-25 17:21

Work has been hellishly busy lately, so that's pretty much all I've been doing. The major project I'm working on should be basically done in the next couple of weeks, though (fingers crossed), so maybe I'll be able to surface a bit more after that.

In the meantime, I'm still acquiring books I don't have time to read, since that's my life. In this case, two great Humble Book Bundles were too good of a bargain to pass up. There are a bunch of books in here that I already own in paperback (and hence showed up in previous haul posts), but I'm running low on shelf room, so some of those paper copies may go to the used bookstore to make more space.

Kelley Armstrong — Lost Souls (sff)
Clive Barker — Tortured Souls (horror)
Jim Butcher — Working for Bigfoot (sff collection)
Octavia E. Butler — Parable of the Sower (sff)
Octavia E. Butler — Parable of the Talents (sff)
Octavia E. Butler — Unexpected Stories (sff collection)
Octavia E. Butler — Wild Seed (sff)
Jacqueline Carey — One Hundred Ablutions (sff)
Richard Chizmar — A Long December (sff collection)
Jo Clayton — Skeen's Leap (sff)
Kate Elliott — Jaran (sff)
Harlan Ellison — Can & Can'tankerous (sff collection)
Diana Pharaoh Francis — Path of Fate (sff)
Mira Grant — Final Girls (sff)
Elizabeth Hand — Black Light (sff)
Elizabeth Hand — Saffron & Brimstone (sff collection)
Elizabeth Hand — Wylding Hall (sff)
Kevin Hearne — The Purloined Poodle (sff)
Nalo Hopkinson — Skin Folk (sff)
Katherine Kurtz — Camber of Culdi (sff)
Katherine Kurtz — Lammas Night (sff)
Joe R. Lansdale — Fender Lizards (mainstream)
Robert McCammon — The Border (sff)
Robin McKinley — Beauty (sff)
Robin McKinley — The Hero and the Crown (sff)
Robin McKinley — Sunshine (sff)
Tim Powers — Down and Out in Purgatory (sff)
Cherie Priest — Jacaranda (sff)
Alastair Reynolds — Deep Navigation (sff collection)
Pamela Sargent — The Shore of Women (sff)
John Scalzi — Miniatures (sff collection)
Lewis Shiner — Glimpses (sff)
Angie Thomas — The Hate U Give (mainstream)
Catherynne M. Valente — The Bread We Eat in Dreams (sff collection)
Connie Willis — The Winds of Marble Arch (sff collection)
M.K. Wren — Sword of the Lamb (sff)
M.K. Wren — Shadow of the Swan (sff)
M.K. Wren — House of the Wolf (sff)
Jane Yolen — Sister Light, Sister Dark (sff)

Categories: FLOSS Project Planets

Community Over Code: What Apache Also Needs In A Board

Planet Apache - Sat, 2017-03-25 15:24

Some great recent discussions around the upcoming members' meeting have got me thinking about the larger question: how can the ASF as an organization function better, and how does the board effect that? I think there is one more important concept in a board that the ASF needs to have, along with oversight and vision.

We need a board that can foster and support environments where our many volunteers can be productive in groups in performing the work of the Foundation. Our projects run many of these environments, and do a great job for their parts. But there’s a lot more work that happens at the ASF than just the code you’ve been working on today.

What Kinds Of Work Happen At Apache?

The code. The most obvious kind of work at the ASF is the simplest: all the Apache projects writing code and creating software releases for the public good. This work is all handled inside of each Apache project by the volunteer committers there. The board can help by promoting clear documentation of the various ASF services offered to projects, and by balancing a few specific technical needs from some projects (think: big data projects need big test environments!) against the capacity and smooth running of the services our crack Apache Infra team provides to all projects.

The communities. The most important work at the ASF is the independent governance that Apache projects provide to their communities. PMCs manage their own communities independently and have many different styles, but all share some common behaviors. On-list discussion, [VOTE]s, release management: these are all required behaviors, as is independence from commercial influence. These behaviors are to some degree separate from the technical maintenance of the code and may sometimes be outside of some committers' typical skill sets.

Here, the board has two roles: oversight of quarterly reports and mentoring PMCs when needed. The board needs to help all the support groups at Apache – Incubator, infra, press, brand, ComDev – to provide the best services to the projects. If PMCs do have community problems, it’s up to the board to nudge a community to fix itself before taking more official action. The board needs to make sure guidance is clear and respectful of the PMC’s volunteers – and explains why changes might be needed.

The Foundation. Keeping the corporation that provides the legal and infrastructure home for all our communities running is a complex task, run by volunteers, that is often underappreciated. We need a board that makes the roles and responsibilities clear and consistently documented. This respects the current volunteers doing the work, and ensures that Members looking to help know where and how to start volunteering. Just as projects want to attract new committers, the Foundation needs to attract Members who will step up to help with the work behind the scenes.

The Common Thread – Volunteers

In each of these cases, the ASF relies on volunteers [1] to keep our organization running. In many cases, these volunteers are doing these Apache tasks outside of their primary day-to-day job, meaning they have limited cycles. In most cases (besides writing code), these volunteers are doing Apache tasks that are probably not their primary skillset – meaning education and documentation are important.

We need a board that will create the documentation, foster the culture, and help promote the kinds of environments where these volunteer groups can succeed both in performing their Apache tasks and in drawing in new volunteers to help.

The ASF has proven ability at how to do this on the technical project level: we've defined many ways that open source projects can be successful long-term. We need to ensure we can consistently apply these techniques to our Foundation-level governance and operations as well. As our number of far-flung projects and committers increases, we need to make sure that our Membership can easily step up to volunteer for the corporate operations and services we provide to projects.


[1] While the ASF contracts some services – paid infra staff, accounting and tax support, etc. – the officers overseeing these positions are volunteers. We have been consistent in keeping governance decision-making with volunteers, even where the day-to-day work may be done by staff (infra, EA) or contractors (press, accounting, legal).

The post What Apache Also Needs In A Board appeared first on Community Over Code.

Categories: FLOSS Project Planets

Last Call Media: What if you wanted to move faster?

Planet Drupal - Sat, 2017-03-25 13:00
Continuous Integration & Delivery: Resources
Categories: FLOSS Project Planets

Packaging Ishiiruka-Dolphin (GameCube/Wii Emulator)

Planet KDE - Sat, 2017-03-25 12:50

You may have heard about Dolphin, not our file manager but the GameCube and Wii emulator of the same name. What you may not have heard of is Ishiiruka, a fork of Dolphin that prioritizes performance over emulation accuracy – and over clean code, if comments by an upstream Dolphin author on Reddit are to be believed.

Although Ishiiruka began as a reaction to the removal of the Direct3D 9 renderer in the Windows version of Dolphin (which is probably why the Linux community ignored it for the most part), it also began to tackle other performance issues such as “micro stuttering”.

Recently the Git master branch of Ishiiruka shipped compilation fixes for Linux, so I decided to dust off my old dolphin-emu.spec file and give it a try (I'm hardly an expert packager). After some dabbling, I succeeded. For now only Fedora 24, Fedora 25, and openSUSE Tumbleweed are supported. The packages are available from https://software.opensuse.org/package/ishiiruka-dolphin-unstable.

openSUSE Leap requires a workaround because it defaults to GCC 4. I plan to look into it at a later time. Once Tino creates a new Stable branch that incorporates the Linux fixes, I'll post it under https://software.opensuse.org/package/ishiiruka-dolphin.

If anyone of you is interested in Arch, Debian, Ubuntu,… packages (anything supported by OBS), I’ll gladly accept Submit Requests for PKGBUILD etc. files at https://build.opensuse.org/project/show/home:KAMiKAZOW:Emulators.

Categories: FLOSS Project Planets

Drupal core announcements: What do you change about the Admin menu for your clients? We need to know.

Planet Drupal - Sat, 2017-03-25 11:55

A number of issues indicate that Drupal's core IA could be better, and solutions proposed range from isolated fixes to totally restructuring the admin menu.

At the same time, it is pretty standard that clients need customizations to the default Admin menu structure. It seems a lot of us might be doing the same thing, over and over.
To find sustainable solutions, we need your real world experiences.

Do you build Drupal 8 sites for clients?
Do you ever change the admin interface by adding or removing items? Do you replace it?
Are there certain modifications you make to almost all projects?

If so, please tell us your examples and experiences at https://drupal.org/node/2863330

Categories: FLOSS Project Planets

Eddy Petrișor: LVM: Converting root partition from linear to raid1 leads to boot failure... and how to recover

Planet Debian - Sat, 2017-03-25 11:39
I have a system which has 3 distinct HDDs used as physical volumes for Linux LVM. One logical volume is the root partition and it was initially created as a linear LV (vg0/OS).
Since I have PV redundancy, I thought it might be a good idea to convert the root LV from linear to raid1 with 2 mirrors.

WARNING: It seems an LVM raid1 logical volume for / is not supported with grub2, at least not with Ubuntu's 2.02~beta2-9ubuntu1.6 (14.04LTS) or Debian Jessie's grub-pc 2.02~beta2-22+deb8u1!

So I did this:
lvconvert -m2 --type raid1 vg0/OS

Then I restarted to find myself at the 'grub rescue>' prompt.

The initial problem was seen on an Ubuntu 14.04 LTS (aka trusty) system, but I reproduced it on a VM with Debian Jessie.

I downloaded the Super Grub2 Disk and tried to boot the VM. After choosing the option to load the LVM and RAID support, I was able to boot my previous system.

I tried several times to reinstall GRUB, thinking that was the issue, but I always got this kind of error:

/usr/sbin/grub-probe: error: disk `lvmid/QtJiw0-wsDf-A2zh-2v2y-7JVA-NhPQ-TfjQlN/phCDlj-1XAM-VZnl-RzRy-g3kf-eeUB-dBcgmb' not found.

In the end, after digging for more than 4 hours for answers, I decided I might be able to revert to the linear configuration from the (initramfs) prompt.

Initially the LV was inactive, so I activated it:

lvchange -a y /dev/vg0/OS

Then restored the LV to linear:

lvconvert -m0 vg0/OS

Then I tried to reboot without reinstalling GRUB, just for kicks, which succeeded.

In order to confirm this was the issue, I redid the whole thing, and indeed, with a raid1 root, I always got the lvmid error.

I'll have to check on Monday at work if I can revert it the same way on the Ubuntu 14.04 system, but I suspect I will have no issues.

Is it true that root on lvm-raid1 is not supported?
Categories: FLOSS Project Planets

Urvika Gola: Speaking at FOSSASIA’17 | Seasons of Debian : Summer of Code & Winter of Outreachy

Planet Debian - Sat, 2017-03-25 05:10

I got an amazing chance to speak at FOSSASIA 2017 held at Singapore on “Seasons of Debian – Summer of Code and Winter of Outreachy“. I gave a combined talk with my co-speaker Pranav Jain, who contributed to Debian through GSoC. We talked about two major open source initiatives – Outreachy and Google Summer of Code and the work we did on a common project – Lumicall under Debian.

The excitement started even before the first day! On 16th March, there was a speakers meetup at the Microsoft office in Singapore. There, I got the chance to connect with other speakers and learn about their work. The meetup was concluded with a Microsoft Office tour! As a student it was very exciting to see firsthand the office of a company that I had only dreamt of being at.

On 17th March, i.e. the first day of the three-day conference, I met Hong Phuc Dang, Founder of FOSSASIA. She is very kind, and just talking to her made me cheerful!
Meeting so many great developers from different organizations was exciting.

18th March was the day of our talk! I was a bit nervous about speaking in front of amazing developers, but that's how you grow

Categories: FLOSS Project Planets

PyBites: Code Challenge 11 - Generators for Fun and Profit - Review

Planet Python - Sat, 2017-03-25 04:10

It's the end of the week again, so we review this week's code challenge. It's never too late to sign up: just fork our challenges repo and start coding.

Categories: FLOSS Project Planets

OSTraining: Creating a Drupal 8 Private File System

Planet Drupal - Fri, 2017-03-24 21:01

An OSTraining member asked how they could set up a Drupal 8 private file system.

In Drupal 7 you could do this from the configuration at Administrator > Configuration > Media > File System. In Drupal 8 we have to set the private path manually.
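Concretely, the manual step in Drupal 8 is a line in settings.php; the path below is only an example and should point outside the web root:

```php
// sites/default/settings.php (example path; adjust for your server layout).
$settings['file_private_path'] = '../private';
```

A cache rebuild may be needed before the private:// scheme shows up as an option for file fields.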

Categories: FLOSS Project Planets

Justin Mason: Links for 2017-03-24

Planet Apache - Fri, 2017-03-24 19:58
  • That thing about pwning N26

    Whitehat CCC hacker thoroughly pwns N26 bank — there are a lot of small leaks and insecurities here. Sounds like N26 are dealing with them, though

    (tags: ccc hacks exploits n26 banks banking security)

  • ‘For decades, the transaction concept has played a central role in database research and development. Despite this prominence, transactional databases today often surface much weaker models than the classic serializable isolation guarantee—and, by default, far weaker models than alternative,“strong but not serializable” models such as Snapshot Isolation. Moreover, the transaction concept requires the programmer’s involvement: should an application programmer fail to correctly use transactions by appropriately encapsulating functionality, even serializable transactions will expose programmers to errors. While many errors arising from these practices may be masked by low concurrency during normal operation, they are susceptible to occur during periods of abnormally high concurrency. By triggering these errors via concurrent access in a deliberate attack, a determined adversary could systematically exploit them for gain. In this work, we defined the problem of ACIDRain attacks and introduced 2AD, a lightweight dynamic analysis tool that uses traces of normal database activity to detect possible anomalous behavior in applications. To enable 2AD, we extended Adya’s theory of weak isolation to allow efficient reasoning over the space of all possible concurrent executions of a set of transactions based on a concrete history, via a new concept called an abstract history, which also applies to API calls. We then applied 2AD analysis to twelve popular self-hosted eCommerce applications, finding 22 vulnerabilities spread across all but one application we tested, affecting over 50% of eCommerce sites on the Internet today. We believe that the magnitude and the prevalence of these vulnerabilities to ACIDRain attacks merits a broader reconsideration of the success of the transaction concept as employed by programmers today, in addition to further pursuit of research in this direction. 
Based on our early experiences both performing ACIDRain attacks on self-hosted applications as well as engaging with developers, we believe there is considerable work to be done in raising awareness of these attacks—for example, via improved analyses and additional 2AD refinement rules (including analysis of source code to better highlight sources of error)—and in automated methods for defending against these attacks—for example, by synthesizing repairs such as automated isolation level tuning and selective application of SELECT FOR UPDATE mechanisms. Our results here—as well as existing instances of ACIDRain attacks in the wild—suggest there is considerable value at stake.’

    (tags: databases transactions vulnerability security acidrain peter-bailis storage isolation acid)

  • Scientists made a detailed “roadmap” for meeting the Paris climate goals. It’s eye-opening. – Vox

    tl;dr: this is not going to happen and we are fucked.

    (tags: climate environment global-warming science roadmap future grim-meathook-future)

  • HyperBitBit

    jomsdev notes: ‘Last year, in the AofA’16 conference Robert Sedgewick proposed a new algorithm for cardinality estimation. Robert Sedgewick is a professor at Princeton with a long track record of publications on combinatorial/randomized algorithms. He was a good friend of Philippe Flajolet (creator of Hyperloglog) and HyperBitBit is based on the same ideas. However, it uses less memory than Hyperloglog and can provide the same results. On practical data, HyperBitBit, for N < 2^64 estimates cardinality within 10% using only 128 + 6 bits.'

    (tags: algorithms programming cs hyperloglog estimation cardinality counting hyperbitbit)
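The application-level anomaly the ACIDRain paper quoted above exploits is easy to reproduce in miniature. The sketch below is my own illustration (table and item names made up, not code from the paper): two interleaved read-modify-write "checkouts" against SQLite lose an update, while the atomic form does not.

```python
# Lost-update sketch: two connections each read a count, then write back
# a decremented value, without encapsulating the cycle in one transaction.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shop.db")
c1 = sqlite3.connect(path)
c2 = sqlite3.connect(path)
c1.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, count INTEGER)")
c1.execute("INSERT INTO inventory VALUES ('voucher', 1)")
c1.commit()

def read_count(conn):
    return conn.execute(
        "SELECT count FROM inventory WHERE item = 'voucher'").fetchone()[0]

# Interleaved read-modify-write: both checkouts see count == 1 ...
seen1 = read_count(c1)
seen2 = read_count(c2)
# ... and both write back 0, so one decrement is silently lost.
c1.execute("UPDATE inventory SET count = ?", (seen1 - 1,))
c1.commit()
c2.execute("UPDATE inventory SET count = ?", (seen2 - 1,))
c2.commit()
racy_result = read_count(c1)      # 0, even though two sales "succeeded"

# An atomic decrement sidesteps this particular anomaly:
c1.execute("UPDATE inventory SET count = count - 1")
c1.commit()
atomic_result = read_count(c2)    # correctly reflects the extra sale
```

The atomic UPDATE (or SELECT FOR UPDATE plus proper transaction encapsulation, on databases that support it) is the kind of repair the paper talks about synthesizing automatically.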

Categories: FLOSS Project Planets

Laptop freezing -- figuring out the issues

Planet KDE - Fri, 2017-03-24 17:55

Hi all, I have an awesome laptop I bought from my son, a hardcore gamer. So used, but also very beefy and well-cared-for. Lately, however, it has begun to freeze, by which I mean: the screen is not updated, and no keyboard inputs are accepted. So I can't even REISUB; the only cure is the power button.

I like to leave my laptop running overnight for a few reasons -- to get IRC posts while I sleep, to serve *ubuntu ISO torrents, and to run Folding@Home.

Attempting to cure the freezing, I've updated my graphics driver, rolled back to an older kernel, removed my beloved Folding@Home application, turned on the fan overnight, all to no avail. After adding lm-sensors and such, it didn't seem likely to be overheating, but I'd like to be sure about that.

Lately I turned off screen dimming at night and left a konsole window on the desktop running `top`. This morning I found a freeze again, with nothing apparent in the top readout:

So I went looking on the internet and found this super post: Using KSysGuard: System monitor tool for KDE. The first problem was that when I hit Control+Escape, I could not see the System Load tab he mentioned or any way to create a custom tab. However, when I started KSysGuard from the command line, it matched the screenshots in the blog.

Here is my custom tab:

So tonight I'll leave that on my screen along with konsole running `top` and see if there is any more useful information.

Categories: FLOSS Project Planets

Palantir: MidCamp Madness

Planet Drupal - Fri, 2017-03-24 17:25
Alex Brandt | Mar 27, 2017

Join us at MidCamp March 30 – April 2, 2017!

In this post we will cover...
  • How we’re involved this year
  • What sessions we think you should check out
  • An invite to join us for game night on Friday

Stay connected with the latest news on web strategy, design, and development.

Sign up for our newsletter.

Calling all designers, developers, strategists, and UX professionals: MidCamp week is here!

MidCamp is the largest annual DrupalCamp in the Midwest, and we are proud to have multiple Palantiri deeply involved in the planning and execution of this event. Not only is MidCamp a place to see great sessions on current topics, but it creates an opportunity to connect with the local community in person (not that we don’t also enjoy wearing those fun hats in a Google Hangout).

In addition to supporting MidCamp on an organizational level, this year Palantir is sponsoring Friday night’s Game Night. There will be an assortment of games to play and delicious food provided by the Donerman food truck (free for all Game Night ticket holders). Come late, leave early — we are looking forward to connecting with everyone!

The Palantiri Agenda

Content Before Code — A D8 Case Study
Michelle Jackson and Bec White

Friday, March 31, 2017
11:00am – 11:45am
Room: 314B

In this session you’ll learn how to:

  • Use GatherContent to create content
  • Migrate structured content in conjunction with Drupal 8
  • Collaborate with clients in planning, structuring and curating content prior to development

Successfully Integrate Teams of Internal and External Developers
Megh Plunkett

Friday, March 31, 2017
3:45pm – 4:30pm
Room: 314B

In this session you’ll learn how to:

  • Prepare your team to work with external developers
  • Spot communication breakdowns before they are un-mendable
  • Manage resourcing, roles and responsibilities

Game Night

  • Friday, March 31, 2017
  • 6pm – 9pm
  • Room: Second floor common area

Supporting Innovation Through Contribution
George DeMet

Saturday, April 1, 2017
11:30am – 12:00pm
Room: 314B

In this session you’ll learn:

  • The different benefits gained from supporting Drupal
  • What barriers are keeping organizations from contributing to Drupal
  • How to promote a culture of contribution within the Drupal community

Your Styleguide is an API
Luke Wertz

Saturday, April 1, 2017
2:00pm – 2:45pm
Room: 314B

In this session you’ll learn:

  • How to turn a style guide into a dependency instead of a deliverable
  • A framework for how to use style guides in a CMS-agnostic way
  • Some basic implementation strategies

It’s still not too late to grab tickets, so join us this week, and be sure to follow along on Twitter: #MidCamp.

Categories: FLOSS Project Planets

Acquia Developer Center Blog: United by Contribution — Two Takes on Open Source

Planet Drupal - Fri, 2017-03-24 17:14

A couple of Drupalists meet WordCamp London 2017 - WordCamp London 2017 was my very first WordPress community event; my first time up close with a community rumoured to have a very different focus and feeling than the Drupal community I know and love. But contribution, the one theme unifying us across all open source software communities, brought us all together.

Tags: acquia drupal planet, wordpress, london, community, contribution
Categories: FLOSS Project Planets

Gunnar Wolf: Dear lazyweb: How would you visualize..?

Planet Debian - Fri, 2017-03-24 16:46

Dear lazyweb,

I am trying to find a good way to present the categorization of several studied cases with a fitting graph. I am rating several vulnerabilities / failures according to James Cebula et al.'s paper, A Taxonomy of Operational Cyber Security Risks; this is a somewhat deep taxonomy, with 57 end items, but organized in a three-level-deep hierarchy. Copying a table from the cited paper (click to display it full-sized):

My categorization is binary: I care only whether it falls within a given category or not. My first stab at this was to represent each case using a star or radar graph. As an example:

As you can see, to a "bare" star graph I added a background color for each top-level category (blue for actions of people, green for systems and technology failures, red for failed internal processes, and gray for external events), and printed out only the labels for the second-level categories; for an accurate reading of the graphs, you have to refer to the table and count bars. And, yes, according to the Engineering Statistics Handbook:

Star plots are helpful for small-to-moderate-sized multivariate data sets. Their primary weakness is that their effectiveness is limited to data sets with less than a few hundred points. After that, they tend to be overwhelming.

I strongly agree with the above statement — and stating that "a few hundred points" can be understood is even an overstatement: 50 points are already too many. Now, trying to increase usability for this graph, I came across the Sunburst diagram. One of the proponents of this diagram, John Stasko, has written quite a bit about it.

Now... How to create my beautiful Sunburst diagram? That's a tougher one. Even though the page I linked to in the (great!) Data visualization catalogue presents even some free-as-in-software tools to do this... They are Javascript projects that will render their beautiful plots (even including an animation)... To the browser. I need them for a static (i.e. to be printed) document. Yes, I can screenshot and all, but I want them to be automatically generated, so I can review and regenerate them all automatically. Oh, I could just write JSON and use SaaS sites such as Aculocity to do the heavy-lifting, but if you know me, you will understand why I don't want to.

So... I set out to find a Gunnar-approved way to display the information I need. Now, as the Protovis documentation says, an icicle is simply a sunburst transformed from polar to cartesian coordinates... But I came to a similar conclusion: The tools I found are not what I need. OK, but an icicle graph seems much simpler to produce — I fired up my Emacs, and started writing using Ruby, RMagick and RVG... I decided to try a different way. This is my result so far:
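For readers who want to experiment: the geometry of a left-to-right icicle plot is simple to compute. The Python sketch below is only an illustration of the layout (it is not the Ruby/RMagick/RVG code used here), with category names sampled from the Cebula taxonomy:

```python
# Each node becomes a rectangle: x position is its depth, height is its
# number of leaf descendants; children stack vertically within the parent.
def leaf_count(node):
    children = node.get("children", [])
    return sum(leaf_count(c) for c in children) if children else 1

def icicle(node, depth=0, y=0.0, col_w=1.0, row_h=1.0):
    """Return (name, x, y, width, height) rectangles, left to right."""
    rects = [(node["name"], depth * col_w, y, col_w,
              leaf_count(node) * row_h)]
    for child in node.get("children", []):
        rects += icicle(child, depth + 1, y, col_w, row_h)
        y += leaf_count(child) * row_h
    return rects

taxonomy = {"name": "actions of people",
            "children": [{"name": "inadvertent"},
                         {"name": "deliberate",
                          "children": [{"name": "fraud"},
                                       {"name": "theft"}]}]}
rects = icicle(taxonomy)
```

Feeding those rectangles to any static renderer (SVG, RVG, TikZ) gives the icicle without browser-side JavaScript.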

So... What do you think? Does this look right to you? Clearer than the previous one? Worse? Do you have any idea on how I could make this better?

Oh... You want to tell me there is something odd about it? Well, yes, of course! I still need to tweak it quite a bit. Would you believe me if I told you this is not really a left-to-right icicle graph, but rather a strangely formatted Graphviz non-directed graph using the dot formatter?

I can assure you you don't want to look at my Graphviz sources... But in case you insist... Take them and laugh. Or cry. Of course, this file comes from a hand-crafted template, but has some autogenerated bits to it. I have still to tweak it quite a bit to correct several of its usability shortcomings, but at least it looks somewhat like what I want to achieve.

Anyway, I started out by making a "dear lazyweb" question. So, here it goes: Do you think I'm using the right visualization for my data? Do you have any better suggestions, either of a graph or of a graph-generating tool?


[update] Thanks for the first pointer, Lazyweb! I found a beautiful solution; we will see if it is what I need or not (it is too space-greedy to be readable... But I will check it out more thoroughly). It lays out much better than anything I can spew out by myself — Writing it as a mindmap using TikZ directly from within LaTeX, I get the following result:

Categories: FLOSS Project Planets

KDE.org and Drupal

Planet KDE - Fri, 2017-03-24 16:26

KDE.org is quite possibly one of the largest open-source websites of any desktop-oriented project, extending beyond into applications, wikis, guides, and much more. The amount of content is dizzying, and indeed a huge chunk of that content is about as old as the mascot Kandalf – figuratively and literally.

I personally believe he’s ripped under that cloak.

The KDE.org user-facing design “Aether” is live and various kinks have been worked out, but one fact is glaringly obvious: we've made the layers of age look better by adding another layer. Ultimately the real fix is migrating the site to Drupal, so I figured this post would cover some of the thoughts and progress behind the ongoing work.

Right now work is on porting the Aether theme to Drupal 8; ideally it'll be a “better than perfect port” with Drupal optimizations, better use of Bootstrap 4, and refinements. Additionally, I'm preparing a “Neverland-style” template for those planning to use Aether on their KDE-related project sites, but it's more of a side project until the Drupal theme lands. Recently the theme was changed to use Bootstrap's Barrio base theme, which has been a very pleasant decision as we get much more “out of the box”. It does require a Bootstrap library module which allows local or CDN-based Bootstrap installations, and while at first I was asking “why can't a theme just be self-contained?”, now I'm understanding the logic – Bootstrap is popular, multiple themes use it, and this keeps it all up to date and updatable itself. I do think maybe one thing Drupal should do is have some rudimentary package management that says “hey, we also need to download this”, but it's easy enough to install separately.

If you have a project website looking to port to Aether, I would first advise you to simply wait until you can consider moving your page to the main Drupal installation when it eventually goes live; in my perfect world I imagine Drupal unifying a great amount of disparate content, thus getting free updates. Additionally, consider hitting up the KDE-www mailing list and asking to help out on content, or place feature requests for front-end UI elements. While I'm currently lurking on the mailing list, I'll try to provide whatever info I can. As an aside, I had some Telegram confusion with some people looking to contribute and concerns from administrators, so please simply defer to the mailing list.

In terms of the Aether theme, I will be posting the basic theme on our git repo; when it goes up, if you have Bootstrap and Twig experience (any at all is more than I had when I started), please consider contributing, especially if you maintain a page and would migrate to Drupal if it had the appropriate featureset. I will post a tiny follow-up when the repo is up.





Categories: FLOSS Project Planets

PyCharm: PyCharm 2017.1 Out Now: Faster debugger, new test runners, and more

Planet Python - Fri, 2017-03-24 12:46

PyCharm 2017.1 is out now! Get it now for a much faster debugger, improved Python and JavaScript unit testing, and support for the six library.

  • The Python debugger got forty times faster for Python 3.6 projects, and up to two times faster for older versions of Python
  • We’ve added support for the six compatibility library
  • Unit test runners for Python have been rebuilt from the ground up: you can now run any test configuration with PyCharm
  • Are you a full stack developer? We’ve improved our JavaScript unit testing: gutter icons indicating whether a test passed and support for Jest, Facebook’s JS testing framework (only available in PyCharm Professional edition)
  • Zero-latency typing is now on by default: typing latencies for PyCharm 2017.1 are lower than those for Sublime Text and Emacs
  • Support for native Docker for Mac – no more need to use SOCAT! (only available in PyCharm Professional edition)
  • And more!

Get PyCharm 2017.1 now from our website

Please let us know what you think about PyCharm! You can reach us on Twitter, Facebook, and by leaving a comment on the blog.

PyCharm Team
-The Drive to Develop

Categories: FLOSS Project Planets

A typical day in bugs.kde.org

Planet KDE - Fri, 2017-03-24 12:32


I’m sorry that $feature behaves differently to how you expect it. But it’s the way it is and that’s by design. The feature work exactly as it’s supposed to work. I’m sorry, this won’t be changed.


With decisions like that, no wonder KDE is still a broken mess.

I wonder why the hell I even bother reporting issues. Bugs are by design these days.

Never again.

Have a nice life.

Categories: FLOSS Project Planets

WebGL streaming in a Raspberry PI Zero W

Planet KDE - Fri, 2017-03-24 09:06

A week ago I received my Raspberry Pi Zero W to play a bit with an IoT device. The specs of this small computer are the following:

  • 1GHz, single-core CPU
  • 512MB RAM
  • Mini HDMI and USB On-The-Go ports
  • Micro USB power
  • HAT-compatible 40-pin header
  • Composite video and reset headers
  • CSI camera connector

But the interesting part comes with the connectivity:

  • 802.11 b/g/n wireless LAN
  • Bluetooth 4.1
  • Bluetooth Low Energy (BLE)

And especially one of the hidden features, which lets you use it as a headless device and connect over SSH via USB, by adding the following line to config.txt:
And by modifying the file cmdline.txt to add:

Remember to create a file called ssh on the boot partition to enable SSH access to your Raspberry Pi. There are plenty of tutorials on the Internet showing this!
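Since the exact lines did not survive above, here is a sketch of the commonly documented dwc2 USB-gadget setup (my reconstruction from the usual headless-Pi guides, not taken from this post):

```shell
# /boot/config.txt -- append this line (enables the dwc2 USB controller overlay):
dtoverlay=dwc2

# /boot/cmdline.txt -- insert this after "rootwait", keeping everything on one line
# (loads the dwc2 module plus the USB Ethernet gadget):
modules-load=dwc2,g_ether

# /boot/ssh -- an empty file named "ssh" enables the SSH server on boot:
touch /boot/ssh
```

With those in place, the Pi enumerates as a USB Ethernet device and can typically be reached with something like ssh pi@raspberrypi.local.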

To build a Qt version for the device you can follow this guide.
In the following screenshot, we can see the calqlatr example running directly on the Raspberry Pi:

One of the use cases that comes to mind for this device and this feature is creating portable presentations that can be shown on any computer without installing new software.

For the presentation, I used the qml-presentation-system (link).

More use cases could be:

  • Application showcase.
  • Custom text editor for taking your notes everywhere.

Please comment if you have other ideas or use cases.

The post WebGL streaming in a Raspberry PI Zero W appeared first on Qt Blog.

Categories: FLOSS Project Planets

Andrew Dalke: ChEMBL bioactivity data

Planet Python - Fri, 2017-03-24 08:00

I almost only use ChEMBL structure files. I download the .sdf files and process them. ChEMBL also supplies bioactivity data, which I've never worked with. Iain Watson suggested I look at it as a source of compound set data, and provided some example SQL queries. This blog post is primarily a set of notes for myself as I experiment with the queries and learn more about what is in the data file.

There is one bit of general advice. If you're going to use the SQLite dump from ChEMBL, make sure you run "ANALYZE", at least on the tables of interest. This may take a few hours. I'm downloading ChEMBL-22-1 to see if it comes pre-analyzed. If it doesn't, I'll ask them to do so as part of their releases.

For those playing along from home (or the office, or wherever fine SQL database engines may be found), I downloaded the SQLite dump for ChEMBL 21, which is a lovely 2542883 KB (or 2.4 GB) compressed, and 12 GB uncompressed. That link also includes dumps for MySQL, Oracle, and Postgres, as well as schema documentation.

Unpack it the usual way (it takes a while to unpack 12GB), cd into the directory, and open the database using the sqlite console:

% tar xf chembl_21_sqlite.tar.gz
% cd chembl_21_sqlite
% sqlite3 chembl_21.db
SQLite version 3.8.5 2014-08-15 22:37:57
Enter ".help" for usage hints.
sqlite>


The 'compound_structures' table looks interesting. How many structures are there?

sqlite> select count(*) from compound_structures;
1583897

Wow. Just .. wow. That took several minutes to execute. This is a problem I've had before with large databases. SQLite doesn't store the total table size, so the initial count(*) ends up doing a full table scan. This brings in every B-tree node from disk, which requires a lot of random seeks for my poor hard disk made of spinning rust. (Hmm, Crucial says I can get a replacement 500GB SSD for only EUR 168. Hmmm.)

The second time and onwards is just fine, thanks to the power of caching.

What do the structures look like? I decided to show only a few of the smallest structures to keep the results from overflowing the screen:

sqlite> select molregno, standard_inchi, standard_inchi_key, canonical_smiles
   ...>   from compound_structures where length(canonical_smiles) < 10 limit 4;
1813|InChI=1S/C4H11NO/c1-2-3-4-6-5/h2-5H2,1H3|WCVVIGQKJZLJDB-UHFFFAOYSA-N|CCCCON
3838|InChI=1S/C2H4INO/c3-1-2(4)5/h1H2,(H2,4,5)|PGLTVOMIXTUURA-UHFFFAOYSA-N|NC(=O)CI
4092|InChI=1S/C4H6N2/c5-4-6-2-1-3-6/h1-3H2|VEYKJLZUWWNWAL-UHFFFAOYSA-N|N#CN1CCC1
4730|InChI=1S/CH4N2O2/c2-1(4)3-5/h5H,(H3,2,3,4)|VSNHCAURESNICA-UHFFFAOYSA-N|NC(=O)NO

For fun, are there canonical SMILES which are listed multiple times? There are a few, so I decided to narrow it down to those with more than 2 instances. (None occur more than 3 times.)

sqlite> select canonical_smiles, count(*) from compound_structures
   ...>   group by canonical_smiles having count(*) > 2;
CC(C)Nc1cc(ccn1)c2[nH]c(nc2c3ccc(F)cc3)[S+](C)[O-]|3
CC(C)Nc1cc(ccn1)c2[nH]c(nc2c3ccc(F)cc3)[S+]([O-])C(C)C|3
CC(C)[C@@H](C)Nc1cc(ccn1)c2[nH]c(nc2c3ccc(F)cc3)[S+](C)[O-]|3
CC(C)[C@H](C)Nc1cc(ccn1)c2[nH]c(nc2c3ccc(F)cc3)[S+](C)[O-]|3
CC(C)[S+]([O-])c1nc(c2ccc(F)cc2)c([nH]1)c3ccnc(NC4CCCCC4)c3|3
...

Here are more details about the first output where the same SMILES is used multiple times:

sqlite> select molregno, standard_inchi from compound_structures
   ...>  where canonical_smiles = "CC(C)Nc1cc(ccn1)c2[nH]c(nc2c3ccc(F)cc3)[S+](C)[O-]";
1144470|InChI=1S/C18H19FN4OS/c1-11(2)21-15-10-13(8-9-20-15)17-16(22-18(23-17)25(3)24)12-4-6-14(19)7-5-12/h4-11H,1-3H3,(H,20,21)(H,22,23)
1144471|InChI=1S/C18H19FN4OS/c1-11(2)21-15-10-13(8-9-20-15)17-16(22-18(23-17)25(3)24)12-4-6-14(19)7-5-12/h4-11H,1-3H3,(H,20,21)(H,22,23)/t25-/m1/s1
1144472|InChI=1S/C18H19FN4OS/c1-11(2)21-15-10-13(8-9-20-15)17-16(22-18(23-17)25(3)24)12-4-6-14(19)7-5-12/h4-11H,1-3H3,(H,20,21)(H,22,23)/t25-/m0/s1

The differences are in the "/t(isotopic:stereo:sp3)", "/m(fixed_:stereo:sp3:inverted)", and "/s(fixed_H:stereo_type=abs)" layers. Got that?

I don't. I used the techniques of the next section to get the molfiles for each structure. The differences are in the bonds between atoms 23/24 (the sulfoxide, represented in charge-separated form) and atoms 23/25 (the methyl on the sulfur). The molfile for the first record has no assigned bond stereochemistry, the second has a down flag for the sulfoxide, and the third has a down flag for the methyl.

molfile column in compound_structures

There's a "molfile" entry. Does it really include the structure as a raw MDL molfile? Yes, yes it does: sqlite> select molfile from compound_structures where molregno = 805; 11280714442D 1 1.00000 0.00000 0 8 8 0 0 0 999 V2000 6.0750 -2.5667 0.0000 C 0 0 0 0 0 0 0 0 0 5.3625 -2.9792 0.0000 N 0 0 3 0 0 0 0 0 0 6.7917 -2.9792 0.0000 N 0 0 0 0 0 0 0 0 0 5.3625 -3.8042 0.0000 C 0 0 0 0 0 0 0 0 0 4.6542 -2.5667 0.0000 C 0 0 0 0 0 0 0 0 0 6.0750 -1.7417 0.0000 C 0 0 0 0 0 0 0 0 0 4.6542 -1.7417 0.0000 C 0 0 0 0 0 0 0 0 0 5.3625 -1.3292 0.0000 C 0 0 0 0 0 0 0 0 0 2 1 1 0 0 0 3 1 2 0 0 0 4 2 1 0 0 0 5 2 1 0 0 0 6 1 1 0 0 0 7 8 1 0 0 0 8 6 1 0 0 0 7 5 1 0 0 0 M END

Why did I choose molregno = 805? I looked for a structure with 8 atoms and 8 bonds by searching for the substring "  8  8  0", which is in the counts line. (It's not a perfect solution, but rather a good-enough one.)

sqlite> select molregno from compound_structures where molfile LIKE "%  8  8  0%" limit 1;
805

I bet with a bit of effort I could count the number of rings by using the molfile to get the bond counts and use the number of "."s in the canonical_smiles to get the number of fragments.
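That ring-counting idea can be sketched in a few lines of Python (a throwaway helper of mine, not part of ChEMBL): for any molecule, rings = bonds - atoms + fragments, where fragments is one more than the number of "."s in the SMILES. The SMILES shown here for molregno 805 is my own plausible guess, not taken from the database.

```python
def ring_count(molfile: str, smiles: str) -> int:
    """Cyclomatic ring count: bonds - atoms + fragments."""
    # Line 4 of a V2000 molfile is the counts line; the first two fields are
    # the atom and bond counts. (split() is fine for small molecules; the real
    # format is fixed-width, so very large counts would need column slicing.)
    counts = molfile.splitlines()[3].split()
    atoms, bonds = int(counts[0]), int(counts[1])
    # Each "." in a SMILES separates disconnected fragments.
    fragments = smiles.count(".") + 1
    return bonds - atoms + fragments

# 8 atoms, 8 bonds, and a dot-free SMILES gives 8 - 8 + 1 = 1 ring.
molfile_805 = "\n".join([
    "", "", "",                                 # header lines
    "  8  8  0  0  0  0  0  0  0  0999 V2000",  # counts line
    # ... atom and bond blocks omitted; only the counts line is read here
])
print(ring_count(molfile_805, "N=C1CCCCN1C"))  # 1
```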

compound_properties and molecule_dictionary tables

The compound_properties table stores some molecular properties. I'll get the number of heavy atoms, the number of aromatic rings, and the full molecular weight for structure 805.

sqlite> select heavy_atoms, aromatic_rings, full_mwt from compound_properties where molregno = 805;
8|0|112.17

I've been using "805", which is an internal identifier. What's its public ChEMBL id?

sqlite> select chembl_id from molecule_dictionary where molregno = 805;
CHEMBL266980

What are some of the records with only 1 or 2 atoms?

sqlite> select chembl_id, heavy_atoms from molecule_dictionary, compound_properties
   ...>  where molecule_dictionary.molregno = compound_properties.molregno
   ...>  and heavy_atoms < 3 limit 5;
CHEMBL1098659|1
CHEMBL115849|2
CHEMBL1160819|1
CHEMBL116336|2
CHEMBL116838|2

InChI and heavy atom count for large structures

I showed that some of the SMILES were used for two or three records. What about the InChI string? I started with:

sqlite> select molregno, standard_inchi, count(*) from compound_structures
   ...>   group by standard_inchi having count(*) > 1;
1378059||9

After 10 minutes with no other output, I gave up. Those 9 occurrences have a NULL value, that is:

sqlite> select count(*) from compound_structures where standard_inchi is NULL;
9

I was confused at first because there are SMILES strings (I'll show only the first 40 characters), so there is structure information. The heavy atom count is also NULL:

sqlite> select compound_structures.molregno, heavy_atoms, substr(canonical_smiles, 1, 40)
   ...>  from compound_structures, compound_properties
   ...>  where standard_inchi is NULL and compound_structures.molregno = compound_properties.molregno;
615447||CC1=CN([C@H]2C[C@H](OP(=O)(O)OC[C@H]3O[C
615448||CC1=CN([C@H]2C[C@H](OP(=O)(O)OC[C@H]3O[C
615449||CC1=CN([C@H]2C[C@H](OP(=O)(O)OC[C@H]3O[C
615450||CC1=CN([C@H]2C[C@H](OP(=O)(O)OC[C@H]3O[C
615451||CC1=CN([C@H]2C[C@H](OP(=O)(O)OC[C@H]3O[C
1053861||CN(C)P(=O)(OC[C@@H]1CN(C[C@@H](O1)n2cnc3
1053864||CN(C)P(=O)(OC[C@@H]1CN(C[C@@H](O1)n2cnc3
1053865||CN(C)P(=O)(OC[C@@H]1CN(C[C@@H](O1)N2C=CC
1378059||CC[C@H](C)[C@H](NC(=O)[C@H](CCCNC(=N)N)N

Then I realized it's because the schema specifies the "heavy_atoms" field as "NUMBER(3,0)". While SQLite ignores that limit, it looks like ChEMBL doesn't try to store a count above 999.

What I'll do instead is get the molecular formula, which shows that there are over 600 heavy atoms in those structures:

sqlite> select chembl_id, full_molformula
   ...>   from compound_structures, compound_properties, molecule_dictionary
   ...>   where standard_inchi is NULL
   ...>   and compound_structures.molregno = compound_properties.molregno
   ...>   and compound_structures.molregno = molecule_dictionary.molregno;
CHEMBL1077162|C318H381N118O208P29
CHEMBL1077163|C319H383N118O209P29
CHEMBL1077164|C318H382N119O208P29
CHEMBL1077165|C325H387N118O209P29
CHEMBL1631334|C361H574N194O98P24S
CHEMBL1631337|C367H606N172O113P24
CHEMBL1631338|C362H600N180O106P24
CHEMBL2105789|C380H614N112O113S9

Those are some large structures! The reason there are no InChIs for them is that InChI didn't support large molecules until version 1.05, which came out in early 2017. Before then, InChI only supported 1024 atoms. Which is normally fine as most compounds are small (hence "small molecule chemistry"). In fact, there aren't any records with more than 79 heavy atoms:
sqlite>
select heavy_atoms, count(*) from compound_properties
  where heavy_atoms > 70 group by heavy_atoms;
71|364
72|207
73|46
74|29
75|3
76|7
78|2
79|2

How in the world do these large structures have 600+ atoms? Are they peptides? Mmm, no, not all. The first 8 contain a lot of phosphorus atoms. I'm guessing some sort of nucleic acid. The last might be a protein. Perhaps I can get a clue from the chemical name, which is in the compound_records table. Here's an example using the molregno 805 from earlier:

sqlite> select * from compound_records where molregno = 805;
1063|805|14385|14|1-Methyl-piperidin-(2Z)-ylideneamine|1|

Some of the names of the 600+ atom molecules are too long, so I'll limit the output to the first 50 characters of the name:
sqlite>
select chembl_id, full_molformula, substr(compound_name, 1, 50)
  from compound_structures, molecule_dictionary, compound_properties, compound_records
 where standard_inchi is NULL
   and compound_structures.molregno = molecule_dictionary.molregno
   and compound_structures.molregno = compound_properties.molregno
   and compound_structures.molregno = compound_records.molregno;
CHEMBL1077161|C307H368N116O200P28|{[(2R,3S,4R,5R)-5-(4-amino-2-oxo-1,2-dihydropyrimi
CHEMBL1077162|C318H381N118O208P29|{[(2R,3S,5R)-2-{[({[(2R,3S,4R,5R)-5-(4-amino-2-oxo
CHEMBL1077163|C319H383N118O209P29|{[(2R,3S,5R)-2-{[({[(2R,3S,4R,5R)-5-(4-amino-2-oxo
CHEMBL1077164|C318H382N119O208P29|{[(2R,3S,5R)-2-{[({[(2R,3S,4R,5R)-5-(4-amino-2-oxo
CHEMBL1077165|C325H387N118O209P29|{[(2R,3S,5R)-2-{[({[(2R,3S,4R,5R)-5-(4-amino-2-oxo
CHEMBL1631334|C361H574N194O98P24S|HRV-EnteroX
CHEMBL1631337|C367H606N172O113P24|PV-5'term
CHEMBL1631338|C362H600N180O106P24|PV-L4
CHEMBL2105789|C380H614N112O113S9|Mirostipen

That didn't help much, but I could at least do a web search for some of the names. For example, HRV-EnteroX is a PPMO (peptide-conjugated phosphorodiamidate morpholino oligomer), which is where those phosphorus atoms come from.
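As a quick sanity check on "over 600 heavy atoms", the molecular formulas above can be parsed directly. A throwaway sketch (my own helper, not a ChEMBL utility) that sums every element count except hydrogen:

```python
import re

def heavy_atom_count(full_molformula: str) -> int:
    """Sum the element counts in a molecular formula, skipping hydrogens."""
    total = 0
    # An element symbol is a capital letter plus an optional lowercase letter,
    # followed by an optional count (a missing count means 1).
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", full_molformula):
        if element != "H":
            total += int(count) if count else 1
    return total

print(heavy_atom_count("C318H381N118O208P29"))  # CHEMBL1077162: 673
```

So that record alone has 673 heavy atoms, comfortably past both the 999 limit of NUMBER(3,0) and well on the way to InChI's old 1024-atom ceiling.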

The names weren't really much help, and the images at ChEMBL were too small to make sense of the structures, so I looked at them over at PubChem. HRV-EnteroX looks like a 12-mer peptide conjugated to about 25 morpholino oligomers. Mirostipen looks like a peptide. CHEMBL1077161 looks like a nucleic acid strand.

I don't think there's anything interesting to explore in this direction so I'll move on.

Assay data

I'll take a look at assay data, which I deal with a lot less often than I do structure data. How many assays are there?

sqlite> select count(*) from assays;
1212831

Okay, and how many of them are human assays? For that I need the NCBI taxonomy id. Iain's example code uses 9606, which the NCBI web site tells me is for Homo sapiens. I don't think there's a table in the SQLite data dump with all of the taxonomy ids. The organism_class table says only:

sqlite> select * from organism_class where tax_id = 9606;
7|9606|Eukaryotes|Mammalia|Primates

The assays table's "assay_organism" column stores the "[n]ame of the organism for the assay system", with the caution "[m]ay differ from the target organism (e.g., for a human protein expressed in non-human cells, or pathogen-infected human cells)." I'll throw caution to the wind and check that field:

sqlite> select count(*) from assays where assay_organism = "Homo sapiens";
291143
sqlite> select assay_organism, count(*) from assays
   ...>  where assay_tax_id = 9606 group by assay_organism;
|17
Homo sapiens|291135
sqlite> select count(*) from assays where assay_tax_id = 9606 and assay_organism is NULL;
17

It looks like 9606 is indeed for humans.

Assay activities

What sort of assay activities are there?

sqlite> select distinct published_type from activities;
ED50 Transactivation % % Cell Death ... AUC AUC (0-24h) AUC (0-4h) AUC (0-infinity) ... Change Change HDL -C Change MAP Change TC ...

Okay, quite a few. There appear to be some typos as well:
sqlite>
select published_type, count(*) from activities where published_type in ("Activity", "A ctivity",
  "Ac tivity", "Act ivity", "Acti vity", "Activ ity", "Activi ty", "Activit y", "Activty")
  group by published_type;
A ctivity|1
Activ ity|2
Activit y|1
Activity|700337
Activty|1

After another 20 minutes of data exploration, I realized that there are two different types. The "published_type" is what the assayer published, while there's also a "standard_type", which looks to be a value normalized by ChEMBL:
sqlite>
select published_type, standard_type from activities
  where published_type in ("A ctivity", "Activ ity", "Activit y", "Activty");
A ctivity|Activity
Activ ity|Activity
Activ ity|Activity
Activit y|Activity
Activty|Activity

There are many ways to publish a report with IC50 data. I'll show only those that end with "IC50".

sqlite> select distinct published_type from activities where published_type like "%IC50";
-Log IC50
-Log IC50/IC50
-logIC50
Average IC50
CC50/IC50
CCIC50
CIC IC50
CIC50
Change in IC50
Cytotoxicity IC50
Decrease in IC50
EIC50
FIC50
Fold IC50
I/IC50
IC50
IC50/IC50
Increase in IC50
Log 1/IC50
Log IC50
MBIC50
MIC50
Mean IC50
RIC50
Ratio CC50/IC50
Ratio CIC95/IC50
Ratio ED50/MIC50
Ratio IC50
Ratio LC50/IC50
Ratio LD50/IC50
Ratio pIC50
Ratio plasma concentration/IC50
Relative ET-A IC50
Relative IC50
TBPS IC50
TC50/IC50
Time above IC50
fIC50
log1/IC50
logIC50
pIC50
pMIC50
rIC50

The "p" prefix, as in "pIC50", is shorthand for "-log", so "-Log IC50", "Log 1/IC50", and "pIC50" are almost certainly the same units. Let's see:
sqlite>
select distinct published_type, standard_type from activities
  where published_type in ("-Log IC50", "Log 1/IC50", "pIC50");
-Log IC50|IC50
-Log IC50|pIC50
-Log IC50|-Log IC50
Log 1/IC50|IC50
Log 1/IC50|Log 1/IC50
pIC50|IC50
pIC50|pIC50
pIC50|Log IC50
pIC50|-Log IC50

Well color me confused. Oh! There's a "standard_flag", which "[s]hows whether the standardised columns have been curated/set (1) or just default to the published data (0)." Perhaps that will help enlighten me.
sqlite>
select distinct published_type, standard_flag, standard_type from activities
 where published_type in ("-Log IC50", "Log 1/IC50", "pIC50");
-Log IC50|1|IC50
-Log IC50|1|pIC50
-Log IC50|0|-Log IC50
Log 1/IC50|1|IC50
Log 1/IC50|0|Log 1/IC50
pIC50|1|IC50
pIC50|1|pIC50
pIC50|0|Log IC50
pIC50|1|-Log IC50
pIC50|0|pIC50

Nope, I still don't understand what's going on. I'll assume it's all tied to the complexities of data curation. For now, I'll assume that the data set is nice and clean.
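For reference on why these types are so easy to conflate: pIC50 is defined as -log10 of the molar IC50, so converting between pIC50 and a nanomolar IC50 is a one-liner. A small sketch (my own helper functions, not ChEMBL code):

```python
import math

def pic50_to_nm(pic50: float) -> float:
    """Convert a pIC50 (-log10 of the molar IC50) to an IC50 in nM."""
    # 1 M = 1e9 nM, hence the factor of 9 in the exponent.
    return 10 ** (9 - pic50)

def nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nM to a pIC50."""
    return 9 - math.log10(ic50_nm)

print(pic50_to_nm(6.0))   # 1000.0 nM, i.e. 1 micromolar
print(nm_to_pic50(1.0))   # 9.0, a very potent compound
```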

IC50 types

Let's look at the "IC50" values only. How do the "published_type" and "standard_type" columns compare to each other? sqlite>
select published_type, standard_type, count(*) from activities
  where published_type = "IC50" group by standard_type;
IC50|% Max Response|21
IC50|Change|2
IC50|Control|1
IC50|Electrophysiological activity|6
IC50|Fold Inc IC50|1
IC50|Fold change IC50|12
IC50|IC50|1526202
IC50|IC50 ratio|1
IC50|Inhibition|12
IC50|Log IC50|20
IC50|Ratio IC50|40
IC50|SI|4
IC50|T/C|1
sqlite>
select published_type, standard_type, count(*) from activities
  where standard_type = "IC50" group by published_type;
-Log IC50|IC50|1736
-Log IC50(M)|IC50|28
-Log IC50(nM)|IC50|39
-logIC50|IC50|84
3.3|IC50|1
Absolute IC50 (CHOP)|IC50|940
Absolute IC50 (XBP1)|IC50|940
Average IC50|IC50|34
CIC50|IC50|6
I 50|IC50|202
I-50|IC50|25
I50|IC50|6059
IC50|IC50|1526202
IC50 |IC50|52
IC50 app|IC50|39
IC50 max|IC50|90
IC50 ratio|IC50|2
IC50(app)|IC50|457
IC50_Mean|IC50|12272
IC50_uM|IC50|20
ID50|IC50|3
Log 1/I50|IC50|280
Log 1/IC50|IC50|589
Log 1/IC50(nM)|IC50|88
Log IC50|IC50|7013
Log IC50(M)|IC50|3599
Log IC50(nM)|IC50|77
Log IC50(uM)|IC50|28
Mean IC50|IC50|1
NA|IC50|5
NT|IC50|20
log(1/IC50)|IC50|1016
pI50|IC50|2386
pIC50|IC50|43031
pIC50(mM)|IC50|71
pIC50(nM)|IC50|107
pIC50(uM)|IC50|419

Yeah, I'm going to throw my hands up here, declare "I'm a programmer, Jim, not a bioactivity specialist", and simply use the published_type of IC50.

IC50 activity values

How are the IC50 values measured? Here too I need to choose between "published_units" and "standard_units". A quick look at the two shows that the standard_units are less diverse. sqlite>
select standard_units, count(*) from activities where published_type = "IC50"
  group by standard_units;
|167556
%|148
% conc|70
10'-11uM|1
10'-4umol/L|1
M ml-1|15
equiv|64
fg ml-1|1
g/ha|40
kJ m-2|20
liposomes ml-1|5
mMequiv|38
mg kg-1|248
mg.min/m3|4
mg/kg/day|1
milliequivalent|22
min|9
ml|3
mmol/Kg|10
mol|6
molar ratio|198
nA|6
nM|1296169
nM g-1|1
nM kg-1|4
nM unit-1|7
nmol/Kg|1
nmol/mg|5
nmol/min|1
ppm|208
ppm g dm^-3|7
uL|7
uM hr|1
uM tube-1|9
uM well-1|52
uM-1|25
uM-1 s-1|1
ucm|6
ucm s-1|2
ug|168
ug cm-2|1
ug g-1|2
ug well-1|12
ug.mL-1|61139
ug/g|16
umol kg-1|3
umol.kg-1|8
umol/dm3|2

"Less diverse", but still diverse. By far the most common is "nM", for "nanomolar", which is the only unit I expected. How many IC50s have activities better than 1 micromolar, which is 1000 nM?

sqlite> select count(*) from activities where published_type = "IC50"
   ...>    and standard_value < 1000 and standard_units = "nM";
483041

That's fully 483041/1212831 = 40% of the assays in the data dump.

How many of the IC50s are in humans? For that I need a join with the assays table using the assay_id: sqlite>
select count(*) from activities, assays
 where published_type = "IC50"
   and standard_value < 1000 and standard_units = "nM"
   and activities.assay_id = assays.assay_id
   and assay_tax_id = 9606;
240916

About 1/2 of them are in humans.

Assay target type from target_dictionary


Remember earlier when I threw caution to the wind? How many of the assays are actually against human targets? I can join on the target id "tid" to compare the taxon id in the target vs. the taxon id in the assay: sqlite>
select count(*) from assays, target_dictionary
 where assays.tid = target_dictionary.tid
   and target_dictionary.tax_id = 9606;
301810
sqlite>
select count(*) from assays, target_dictionary
 where assays.tid = target_dictionary.tid
   and assays.assay_tax_id = 9606;

Compare assay organisms with target organism

What are some of the non-human assay organisms where the target is humans? sqlite>
select distinct assay_organism from assays, target_dictionary
 where assays.tid = target_dictionary.tid
   and assays.assay_tax_id != 9606
   and target_dictionary.tax_id = 9606
 limit 10;
rice
Saccharomyces cerevisiae
Oryza sativa
Rattus norvegicus
Sus scrofa
Cavia porcellus
Oryctolagus cuniculus
Canis lupus familiaris
Proteus vulgaris
Salmonella enterica subsp. enterica serovar Typhi

Compounds tested against a target name

I'm interested in the "SINGLE PROTEIN" target names in humans. The target name is a manually curated field.

sqlite> select distinct pref_name from target_dictionary where tax_id = 9606 limit 5;
Maltase-glucoamylase
Sulfonylurea receptor 2
Phosphodiesterase 5A
Voltage-gated T-type calcium channel alpha-1H subunit
Dihydrofolate reductase

What are the structures used in "Dihydrofolate reductase" assays? This requires three table joins: one on 'tid' to go from target_dictionary to assays, another on 'assay_id' to get to the activity, and another on 'molregno' to go from the activity to molecule_dictionary so I can get the compound's chembl_id. (To make it more interesting, three of the tables have a chembl_id column.)
sqlite>
select distinct molecule_dictionary.chembl_id
  from target_dictionary, assays, activities, molecule_dictionary
 where target_dictionary.pref_name = "Dihydrofolate reductase"
   and target_dictionary.tid = assays.tid
   and assays.assay_id = activities.assay_id
   and activities.molregno = molecule_dictionary.molregno
 limit 10;
CHEMBL1679
CHEMBL429694
CHEMBL106699
CHEMBL422095
CHEMBL1161155
CHEMBL350033
CHEMBL34259
CHEMBL56282
CHEMBL173175
CHEMBL173901
sqlite>
select count(distinct molecule_dictionary.chembl_id)
   from target_dictionary, assays, activities, molecule_dictionary
  where target_dictionary.pref_name = "Dihydrofolate reductase"
    and target_dictionary.tid = assays.tid
    and assays.assay_id = activities.assay_id
    and activities.molregno = molecule_dictionary.molregno;
3466

There are 3466 of these, including non-human assays. I'll limit it to human ones only:
sqlite>
select count(distinct molecule_dictionary.chembl_id)
   from target_dictionary, assays, activities, molecule_dictionary
  where target_dictionary.pref_name = "Dihydrofolate reductase"
    and target_dictionary.tax_id = 9606
    and target_dictionary.tid = assays.tid
    and assays.assay_id = activities.assay_id
    and activities.molregno = molecule_dictionary.molregno;
1386

I'll further limit it to those with an IC50 of under 1 micromolar:
sqlite>
.timer on
select count(distinct molecule_dictionary.chembl_id)
   from target_dictionary, assays, activities, molecule_dictionary
  where target_dictionary.pref_name = "Dihydrofolate reductase"
    and target_dictionary.tax_id = 9606
    and target_dictionary.tid = assays.tid
    and assays.assay_id = activities.assay_id
    and activities.published_type = "IC50"
    and activities.standard_units = "nM"
    and activities.standard_value < 1000
    and activities.molregno = molecule_dictionary.molregno;
255
Run Time: real 174.561 user 18.073715 sys 23.285346

I turned on the timer to show that the query took about 3 minutes! I repeated it to ensure that it wasn't a simple cache issue. Still about 3 minutes.

ANALYZE the tables

The earlier query, without the activity filter, took 5.7 seconds when the data wasn't cached, and 0.017 seconds when cached. It found 1386 matches. The new query takes almost 3 minutes more to filter those 1386 matches down to 255. That should not happen.

This is a strong indication that the query planner used the wrong plan. I've had this happen before. My solution then was to "ANALYZE" the tables, which "gathers statistics about tables and indices and stores the collected information in internal tables of the database where the query optimizer can access the information and use it to help make better query planning choices."

It can take a while, so I limited it to the tables of interest.

sqlite> analyze target_dictionary;
Run Time: real 0.212 user 0.024173 sys 0.016268
sqlite> analyze assays;
Run Time: real 248.184 user 5.890109 sys 4.793236
sqlite> analyze activities;
Run Time: real 6742.390 user 97.862790 sys 129.854073
sqlite> analyze molecule_dictionary;
Run Time: real 33.879 user 2.195662 sys 2.043848

Yes, it took almost 2 hours to analyze the activities table. But it was worth it from a pure performance view. I ran the above code twice, with this pattern:

% sudo purge  # clear the filesystem cache
% sqlite3 chembl_21.db  # start SQLite
SQLite version 3.8.5 2014-08-15 22:37:57
Enter ".help" for usage hints.
sqlite> .timer on
sqlite> .... previous query, with filter for IC50 < 1uM ...
255
Run Time: real 8.595 user 0.038847 sys 0.141945
sqlite> .... repeat query using a warm cache
255
Run Time: real 0.009 user 0.005255 sys 0.003653

Nice! Now I only need to do about 60 such queries to justify the overall analysis time.
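The mechanism behind the speedup is visible from Python as well: ANALYZE populates the sqlite_stat1 table, which the query planner consults when choosing join orders. A self-contained sketch with a toy table (generic names, not the ChEMBL schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (assay_id INTEGER, standard_type TEXT)")
conn.execute("CREATE INDEX idx_assay ON activities (assay_id)")
# 10000 rows spread over 100 distinct assay_ids.
conn.executemany("INSERT INTO activities VALUES (?, ?)",
                 [(i % 100, "IC50") for i in range(10000)])

# ANALYZE gathers per-index statistics into sqlite_stat1.
conn.execute("ANALYZE activities")

# Each row is (table, index, stats); the stats string starts with the total
# row count followed by the average rows per distinct indexed value.
for tbl, idx, stat in conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1"):
    print(tbl, idx, stat)   # e.g. "activities idx_assay 10000 100"
```

On a real ChEMBL dump the same two statements (ANALYZE, then inspect sqlite_stat1) confirm whether a database has already been analyzed before you pay for it again.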

Categories: FLOSS Project Planets

Jo Shields: Mono repository changes, beginning Mono vNext

Planet Debian - Fri, 2017-03-24 06:06

Up to now, Linux packages on mono-project.com have come in two flavours – RPM built for CentOS 7 (and RHEL 7), and .deb built for Debian 7. Universal packages that work on the named distributions, and anything newer.

Except that’s not entirely true.

Firstly, there have been “compatibility repositories” users need to add, to deal with ABI changes in libtiff, libjpeg, and Apache, since Debian 7. Then there are the packages for ARM64 and PPC64el – neither of those architectures is available in Debian 7, so they’re published in the 7 repo but actually built on 8.

A large reason for this is difficulty in our package publishing pipeline – apt only allows one version-architecture mix in the repository at once, so I can’t have, say, the same package version built on AMD64 for both Debian 7 and Ubuntu 16.04.

We’ve been working hard on a new package build/publish pipeline, which can properly support multiple distributions, based on Jenkins Pipeline. This new packaging system also resolves longstanding issues such as “can’t really build anything except Mono” and “Architecture: All packages still get built on Jo’s laptop, with no public build logs”.

So, here’s the old build matrix:

Distribution   Architectures
Debian 7       ARM hard float, ARM soft float, ARM64 (actually Debian 8), AMD64, i386, PPC64el (actually Debian 8)
CentOS 7       AMD64

And here’s the new one:

Distribution   Architectures
Debian 7       ARM hard float (v7), ARM soft float, AMD64, i386
Debian 8       ARM hard float (v7), ARM soft float, ARM64, AMD64, i386, PPC64el
Raspbian 8     ARM hard float (v6)
Ubuntu 14.04   ARM hard float (v7), ARM64, AMD64, i386, PPC64el
Ubuntu 16.04   ARM hard float (v7), ARM64, AMD64, i386, PPC64el
CentOS 6       AMD64, i386
CentOS 7       AMD64

The compatibility repositories will no longer be needed on recent Ubuntu or Debian – just use the right repository for your system. If your distribution isn’t listed… sorry, but we need to draw a line somewhere on support, and the distributions listed here are based on heavy analysis of our web server logs and bug requests.

You’ll want to change your package manager repositories to reflect your system more accurately, once Mono vNext is published. We’re debating some kind of automated handling of this, but I’m loath to touch users’ sources.list without their knowledge.

CentOS builds are going to be late – I’ve been doing all my prototyping against the Debian builds, as I have better command of the tooling. Hopefully no worse than a week or two.

Categories: FLOSS Project Planets
Syndicate content