FLOSS Research
Between Thought and Expression
Computer programs are treated, for the purposes of copyright law, as literary works. As well as giving some legitimacy to the legion of people out there calling themselves Codepoet, this decision has the effect of making the division between idea and expression a key one in determining what is and is not ownable in a computer program. This is because it is a fundamental assumption in all doctrines of copyrightability that it is the specific expression that is protected, not the idea that underlies it. To take the example of a more standard literary work, it is a novel’s original arrangement of specific words on a page that is protected, not the events that make up its plot.
This division between idea and expression is slightly more complicated in the world of computer programs, however. Clearly the lines of code in a program’s source files are analogous to the words on a page in a novel, but what is a program’s ‘plot’? Is it the broad task that the program is written to achieve? How about the arrangement of code elements – say the way that the task is conceptualised as subroutines or objects? At what level of abstraction does a program pass from being an expression into an idea?
This question is an important one for anyone who is publishing code. While it’s easy to see that pasting someone else’s source code into your own program is likely to need the original author’s permission, it’s less clear whether borrowing someone else’s object model, data model or API definition is an infringing act. These questions have been discussed in both the US and Europe recently as a result of a couple of high-profile court cases.
In the US, Oracle and Google have been fighting over a range of intellectual property issues – both patent and copyright-related – for almost two years now. One key issue that remains unresolved at the time of writing is whether certain non-literal elements of the Java programming language are protected by copyright. This argument centres on what are essentially helpful code snippets that are provided to Java programmers by the creators of the language. These are arranged into named sets, with established conventions for calling them up and making use of them (APIs). The question at issue here is whether the naming of these sets and the conventions for making use of them are ownable.
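To make the distinction between an API and the code behind it more concrete, here is a minimal sketch. It is written in Python rather than Java, and every name in it (MathApi, VendorMath, CleanRoomMath, max_of, caller) is hypothetical and chosen purely for illustration; it is not taken from the Java libraries or from the court documents. The abstract class stands in for the 'naming and conventions' part, and the two subclasses stand in for independently written implementing code.

```python
from abc import ABC, abstractmethod


class MathApi(ABC):
    """The 'API' in this sketch: only a name, parameters and a calling convention."""

    @abstractmethod
    def max_of(self, a: int, b: int) -> int:
        """Return the larger of a and b."""


class VendorMath(MathApi):
    """One author's implementing code (hypothetical)."""

    def max_of(self, a: int, b: int) -> int:
        return a if a >= b else b


class CleanRoomMath(MathApi):
    """Independently written code behind the same declaration (hypothetical)."""

    def max_of(self, a: int, b: int) -> int:
        # Different literal code, same observable behaviour.
        return sorted((a, b))[1]


def caller(api: MathApi) -> int:
    # Client code relies only on the declared names and conventions,
    # so it works unchanged with either implementation.
    return api.max_of(3, 7)
```

The literal lines inside the two subclasses differ completely, yet a caller cannot tell them apart. The dispute described above concerns the first part only: whether the declared names and conventions themselves can be owned.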
In Europe, in a case covering similar though not identical ground, the European Court of Justice has ruled on an argument between SAS and WPL over the issue of whether a programming language and the structure of data files are ownable. The former question is closely adjacent to the Google-Oracle APIs issue: is a conceptual arrangement of useful items copyrightable? In this case the ECJ ruled that it is not, and drew heavily on the analogy with literature and natural language; books in English are ownable, but you can’t own English. On the issue of data files, on the other hand, the ECJ held that the structure of a data file is protected by copyright as part of the expression of a computer program. So here we have a real example of a concept which is not a literal expression but is still not sufficiently abstract to be an unprotected idea. (In the event this finding did not help the ‘owners’ of the data file format, as the act of infringement they were complaining about was legitimised by an exception in EU and UK law that permits certain acts in respect of computer programs if they are done for the purposes of facilitating interoperability.)
These issues may seem annoyingly abstract and inconsequential, but in fact they have deep significance for all software authors and consumers. While software is treated as a literary work in copyright law, its tendency to be more formally structured and complex than literature means that the division between idea and expression in it will tend to be hard to find. This in turn means that there is often a real lack of clarity on what aspects of a computer program can be legitimately reused by other software authors. This exposes the authors to risk, and means that we as consumers can find ourselves relying on software that infringes others’ rights and may be subject to unexpected licence fees or removal from the market. While not specifically an open source issue, it affects open source just as much as closed source. So the recent ECJ judgement and the forthcoming decision on API copyrightability in the Google-Oracle case are of real benefit to the IT community. Whatever the specifics of the decisions, their clarity will be useful.
The dominance of open source tools in Big Data
Most of the tools that are best suited for dealing with Big Data are open source. This provides the research community with a huge opportunity, because no investment in software licenses is needed. You just download the software and ‘get on with it’. The challenge, as became clear at the Eduserv symposium last week, is to find people with the right skills to apply these tools.
Without a doubt, Apache Hadoop is the most important open source project in this space. It is amazing to see how fast the Apache Hadoop ecosystem is growing and how everyone is trying to jump on the bandwagon. Start-up companies like Cloudera and Hortonworks have no trouble finding venture capitalists willing to invest large sums of money. Similarly, nearly every major tech company is offering it, while other internet companies that deal with big data are using it (secretly or not). At the Eduserv symposium, EMC CTO Rob Anderson focused on the implications big data has for storage, and showed their Hadoop-based offering. Because the Apache licence allows you to use any Apache project in a closed-source implementation, EMC can sell their Hadoop distribution without needing to make that product open source.
There are big implications of the big data trend for the research community. Guy Coates of the Sanger Institute showed how the amount of data they are managing is increasing rapidly. They are expecting this increase to continue, especially since the cost of human DNA sequencing is dropping dramatically. They expect it to drop to $1000 for a full scan within two years (excluding storage!). His main challenge was not the actual storage of the data, but the management of the data as researchers were analysing it. Sanger is using the open source tool iRODS, a community-driven project that originates from the Data Intensive Cyber Environments (DICE) research group in the DICE Center at the University of North Carolina.
Another open source project that featured prominently at the Eduserv symposium was Apache CouchDB. Simon Metson of Bristol University explained how NoSQL is the enabler of big data, and how new database systems that do not use the traditional relational database approach are better suited to these tasks. Open source software projects like CouchDB, but also Apache Cassandra, are leading in this space. Simon highlighted that the community aspect of big data is very important. By engaging with the community that uses these tools to solve their big data problems, you can solve the hard problems. Something you may encounter once in a thousand times may have been solved by someone else in the community who runs into it more often, and vice versa.
The closing keynote was given by Anthony D Joseph, professor at the AMP Lab at UC Berkeley. He mentioned how Facebook started the Open Compute project to share best practice in cluster design for big data centers. It is an interesting example of the old economic adage that you should commoditise your complement. Berkeley is collaborating on the Apache Incubator project Mesos, which is a scalable cluster manager that can dynamically share resources between multiple computing frameworks. They support frameworks like Hadoop, Spark and MPI.
So the technology is there or is well underway in being developed. And being open source, anyone can download and start using it. Technology is not the problem with big data; the challenges lie in the cultural and organisational change that is needed to capitalise on big data. People within and across organisations need to be willing to share their data and think of new, intelligent and creative ways of making use of this data. Two well-known examples that were mentioned were Google Flu Trends, a website that predicts flu epidemics based on what people search for, and a Twitter application that was created to detect and report on earthquakes using people’s tweets.
A final challenge that was recognised widely at the conference was the shortage of skilled people in the big data space. This is true both for the data scientists that are needed to analyse the data, and for people that can help curate the data longer term, which is a completely different challenge for many HE institutions. In the spirit of open source though, there are many resources freely available online for people who want to get started, such as on the website bigdatauniversity.com. And of course, if you want to get started with one of the open source projects mentioned, there are many ways to get involved.
What makes a community led project work?
This guest post has been contributed by Ross Gardler of OpenDirective. Ross is Vice President of Community Development at The Apache Software Foundation and a mentor at the Outercurve Foundation. Ross has been active in open development of open source software for over ten years.
OSS Watch has been participating in the development of Apache Rave, a ‘next-generation portal engine, supporting (Open)Social Gadgets as well as W3C widgets’. As Sander observes in this blog, the Rave ecosystem is made up of a ‘diverse range of collaborators’ from both the academic and commercial sectors. These partners are sharing resources in order to build a critical piece of software at lower cost as well as to increase innovation around that product.
A few days ago I posted an evaluation of the Apache OpenOffice project’s journey through the Apache Incubator (all code entering the Apache Software Foundation (ASF) must pass through the incubator). That post looked at what makes an Apache project different from many other open source projects. This post repeats many of the same points, but rather than examine them from the point of view of OpenOffice I will examine why the predominantly academic team behind Apache Rave chose to go to the ASF.
In Apache projects, a Project Management Committee (PMC) oversees each project on behalf of its users, contributors, committers and the foundation itself. Upon entering incubation the PMC is guided by mentors from the foundation. Upon graduation mentors either retire or become equal members of the PMC. For the Rave community the provision of mentors meant that the project team could avoid the mistakes of many other open source projects. As a result, the team got an honourable mention in the Black Duck Open Source Rookie of the Year awards. Not bad for a team with no significant knowledge of open source software development on a large scale. Now it has graduated, it no longer has mentors actively overseeing its work, but it still has the backing of over 100 full Apache projects and another 50-odd incubating projects.
New committers and PMC members are elected by the PMC based on merit. It should be relatively easy for anyone to gain influence on an Apache project. In the ASF this is achieved through rewarding merit. If you contribute to the project you are rewarded with influence over the project. In environments where staff turnover can be high, such as academic research, this is important with respect to continuity. It also removes the opportunity for someone to insist on a level of control based purely on the cash they wield. In an Apache project it is all about the delivery.
All decisions unrelated to individuals happen on the public mailing list; discussion on the private list is kept to a minimum. This behaviour has no special bearing on academic projects compared to non-academic projects. For both types this rule ensures maximum inclusivity, which results in maximum engagement with potential contributors.
‘If it didn’t happen on the dev list, it didn’t happen’ – meaning no decision about the project can be made outside of the public development list. Proposals can be drawn up elsewhere, but decisions occur on the public list. Academic projects, like open source projects in general, often involve collaborators from a variety of geographic regions. This can make it difficult to ensure that everyone is kept informed and engaged. Apache projects require that all significant decisions are made in public so that no participant (or potential participant) is excluded from the process.
Where possible, decisions are made by consensus reached through discussion. There are voting rules but the ASF prefers not to have to vote. Apache Rave began life as a merger between three pre-existing projects. It was important that all three parties were equally engaged in the project. Had there been a pre-defined leader this would, probably, have made some participants feel less engaged. Initially the consensus-driven approach can be hard to understand; over time, however, natural leaders emerge in specific areas of the project. At this point consensus is easily achieved since each decision is led by the person best equipped to lead it.
Releases are created according to the ASF’s licence requirements. The Apache License is a permissive licence that allows anyone to do anything they want with the code. This allows for maximum flexibility in business cases for engaging with the project which in turn encourages third party contributions. Whilst conforming with Apache policies is more onerous than might be found elsewhere, they are designed to ensure that people can use and contribute to your software with minimal legal risk. Risk is something that universities and companies alike tend to avoid.
Trademarks and logos used by ASF projects belong to the ASF. Protecting trademarks is an important part of open source software. By running the Rave project inside the ASF much of the legal infrastructure and experience is in place should an issue arise in the future.
Apache projects are managed by a diverse group of people, each representing their own interests within the project. Apache decision making processes prevent ‘block votes’ controlling the process by ensuring each voice is equally loud. A number of people are contributing to Apache Rave, each with their own motivations. Each contributor must be assured that what they do today will still be useful tomorrow. Apache projects adopt a model that means it is not possible for third parties to gain control of a project. Consequently, researchers and product developers do not run the risk of losing influence over the code.
As can be seen from the above list of required behaviours found in Apache projects, the focus is on ensuring the project provides maximum opportunities for collaboration and innovation. There are other ways of achieving this but for the initial participants in Apache Rave (Universities of Bolton, Oxford and Indiana, SurfNet, Mitre Corp. and Hippo) the ‘Apache Way’ was deemed to be the most suitable. The same can be said of Apache Wookie which is used in Rave and was also helped by OSS Watch as it moved to Apache.
If your project wants to explore the opportunities that foundations (not just the ASF) can offer, OSS Watch is here to help.
FRAND or FOSS?
Standards in technology are generally considered to be a good thing. Having documented technologies that can be implemented by all means that businesses can compete on equal terms and consumers benefit from the effects of this competition. Of course, before a technology can be standardised, individual technology players need to do the work of innovation to develop the techniques the standard will encompass. Sometimes these technology players will have sought to protect their investment in innovation by obtaining a patent for the innovative technology they have created. Patents are designed to provide a monopoly over a specific technological process for the owner, so how does this monopoly fit in with the idea of a standard?
The answer is that it doesn’t, really. In situations where implementing a standard would necessarily infringe on someone’s patent, the standards creation bodies will usually try to get the patent’s owner to agree some terms which will guarantee them a return for their investment but which will still allow everyone in the market to actually use the standard in their products. These kinds of terms are often referred to as RAND or FRAND – standing for (fair), reasonable and non-discriminatory.
FRAND is a slippery term. There’s no single definition, which makes determining what is and is not FRAND hard. Most people agree that the general principle behind FRAND is that the fees or other requirements for use of the patents in question are not ridiculously high and are the same for anyone who wishes to implement the standard, whether your best friend or fiercest competitor.
That sounds like a good idea to most people, and for more traditional hardware and closed source implementations of standards it arguably is. There can be problems, however, when software under a free or open source licence wishes to implement a standard available under FRAND terms. For example, the GNU GPL family of licences all contain conditions that say – in essence – that if a distributor of the software is forced to pay for the use of a patent in the software, they must either cease distribution or obtain a licence for everyone (the schoolroom chewing gum scenario). These conditions are designed to deter patent owners from pursuing distributors of GPL software, but they mean that payable FRAND standards and GPL software do not play well together.
Even where the licence is not GPL, there can be problems with the interaction between FRAND and FOSS. One way in which patent owners make their patents available for use in a standard is by issuing a ‘non-assert’ promise. These are unilateral undertakings to not assert their patent rights, and in this context they are usually conditional on the patent being used in an implementation of the standard (not unreasonably). However in the context of open development, this can be something of a nightmare. You may write a piece of code that implements the standard and release it under a FOSS licence, confident that you are protected from patent litigation by the non-assert. An unwary downstream developer looks at your code – specifically the bit that implements the patent – and thinks: “that’s a nice bit of code – I’ll use that for my next project…” Of course, unless by some happy accident their next project is also implementing the standard then their use of the same code will not be protected by the non-assert, creating a potentially very dangerous problem.
The question of the compatibility of FRAND terms with FOSS software has become a vexed one recently due to the UK Government’s Cabinet Office seeking to create a policy around the use of open standards in government IT. The idea here is to reduce the currently crippling costs of government IT systems by opening the procurement process up to more competition. One of the perceived problems with the current situation is that there are only a few providers of solutions who can cope with the government’s massive requirements, and that the monolithic solutions they provide are often very hard to substitute once they are in place. The solution, or part of it anyway, is to break up the requirements into smaller deliverables that could be provided by more and smaller companies. How do you get these smaller solutions to work together? Use standards, preferably ‘open’ ones. That ought to create a level playing field for all sizes of providers, and alongside that make it easier to pitch FOSS solutions – with their problems with more restrictive standards and tendency to be supported by SMEs – to government.
Initially the Cabinet Office just stated that they would mandate open standards in future government procurements. Unfortunately this ran into a problem of definition. Just as with FRAND, no-one has a single, snappy definition of what an open standard actually is. It’s easy to assume – with Justice Potter Stewart – that we will know one when we see one, but in practice there are polarised views in this area. The Cabinet Office’s initial definition was not to everyone’s liking. To resolve this potential confusion, not to say conflict, the Cabinet Office launched a consultation exercise to help pin down exactly what an open standard is, according to the largest possible group of respondents. The deadline for this has since been extended after it emerged that a perception of bias might have been introduced by the conduct of the process.
Some evidence of the ructions that led to the consultation exercise can be seen in the documents columnist Glyn Moody obtained through a Freedom of Information request. I will not attempt to summarise this weighty sheaf, but I would recommend glancing through it if you want to see how lobbying of the government over IT matters looks in its naked state. At issue is the idea that – as in the Cabinet Office’s initial definition – open standards should be entirely royalty free. Now obviously ‘at no cost’ is about as low a barrier to entry as one can get, at least in monetary terms, so it’s easy to see why the Cabinet Office adopted this definition from its original home at the W3C. For one thing, it would get around the GPL-compatibility issue mentioned above, and if used instead of a non-assert, also the ‘mode of use’ problem I have cited. However it would also exclude some existing technical standards (although not many – most are already royalty free), and clearly some players are not going to be happy with that…
OSS Watch is interested in the outcome of this process because – as a non-advocacy group – we are keen that all potential solutions are able to be assessed on their merits alone. We would strongly recommend that everyone responds to the UK government consultation exercise, in order that a truly communal definition of open standards can be achieved.
Don’t keep your data under your desk
It is a well-known problem for researchers. Data is being collected for a research project and no decision has been made about how to manage the data during the project. Naturally, once you have finalised the project and start publishing the end results, you may deposit your final dataset in an institutional repository such as your university’s DSpace or E-prints repository, or you may even put it in Dryad. However, that is not sufficient to keep your data safe while you are still working on it. Often, such data ends up on a computer that just happens to lie around in the office or department, or even on the researcher’s local machine.
People that are conscious about back-up issues may be using a solution like Dropbox, SkyDrive or Google Drive, but some issues exist around data ownership and rights that may prevent you from wanting to use these services.
So what would be easier than just saving it in a folder, as you would with tools like Dropbox, but have it backed up by the institution, version-controlled automatically and keeping it within the trusted boundary of your organisation? And still allowing you to optionally share the folders with your research group, or a wider group of people, whichever is appropriate.
This is what the open source tool DataStage offers you. Developed as part of the DataFlow project, it is a piece of software that will be installed at, usually, the departmental level of your institution, but it can also be hosted in a virtual ‘cloud’ infrastructure. It allows you as a user to simply map a network drive to it. You save files as normal, and everything will be handled for you. Near the end of the project, when you start publishing and want to make the datasets available to a wider public, you can push any dataset to a SWORD-compliant repository, such as the ones mentioned above or to a DataBank instance.
The beauty of an open source project like DataStage is that anyone is welcome to use the software and contribute towards its ongoing development. You can imagine there are many more use cases for a tool like this, which are unrelated to research data. Take for example the popular Raspberry Pi project. In a classroom situation where all the kids have their own little computer, they can submit their homework via DataStage to the teacher, who can centrally check everything on the main server and mark their work. This smart alternative application was highlighted by David Shotton in his presentation during the DataFlow Launch Workshop on 2 March.
Are you curious about what DataStage can do for you? Come and download our beta release to try it out and join us on the DataFlow mailing list to tell us about your experiences and what may be improved. We would love to hear from you!
Survey Results Re Individual Memberships
As part of the OSI’s governance reform, we are planning to establish a mechanism for individuals to join the OSI. We recently conducted a survey to ask people about their interest in joining, and to learn some preferences about how best to proceed. We received more than 350 replies to the survey, which was initially announced at FOSDEM in February 2012. The respondents came from many different countries, and had widely differing experiences with open source software. We are grateful to those who took the time to participate in the survey.
OSI Supports Open Standards
Why Open?
This question was raised to me recently, and comes up frequently. It’s complicated by the fact that the word ‘open’ means many things to many people, but there are threads of commonality through all of the varying definitions. So the question is: “Why is openness useful to the public sector?” There are many answers to this, but here I’d like to concentrate on one that is perhaps less frequently cited.
In 2003, early in OSS Watch’s history, Sebastian Rahtz and Stuart Yeates drafted a policy on open source software for our funders the JISC, beginning it during a long train journey to an event. JISC had been receiving questions from the community about its attitude to open source, which was becoming something of a hot topic. I had joined OSS Watch at its inception, having worked in other externally funded projects here in Oxford before that. One thing that had become clear quickly was that intellectual property rights were often an afterthought among projects, and that particularly where project work involved collaboration between institutions, failing to sort out those rights early could result in hair-tearing complications by project end. Where the problem was not solved, project outputs could remain undistributed, and the public money invested in them locked away. Of course JISC was even more aware of this than any individual institution. Thus the open source policy served the dual purposes of spelling out the benefits of open licensing of resources and introducing the idea that intellectual property rights needed to be dealt with early in a project’s lifecycle.
The policy introduced a presumption that software developed with JISC resources would be open source. While this might seem like a value judgement about openness, the fact that projects could make an argument against openness where they felt it would be detrimental was another key component. In practice projects could take either approach, but what they could not do was ignore the issue. The openness presumption provided a default exploitation model that would allow maximum reusability of the publicly funded resources. If the project’s host institution felt that a different licensing model would suit the work better, then that option was open to them. All they needed to do was to justify it.
So one use of openness for publicly funded works is – I would argue – to stimulate creative thinking about exploitation. If the default assumption is that the intellectual property will be ‘in the cupboard’ and ready for exploitation when we get around to it, it is all too easy to postpone the decision. Operational complications can then mean it is forgotten altogether. If we begin with a default policy of openness, we know that this cannot happen, and the option to draft variant exploitation models means that we do not limit anyone’s creative thinking.
JISC were ahead of the curve in identifying the root problem here and implementing the policy to deal with it. As we have worked with other public funders over the years it has been extremely useful to point to the policy and the thinking behind it.
Graduating Apache Rave project demonstrates open innovation in software
The Apache Rave project graduated from the Incubator last month. This means that the Rave project has shown itself to be a viable project community, which is being governed well according to the meritocratic principles of the Apache Software Foundation.
Apache Rave provides a next-generation portal engine, supporting (Open)Social Gadgets as well as W3C widgets. Have a go with the latest release and you will see that it works out-of-the-box, but it can alternatively serve as the basis for an enterprise-level social portal application.
The way Rave came about and is still moving forward is a clear demonstration of open innovation in software. The project originated from a collaboration of partners that donated the software they have been developing independently, because they recognised that there was so much overlap in their projects, that collaborating on a shared codebase would generate more benefits than the effort that they would need to put in. And as such, Rave was built on the software code from MITRE Corporation, Indiana University Pervasive Technology Institute and SURFnet, all of which had donated their sources to the Apache Software Foundation. A major catalyst for bringing these partners together was the open-source CMS vendor Hippo and their chief architect Ate Douma who championed the project.
So here you have three projects that have put in several person-years to build a portal, and all of their code is now merged and available for everyone. Bringing all of the code together has been a huge task, and the current state of Rave is not a fully mature product yet, but it’s a viable basis for all to build on and is now working towards a 1.0 version. Participants come from different sectors and use the application in different contexts. Hippo may want to base their future CMS products on it, while Indiana University may want to use it for their Science Gateway portal application. The recognition that you can build on the underlying platform because your use cases are very similar on that level, enables all partners to save significantly on software development costs. Everyone benefits from the contribution that each partner makes.
As OSS Watch, we are working with other projects and new initiatives in the academic sector in the UK to help the sector here benefit from this approach and engage them onto the Rave platform. For example, the Bamboo project aims to develop a Virtual Research Environment for humanities scholars using an OpenSocial portal. They are already working with SURF, but we are now working with the project team in Oxford to see if they can build on the Apache Rave project. Some contributors from the Apache Wookie (Incubating) project came along and added support for W3C widgets and they are now using Rave for their European OMELETTE project. As Scott Wilson and Claudia Villalonga showed at the Open Source Junction workshop last month, building on a stack of open source projects enabled them to focus effort on new challenges in the project, rather than reinventing the wheel.
Ecosystem of Apache Rave projects
I use the slide above in some presentations because the many projects and diverse range of collaborators that make up the Rave ecosystem truly show open innovation in action. It is always a work in progress and new collaborators are welcome at any time. So read more about the Rave project, try it out and subscribe to the user or developer mailing list. The community would love to hear from you!
Thank you, Michael Tiemann!
The OSI recently instituted a term limit on directors (after two terms, a director must be off the board for at least one year). In the most recent election cycle, this resulted in Michael Tiemann stepping down from the board and the Presidency. The newly-elected directors officially started their terms this April 1st, and their first meeting was today, April 4th. At that meeting, the Board passed the following resolution by unanimous & enthusiastic consent, with the new directors participating:
The Open Source Initiative thanks Michael Tiemann for his decade of service and sensitive leadership on the OSI Board, and hopes that he will continue to be involved with the OSI for a long time.
Thank you, Michael!
OSI Welcomes Debian and CENATIC
OSI is very pleased to welcome two important new members to the Affiliate scheme for community groups.
Meeting at the Junction: cross-sector collaboration seeded at OSS Watch workshop
Last week saw the third edition of Open Source Junction: two days of presentations and interactive sessions with representation from both the commercial and the academic sectors. It was a successful workshop with a lot of interesting interaction, and new ideas for collaboration were discussed.
The report of the workshop will be published shortly. For now, please have a read through the live blogs of days one and two below and check the slides of the sessions for more information.
<a href="http://www.coveritlive.com/mobile.php?option=com_mobile&task=viewaltcast&altcast_code=310b5ec92b">Open Source Junction 3 day 1</a>
<a href="http://www.coveritlive.com/mobile.php/option=com_mobile/task=viewaltcast/altcast_code=1de7fe4a8a">Open Source Junction 3 day 2</a>
A new EveryDesk is out!
We were particularly happy about our work on EveryDesk – a portable, fully working live Linux installation on a USB disk. But we found that more and more people were looking for more space, a more modern environment, and a general refresh. We have been busy with our other pet project – CloudWeavers, a private cloud toolkit – and we redesigned EveryDesk to be the ideal client environment for companies and administrations that are moving totally or partially to a private or public cloud. We took several ideas from ChromeOS, but frankly speaking its hardware support was extremely limited, and even with exceptional ports like Hexxeh’s “Lime” the user experience is still less than optimal. We have basically redesigned everything. The base operating system is now derived from openSUSE (mainly thanks to its excellent package management tool, which drastically increases the probability that the system will continue to work after an update – a welcome change from Ubuntu). We integrate Gnome 3, the latest Firefox and Chromium on a BTRFS install that supports compression and error concealment, so it works properly even on low-cost USB devices. On an 8 GB USB key, you get 4 GB free, and all the apps at your disposal, ready to go.
The only major change in hardware support is that EveryDesk is now a 64-bit only operating system, but we believe that despite this limitation it can still be widely useful. It integrates some components that are perhaps less interesting for individual use – for example the XtreemFS file system, which can be used to turn individual PCs into scale-out storage servers in a totally transparent way and with great performance, and many virtualization enhancements. On the user side, we have already installed some of our favorite additions among fonts, software, and tools; Firefox uses by default the exceptional PDF.js embedded viewer, which uses no separate plugins and is faster than Adobe Acrobat, and there is the usual assortment of media codecs and ancillary little things.
We love every moment that we work on this project, and I would like to thank the many people who helped us and sent criticism and praise. One wrote “I can’t believe how well it works, without time lags I normally associate with running on a CD or a thumb” and I can’t thank our users enough – they are our real value. As usual, you can download EveryDesk from Sourceforge.
OSI's new Board
Last Friday the OSI Board held a special meeting to fill vacancies that had arisen by the departure of three directors - Mike Godwin, Andrew Oliver and Michael Tiemann. Michael Tiemann left the Board after serving as OSI President for many years and leaves a large gap which the board will only fill thoughtfully; as a consequence, Martin Michlmayr, currently OSI Secretary, was temporarily appointed Acting President and the election of a full new President scheduled for a later meeting of the new Board. The Board warmly thanks all three for the contribution they have made to OSI.
Open sourcing software essential for reproducible science
I was very pleased last week to read that Nature published an editorial that argued for open sourcing software that had been used in the research leading up to a publication.
The basic principle is very simple: for claimed scientific results to be credible, it must be possible to verify them. The key to doing that properly is the ability to reproduce the results. If some piece of software used to create the results is not made available to the scientific community, the wider community cannot reproduce them.
Various initiatives have been taken to ensure this academic principle is followed. For example, a conference like SIGMOD installs a repeatability committee that requires the software used to produce accepted papers. Although this helps ensure that SIGMOD’s papers have been thoroughly checked, it does not enable other researchers in the field to verify the results or to build on them.
Luckily, many scientists, such as Daniel Lemire, see open sourcing their code as a normal practice in their research. The software project provides a solid basis for collaboration, and as such is an example of open innovation in academic communities.
One of the barriers cited against open sourcing code is that the university may see commercial value in it and wish to commercialise it. It is important to realise, however, that there are many business models available to institutions that go far beyond just selling the raw outputs of a software project. All of these still allow the institutions to create a viable business by adding value to what is available on the download page of a code hosting website. The true value of a software project is never just in the raw code.
At OSS Watch, we work with academic projects that develop software as part of their research and provide free support to the UK academic sector. So if you have a question about how to open source your code or how to deal with licences, we welcome you to contact us.
Upcoming JISC OSS Watch Webinars
This is just a quick plug for a webinar that I will be running – with the kind assistance of JISC – next Wednesday (7th March) on the topic: “Choosing the right open source licence”. To quote the blurb:
There are many free and open source software licences, and while they all broadly attempt to facilitate the same things, they also have some differences. Some of the major differences can be grouped together into categories, and this talk acts as an introduction to these categories. Having attended this session, you should be able to understand which decisions you should take in order to select a licence for your code.
Delegates will take away an understanding of:
- the main categories of open source licences available
- the implications of choosing one for the future of your software
Also, advance notice that the week after, on Wednesday March 14th, OSS Watch’s Sander Van Der Waal will be asking: “How healthy is your open source community?”
To be viable, academic projects using open source software need to ensure that people continue to engage with their project beyond initial funding. Similarly, academic institutions and businesses seeking to adopt open source solutions developed as part of academic projects need to be sure they can do so without exposing themselves to unmanageable risk. By using the Software Sustainability Maturity Model, both businesses and academic users and developers can identify any weak points in their development and governance processes, and address them as appropriate. This session will provide the participants with the skills to assess the non-technical aspects of open source software development.
Having attended this session, you will be able to answer the question: “Can a business collaboration be built around this open source project?” You will understand how to evaluate the health of an open source community and plan for sustainable engagements with companies interested in developing products or services based on it.
I hope you can join us!
Open source “matches proprietary code quality”
Sometimes we are asked to give an opinion on a particular piece of open source software and its quality in comparison to a specific closed source alternative. Of course, with the sheer number of projects and products out there, it is often very hard to answer these kinds of questions with any authority, and this means that we often cannot give a detailed answer. On one occasion where I was personally asked this kind of question, I gave the usual disclaimer and set about asking what contacts I had in that specific problem domain for their opinion (for my own edification as much as that of the questioner). One particular response I got back was interesting; I’ll paraphrase as the communication was not intended to be public. In essence the respondent – someone with long years’ experience in this particular area – told me that they had heard good things about the open source implementation but that in their opinion only an idiot would ever use it for ‘real world tasks’. It stood to reason, they argued, that open source must necessarily be buggier and less professional than closed source, and notwithstanding anything they heard to the contrary about the quality of this particular solution, they could not recommend anyone waste their time with it.
Now as I say, the OSS Watch staff are not experts in every software-intensive problem domain, and so we do not gainsay actual experts lightly. Even so, in this case I noted to myself that I might be seeing a certain amount of unsupported prejudice. The problem is that code quality is a notoriously hard property to assess. Even users of the same program can have radically different impressions of its quality, stability and efficacy. One approach to arriving at verifiable metrics of code quality is static program analysis, where software is used to analyse the source code of other software and identify where problems might occur. One company that offers static analysis software and services is Coverity, and over the last five years, in partnership with the United States Department of Homeland Security, they have been periodically assessing the quality of selected large open source projects. As might be hoped, the picture has been one of gradually increasing code quality with each survey.
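To make the idea of static analysis concrete, here is a minimal sketch in Python. It is a toy, not a description of how Coverity's tools work: the function name `find_bare_excepts` and the sample snippet are invented for illustration. It inspects source code without running it and flags one well-known risky pattern, a bare `except:` clause that silently swallows every error.

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return the line numbers of bare `except:` clauses in Python source.

    The source is only parsed, never executed, which is what makes
    this a (very small) static analysis.
    """
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        # An ExceptHandler with no exception type is a bare `except:`.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


# Hypothetical snippet to analyse; `risky` is never called or defined.
SAMPLE = '''
try:
    risky()
except:
    pass
'''

if __name__ == "__main__":
    for lineno in find_bare_excepts(SAMPLE):
        print(f"line {lineno}: bare except clause")
```

Real analysers check for far more than this (null dereferences, resource leaks, race conditions and so on), but the principle of reading code to predict where it might fail is the same.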
This year for the first time Coverity made a direct comparison of open source and proprietary code quality, and the results were interesting (you may need to register in order to receive the PDF of the report). In the open source projects they examined (Linux, PHP, and PostgreSQL) rates of software defects were lower than in the corpus of proprietary closed code with which they compared them (0.45 vs 0.61 problems per 1,000 lines of code respectively). Of course, we must be cautious about such a circumscribed survey. The three projects they chose are well supported, mature and active. They also, in common with the proprietary comparators, use the Coverity software to identify errors as part of their development processes. Therefore one could conclude – and Coverity seem keen that we do – that the real lesson here is that using their software reduces error rates whatever your licensing or development model. Still, it is useful to have some more evidence in the discussion of open source vs proprietary code quality.
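For readers unfamiliar with the "problems per 1,000 lines" measure, the short Python sketch below shows the arithmetic. Only the two rates (0.45 and 0.61) come from the report as quoted above; the defect and line counts in the example are hypothetical numbers chosen to produce such a rate, and the function names are my own.

```python
def defects_per_kloc(defects: int, lines_of_code: int) -> float:
    """Defect density: defects found per 1,000 lines of code."""
    return defects / (lines_of_code / 1000)


def relative_reduction(lower: float, higher: float) -> float:
    """Fraction by which `lower` falls below `higher`."""
    return (higher - lower) / higher


if __name__ == "__main__":
    # Hypothetical counts: 450 defects in a 1,000,000-line codebase.
    print(defects_per_kloc(450, 1_000_000))

    # The two rates quoted from the Coverity comparison.
    open_source_rate = 0.45
    proprietary_rate = 0.61
    gap = relative_reduction(open_source_rate, proprietary_rate)
    print(f"open source rate is about {gap:.0%} lower")
```

On these figures the open source rate is roughly a quarter lower than the proprietary one, though the caveats above about the small, well-resourced sample still apply.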
Open Source Junction 3: mobile and cloud, Oxford 20-21 March 2012
Mobile technologies have become an integral part of our lives. Research indicates that by 2015 80% of people accessing the Internet will be doing so from mobile devices. Mobile applications and services are changing the way we engage with the web, and to a certain extent with each other.
At the same time, cloud technologies deliver better and better IT services. From email and content storage to complex computing and development platforms, users can access clouds via simple browsers, thus eliminating the need for end-user applications and high-power computers.
In UK Higher Education, cloud solutions are an integral part of a JISC programme aimed at helping universities and colleges deliver better efficiency and value for money through the development of shared services. As pointed out by Rachel Bruce, JISC’s Innovation Director for digital infrastructure, cloud solutions are increasingly attractive to HE institutions. They allow universities to reduce environmental and financial costs, share the load of maintaining a physical infrastructure, be flexible and operate on a pay-as-you-go basis, access data and applications from any location, and make scientific experiments easier to reproduce.
Mobile and cloud technologies are here to stay, and open source already plays a key part in the new market emerging from their cross-pollination, as the Future of Open Source survey indicates.
What are the effects of these two giants moving towards each other? What are the technological and economic implications of their intersection?
Open Source Junction 3 is an event specially designed to help you answer such questions by bringing together scientific innovation and business entrepreneurship to showcase early success and facilitate new collaboration in this sector. Over two days delegates from industry and academia will meet at Trinity College in Oxford to hear about highly innovative mobile cloud projects and work together to build industry-academia partnerships in this field.
One of the featured projects at OSJ3 will be ‘Access King’s Global Desktop’, which was implemented to support a new mobile working strategy at King’s College. Alex Hove from Getronix will describe King’s successful migration to a private cloud platform delivered over the JANET network, which provided some 25,000 staff and students with access to core applications from almost any device and any location.
Ajit Jaokar, founder of the publishing and research company futuretext, will highlight the technical architecture challenges facing a world of 50 billion connected devices predicted for 2020. He will discuss how the next generation web and new networking technologies such as white space networks may be able to address these challenges, in the context of the European ‘Smart cities’ initiative.
The ASTRA research project develops and tests low cost platforms capable of delivering scientific instruments into the stratosphere. Steve Johnston, Senior Research Fellow at Southampton University, will describe the hardware, data communications and software applications required for stratospheric flights, including the recent success of using Windows Phone 7 dataloggers with GSM communication and the Azure cloud for back-end processing.
The Eduserv Education Cloud provides compute and storage cloud IaaS for UK Higher and Further Education sectors. Eduserv was recently nominated as one of the top 20 tech companies to watch in 2012. Research Programme Director Andy Powell will describe the service and suggest ways in which cloud infrastructure such as Eduserv can support and enhance mobile technology projects.
Andrew Betts, founder of the web technology firm Assanka, will talk about the HTML5 web app they developed for the Financial Times. HTML5 includes features that allow the creation of web applications that work while offline, and Assanka’s FT web app demonstrates how this capability provides an excellent user experience in conditions of intermittent or no network access.
These are just a few of the projects that will be showcased at Open Source Junction 3. The programme also includes specially designed small group interaction sessions to help delegates understand each other’s value propositions and identify partnership opportunities.
Social activities on the evening of the first day, including a tour of the five-century-old Trinity College, drinks at the historic Turf Tavern, and dinner at a surprise restaurant, will also offer excellent opportunities to network and further discuss collaboration.
Places are still available but early bird registration ends soon. We hope to see some of you at the event.
Open Source CADNANO is unbelievably cool
I just read this in Nature:
The researchers designed the structure of the nanorobots using open-source software, called Cadnano, developed by one of the authors — Shawn Douglas, a biophysicist at Harvard's Wyss Institute for Biologically Inspired Engineering. They then built the bots using DNA origami. The barrel-shaped devices, each about 35 nanometres in diameter, contain 12 sites on the inside for attaching payload molecules and two positions on the outside for attaching aptamers, short nucleotide strands with special sequences for recognizing molecules on the target cell. The aptamers act as clasps: once both have found their target, they spring open the device to release the payload.
These robots may be able to identify and target cancer cells.
Software sustainability != project sustainability
At OSS Watch, we take the quality of our documents very seriously. When we publish documents, we don’t just leave them alone on the website, but we nurture them, reviewing them every six months. This ensures that every one of our published briefing notes remains up-to-date and relevant.
Sometimes, reviewing a document means that we make some changes to reflect current thinking. This happened recently, when we were reviewing our document on advice for project bids. In that document, we discuss the sustainability aspects of funded software projects. The focus was on ‘project sustainability’ but reading that back it felt like an ambiguous term.
Our experience in working with projects like DataFlow has confirmed that the sustainability of the project, where this refers to the project as funded by e.g. JISC or a research council, is related to but not the same as the sustainability of the software that is being developed as part of the project.
DataFlow, for example, creates a two-stage data management infrastructure that “makes it easy for you and your research group to manage your research data.” Two separate pieces of software, DataStage and DataBank, are being developed as part of the project. From our perspective, the value is in the software that is being developed and the communities that can be created around it to collaboratively develop the software further using the open development methodology. It is likely that DataFlow as a project will end. But the value of the software will remain and the ongoing sustainability will be through the software that has been developed. The software, and its associated community, documentation, etc., will be what can attract people from outside the funded project to become interested and involved. Of course, the software and development around it is a project in its own right, but to prevent ambiguity we now distinguish explicitly between software sustainability (i.e. sustainability of the software being developed) and project sustainability (where this regards the project that funds the initial software development).
Sometimes it can even be just a small part of the software that gets picked up and becomes sustainable. If you read our Wookie case study, you’ll see that Apache Wookie (Incubating) was originally only part of the software that had been developed in the TenCompetence project. TenCompetence itself does not show a lot of activity, whereas the Wookie project is on its way to graduating from the ASF’s Incubator to become a top-level Apache project.
Our work with the DataFlow project is continuing, and we are helping to organise a workshop in Oxford to which we invite everyone who is interested in the solutions that are being developed in both the DataFlow and VIDaaS projects. Register now if you are interested!