Feeds

EuroPython: EuroPython 2017: Get ready for EuroPython Call for Proposals

Planet Python - Fri, 2017-03-24 05:38

Thinking of contributing to EuroPython? Starting March 27th you can submit a proposal on any aspect of Python: programming from novice to advanced levels, applications and frameworks, or how you have been involved in introducing Python into your organization. 

We offer a variety of contribution formats that you can present at EuroPython: from regular talks to panel discussions, from trainings to posters. If you have ideas to promote real-time human-to-human interaction, or want to run a helpdesk to answer other people’s Python questions, this is your chance. 

Read about the different opportunities on our website https://ep2017.europython.eu/en/speakers/call-for-proposals/ and start drafting your ideas. The Call for Proposals opens in just 3 days!

Enjoy,

EuroPython 2017 Team

https://ep2017.europython.eu/

EuroPython Society

Categories: FLOSS Project Planets

Gocept Weblog: Sprinting to push Zope to the Python 3 wonderland

Planet Python - Fri, 2017-03-24 05:37

Earlier this year there was a sprint in Innsbruck, Austria. We made progress in porting Zope to Python 3 by working on RestrictedPython. After this sprint, RestrictedPython no longer seems to be a blocker for porting the parts of Zope that rely on it to Python 3.

See the full sprint report on the plone.org website.

We will continue pushing Zope towards the Python 3 wonderland at the Zope 2 Resurrection Sprint in Halle/Saale, Germany, hosted at gocept in the first week of May 2017. You are welcome to join us on site or remotely.

Photo copyright: Christine Baumgartner


Categories: FLOSS Project Planets

Sylvain Beucler: Practical basics of reproducible builds

Planet Debian - Fri, 2017-03-24 04:40

As GNU FreeDink upstream, I'd very much like to offer pre-built binaries: one (1) official, tested, current, distro-agnostic version of the game with its dependencies.
I'm actually already doing that for the Windows version.
One issue though: people have to trust me -- and my computer's integrity.
Reproducible builds could address that.
My release process is tightly controlled, but is my project reproducible? If not, what do I need? Let's check!

I quickly see that the documentation is getting better, namely https://reproducible-builds.org/ :)
(The first docs I read on reproducibility looked more like a crazed date-o-phobic rant than an actual solution - plus now we have SOURCE_DATE_EPOCH implemented in GCC ;))

However I was left unsatisfied by the very high-level viewpoint and the lack of concrete examples.
The document points to various issues but is very vague about what tools are impacted.

So let's do some tests!

Let's start with a trivial program:

$ cat > hello.c
#include <stdio.h>
int main(void) { printf("Hello, world!\n"); }

OK, first does GCC compile this reproducibly?
I'm not sure, because I've heard of randomness in identifiers and such during the compilation process...

$ gcc-5 hello.c -o hello-5
$ md5sum hello-5
a00416d7392442321bad4afc5a461321  hello-5
$ gcc-5 hello.c -o hello-5
$ md5sum hello-5
a00416d7392442321bad4afc5a461321  hello-5

Cool, ELF compiler output is stable through time!
Now do 2 versions of GCC compile a hello world identically?

$ gcc-6 hello.c -o hello-6
$ md5sum hello-6
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6
$ hexcompare hello-5 hello-6
[lots of red]
...

Well let's not get our hopes too high ;)
Trivial build options change?

$ gcc-6 hello.c -lc -o hello-6
$ gcc-6 -lc hello.c -o hello-6b
$ md5sum hello-6 hello-6b
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6
f73ee6d8c3789fd8f899f5762025420e  hello-6b
$ hexcompare hello-6 hello-6b
[lots of red]
...

OK, let's be very careful with build options then. What about 2 different build paths?

$ cd ..
$ cp -a repro/ repro2/
$ cd repro2/
$ gcc-6 hello.c -o hello-6
$ md5sum hello-6
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6

Basic compilation is stable across directories.
Now I tried recompiling FreeDink identically in 2 different git clones.
Disappointment:

$ md5sum freedink/native/src/freedink freedink2/native/src/freedink
839ccd9180c72343e23e5d9e2e65e237  freedink/native/src/freedink
6d5dc6aab321fab01b424ac44c568dcf  freedink2/native/src/freedink
$ hexcompare freedink2/native/src/freedink freedink/native/src/freedink
[lots of red]

Hmm, what about stripped versions?

$ strip freedink/native/src/freedink freedink2/native/src/freedink
$ md5sum freedink/native/src/freedink freedink2/native/src/freedink
415e96bb54456f3f2a759f404f18c711  freedink/native/src/freedink
e0702d798807c83d21f728106c9261ad  freedink2/native/src/freedink
$ hexcompare freedink/native/src/freedink freedink2/native/src/freedink
[1 single red spot]

OK, what's happening? diffoscope to the rescue:

$ diffoscope freedink/native/src/freedink freedink2/native/src/freedink
--- freedink/native/src/freedink
+++ freedink2/native/src/freedink
├── readelf --wide --notes {}
│ @@ -3,8 +3,8 @@
│    Owner                Data size        Description
│    GNU                  0x00000010       NT_GNU_ABI_TAG (ABI version tag)
│      OS: Linux, ABI: 2.6.32
│
│  Displaying notes found in: .note.gnu.build-id
│    Owner                Data size        Description
│    GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)
│ -  Build ID: a689574d69072bb64b28ffb82547e126284713fa
│ +  Build ID: d7be191a61e84648a58c18e9c108b3f3ce500302

What on earth is this Build ID, and how is it computed?
After much digging, I find it comes from a 2008 plan, with an application in selecting matching detached debugging symbols.
https://fedoraproject.org/wiki/RolandMcGrath/BuildID is the most detailed overview/rationale I found.
It is supposed to be computed from parts of the binary. It's actually pretty resistant to changes, e.g. I could add the missing "return 0;" to my hello source and get the exact same Build ID!
On the other hand my FreeDink binaries do match except for the Build ID, so there must be a catch.

Let's try our basic example with default ./configure CFLAGS:

$ (cd repro/ && gcc -g -O2 hello.c -o hello)
$ (cd repro/ && gcc -g -O2 hello.c -o hello-b)
$ md5sum repro/hello repro/hello-b
6b2cd79947d7c5ed2e505ddfce167116  repro/hello
6b2cd79947d7c5ed2e505ddfce167116  repro/hello-b
# => OK for now

$ (cd repro2/ && gcc -g -O2 hello.c -o hello)
$ md5sum repro2/hello
20b4d09d94de5840400be05bc76e4172  repro2/hello
$ strip repro/hello repro2/hello
$ diffoscope repro/hello repro2/hello
--- repro/hello
+++ repro2/hello
├── readelf --wide --notes {}
│ @@ -3,8 +3,8 @@
│    Owner                Data size        Description
│    GNU                  0x00000010       NT_GNU_ABI_TAG (ABI version tag)
│      OS: Linux, ABI: 2.6.32
│
│  Displaying notes found in: .note.gnu.build-id
│    Owner                Data size        Description
│    GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)
│ -  Build ID: 462a3c613537bb57f20bd3ccbe6b7f6d2bdc72ba
│ +  Build ID: b4b448cf93e7b541ad995075d2b688ef296bd88b
# => issue reproduced with -g -O2 and different build directories

$ (cd repro/ && gcc -O2 hello.c -o hello)
$ (cd repro2/ && gcc -O2 hello.c -o hello)
$ md5sum repro/hello repro2/hello
1571d45eb5807f7a074210be17caa87b  repro/hello
1571d45eb5807f7a074210be17caa87b  repro2/hello
# => culprit is not -O2, so culprit is -g

Bummer. So the Build ID must also be computed from the debug symbols, even if I strip them afterwards :(
OK, so when https://reproducible-builds.org/docs/build-path/ says "Some tools will record the path of the source files in their output", that means the compiler, and more importantly the stripped executable.

Conclusion: apparently to achieve reproducible builds I need identical full build paths and to keep track of them.
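One candidate fix still to be tested: GCC's -fdebug-prefix-map=OLD=NEW option rewrites the build path recorded in the debug info, which should make the -g case directory-independent. A minimal sketch in Python (untested here; /usr/src/app is just an arbitrary canonical prefix) that builds the same source from two randomly-named directories and compares checksums:

import hashlib, os, subprocess, tempfile

digests = []
for attempt in range(2):
    # Build the same source from a fresh, randomly-named directory each time.
    build_dir = tempfile.mkdtemp(prefix='repro')
    with open(os.path.join(build_dir, 'hello.c'), 'w') as f:
        f.write('#include <stdio.h>\nint main(void) { printf("Hello, world!\\n"); return 0; }\n')
    # Map the varying build path to a fixed canonical prefix in the debug info.
    subprocess.check_call(['gcc', '-g', '-O2',
                           '-fdebug-prefix-map=%s=/usr/src/app' % build_dir,
                           'hello.c', '-o', 'hello'], cwd=build_dir)
    with open(os.path.join(build_dir, 'hello'), 'rb') as f:
        digests.append(hashlib.md5(f.read()).hexdigest())
print('reproducible' if digests[0] == digests[1] else 'still differs')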

What about Windows/MinGW btw?

$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe
e0fa685f6866029b8e03f9f2837dc263  hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe
df7566c0ac93ea4a0b53f4af83d7fbc9  hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe
bbf4ab22cbe2df1ddc21d6203e506eb5  hello.exe

PE compiler output is not stable through time.
(any clue?)

OK, there's still a long road ahead of us...

There are lots of other questions.
Is autoconf output reproducible?
Does it actually matter if autoconf is reproducible if upstream is providing a pre-generated ./configure?
If not what about all the documentation on making tarballs reproducible, along with the strip-nondeterminism tool?
Where do we draw the line between build and build environment?
What are the legal issues of distributing a docker-based build environment without every single matching distro source packages?

That was my modest contribution to practical reproducible-builds documentation for developers; I'd very much like to see more of it.
Who knows, maybe in the near future we'll get reproducible official builds for Eclipse, ZAP, JetBrains, Krita, Android SDK/NDK...

Categories: FLOSS Project Planets

Third & Grove: One Image Field, Multiple Aspect Ratios

Planet Drupal - Fri, 2017-03-24 03:30
Categories: FLOSS Project Planets

Catalin George Festila: Take weather data with pyowm from openweathermap .

Planet Python - Fri, 2017-03-24 02:53
This tutorial shows you how to download and install the pyowm python module.
One of the great things about this Python module is that it lets you fetch data from the openweathermap website (you need to have an account).
PyOWM runs on Python 2.7 and Python 3.2+, and integrates with Django 1.10+ models.
All documentation can be found here.

The install is simple with pip, Python 2.7 and Fedora 25.
 [root@localhost mythcat]# pip install pyowm
Collecting pyowm
Downloading pyowm-2.6.1.tar.gz (3.6MB)
100% |████████████████████████████████| 3.7MB 388kB/s
Building wheels for collected packages: pyowm
Running setup.py bdist_wheel for pyowm ... done
Stored in directory: /root/.cache/pip/wheels/9a/91/17/bb120c765f08df77645cf70a16aa372d5a297f4ae2be749e81
Successfully built pyowm
Installing collected packages: pyowm
Successfully installed pyowm-2.6.1
The source code is very simple: just connect with the API key and print the data.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pyowm

print " Have a account to openweathermap.org and use with api key free or pro"
print " owm = pyowm.OWM(API_key='your-API-key', subscription_type='pro')"

owm = pyowm.OWM("327407589df060c7f825b63ec1d9a096")
forecast = owm.daily_forecast("Falticeni,ro")
tomorrow = pyowm.timeutils.tomorrow()
forecast.will_be_sunny_at(tomorrow)

observation = owm.weather_at_place('Falticeni,ro')
w = observation.get_weather()
print (w)
print " Weather details"
print " =============== "

print " Get cloud coverage"
print w.get_clouds()
print " ----------------"
print " Get rain volume"
print w.get_rain()
print " ----------------"
print " Get snow volume"
print w.get_snow()

print " Get wind degree and speed"
print w.get_wind()
print " ----------------"
print " Get humidity percentage"
print w.get_humidity()
print " ----------------"
print " Get atmospheric pressure"
print w.get_pressure()
print " ----------------"
print " Get temperature in Kelvin degs"
print w.get_temperature()
print " ----------------"
print " Get temperature in Celsius degs"
print w.get_temperature(unit='celsius')
print " ----------------"
print " Get temperature in Fahrenheit degs"
print w.get_temperature('fahrenheit')
print " ----------------"
print " Get weather short status"
print w.get_status()
print " ----------------"
print " Get detailed weather status"
print w.get_detailed_status()
print " ----------------"
print " Get OWM weather condition code"
print w.get_weather_code()
print " ----------------"
print " Get weather-related icon name"
print w.get_weather_icon_name()
print " ----------------"
print " Sunrise time (ISO 8601)"
print w.get_sunrise_time('iso')
print " Sunrise time (GMT UNIXtime)"
print w.get_sunrise_time()
print " ----------------"
print " Sunset time (ISO 8601)"
print w.get_sunset_time('iso')
print " Sunset time (GMT UNIXtime)"
print w.get_sunset_time()
print " ----------------"
print " Search current weather observations in the surroundings of"
print " Latitude and longitude coordinates for Fălticeni, Romania:"
observation_list = owm.weather_around_coords(47.46, 26.30)
Let's see the result of running the Python script for one location:
 [root@localhost mythcat]# python openweather.py
Have an account at openweathermap.org and use it with a free or pro API key
owm = pyowm.OWM(API_key='your-API-key', subscription_type='pro')

Weather details
===============
Get cloud coverage
20
----------------
Get rain volume
{}
----------------
Get snow volume
{}
Get wind degree and speed
{u'speed': 5.7, u'deg': 340}
----------------
Get humidity percentage
82
----------------
Get atmospheric pressure
{'press': 1021, 'sea_level': None}
----------------
Get temperature in Kelvin degs
{'temp_max': 287.15, 'temp_kf': None, 'temp': 287.15, 'temp_min': 287.15}
----------------
Get temperature in Celsius degs
{'temp_max': 14.0, 'temp_kf': None, 'temp': 14.0, 'temp_min': 14.0}
----------------
Get temperature in Fahrenheit degs
{'temp_max': 57.2, 'temp_kf': None, 'temp': 57.2, 'temp_min': 57.2}
----------------
Get weather short status
Clouds
----------------
Get detailed weather status
few clouds
----------------
Get OWM weather condition code
801
----------------
Get weather-related icon name
02d
----------------
Sunrise time (ISO 8601)
2017-03-24 04:08:33+00
Sunrise time (GMT UNIXtime)
1490328513
----------------
Sunset time (ISO 8601)
2017-03-24 16:33:59+00
Sunset time (GMT UNIXtime)
1490373239
----------------
Search current weather observations in the surroundings of
Latitude and longitude coordinates for Fălticeni, Romania:
Categories: FLOSS Project Planets

tanay.co.in: Drupal Community Deadlock - The way for Drupal Association to restore normalcy and come out clean

Planet Drupal - Fri, 2017-03-24 02:48

The Drupal community is in a tough situation. A series of events has led to a popular contributor and leader from the Drupal community, Larry Garfield, being asked to step down. Apparently, there was some incriminating evidence against Larry that led the Drupal Association (DA) to take this decision.

 

I am not going into the details of the issue, which you can understand from here, here, here and here.

 

Several people, over the past day, have asked me if I approve of DA’s decision and Dries’ actions. I have chosen not to take a side, for I believe the information available to me from both perspectives has been too imbalanced to form a sound opinion about the issue.

 

I see the current situation in a deadlock.

  1. A large section of the Drupal community has expressed displeasure and disapproval of DA/CWG’s decision, demanding that any evidence that can prove that Larry has violated the Code of Conduct, or in any way abused anyone, or used his position in the Drupal community to force his ways of life on anyone, be made public to justify DA’s decision.

  2. DA/CWG has so far refused to disclose any evidence or supporting details, apparently in the interest of protecting the privacy of the other members who provided the evidence, as well as in the interest of Larry.

 

It is difficult to restore normalcy and trust in DA without releasing more information and evidence. And DA would not want to do that, in order to protect the members who shared the evidence with DA. Hence the deadlock.

 

The Drupal community, as I have known it, has always been the most open and embracing community to people of all races, genders, orientations and ways of life. And I strongly believe it still remains the same.

 

To see something like this break the trust that the community and the Drupal Association have earned over the years is disheartening.

 

Here are a couple of ways, I believe, in which DA can restore the community’s faith in the association and the CWG:

 

  1. Release the evidence, withholding any personal details of the people who provided it, and masking any data that could directly or indirectly lead to specific individuals being identified.

     

  2. If the evidence has such personal information scattered all over, making it difficult for DA to make it public, DA could invite a panel of five prominent leaders from the Drupal or open source communities who have publicly expressed their disapproval of DA’s decision. DA could then share the evidence with the panel, who could review the decision and give their opinion on whether DA’s decision to ask Larry to step down was reasonable, considering the evidence. Before being able to access the evidence, the panel members should agree to keep everything shared with them strictly confidential, and to disclose nothing from the evidence except their opinion on whether the decision taken by DA was reasonable.

     

Another idea to make building such a panel easier: DA’s elections for the position of Director at Large have just ended. The list of candidates is here. Inviting the top 5 contestants, who won the most votes, could make forming such a panel pretty quick.

 

DA might not technically be accountable for having every decision of its board validated by some external panel. However, the onus is on DA to restore any lost confidence of the community in DA/CWG and to come out clean.

 

If there is anything that DA could do to restore the trust and confidence that the community has vested in DA so far, and lost significantly over the past couple of days, I think this would be it.

 

DA/CWG still has my confidence and trust. However, with the data available in front of me, I am inclined to believe that the decision taken by DA was not reasonable or justified. I still believe this is because DA hasn’t made public its perspective of things and events. I would love to see DA go the extra mile to restore everyone’s confidence in the association.

 

Thumbnail Image: Source

 
Categories: FLOSS Project Planets

PreviousNext: Migrating Drupal 7 File Entities to Drupal 8 Media Entities

Planet Drupal - Thu, 2017-03-23 21:48

The Drupal 8.3.x branch is getting ready to introduce a new experimental Media module. This will bring enhanced media handling to Drupal 8. The closest solution in Drupal 7 for handling media is the File Entity module. Now is the time to discuss migrations from file entities in Drupal 7 to media entities in Drupal 8. For core, there is already an issue for this, but for contrib... there is no migration. So, I wrote one.

Categories: FLOSS Project Planets

Dirk Eddelbuettel: RApiDatetime 0.0.1

Planet Debian - Thu, 2017-03-23 21:30

Very happy to announce a new package of mine is now up on the CRAN repository network: RApiDatetime.

It provides six entry points for C-level functions of the R API for Date and Datetime calculations: asPOSIXlt and asPOSIXct convert between long and compact datetime representations, formatPOSIXlt and Rstrptime convert to and from character strings, and POSIXlt2D and D2POSIXlt convert between Date and POSIXlt datetimes. These six functions are all fairly essential and useful, but none of them was previously exported by R. Hence the need to put them together in this package to complete the accessible API somewhat.

These should be helpful for fellow package authors, as many of us either keep our own partial copies of some of this code, or farm back out into R to get this done.

As a simple (yet real!) illustration, here is an actual Rcpp function which we could now implement at the C level rather than having to go back up to R (via Rcpp::Function()):

inline Datetime::Datetime(const std::string &s, const std::string &fmt) {
    Rcpp::Function strptime("strptime");    // we cheat and call strptime() from R
    Rcpp::Function asPOSIXct("as.POSIXct"); // and we need to convert to POSIXct
    m_dt = Rcpp::as<double>(asPOSIXct(strptime(s, fmt)));
    update_tm();
}

I had taken a first brief stab at this about two years ago, but never finished. With the recent emphasis on C-level function registration, coupled with a possible use case from anytime, I more or less put this together last weekend.

It currently builds and tests fine on POSIX-alike operating systems. If someone with some skill and patience in working on Windows would like to help complete the Windows side of things then I would certainly welcome help and pull requests.

For questions or comments please use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

Vasudev Ram: Analysing that Python code snippet

Planet Python - Thu, 2017-03-23 20:49
By Vasudev Ram

Hi readers,

Some days ago I had written this post:

Analyse this Python code snippet

in which I had shown a snippet of Python code (run in the Python shell), and said:

"Analyse the snippet of Python code below. See what you make of it. I will discuss it in my next post."

I am a few days late in discussing it; sorry about that.

Here is the analysis:

First, here's the snippet again, for reference:
>>> a = 1
>>> lis = [a, 2 ]
>>> lis
[1, 2]
>>> lis = [a, 2 ,
... "abc", False ]
>>>
>>> lis
[1, 2, 'abc', False]
>>> a
1
>>> b = 3
>>> lis
[1, 2, 'abc', False]
>>> a = b
>>> a
3
>>> lis
[1, 2, 'abc', False]
>>> lis = [a, 2 ]
>>> lis
[3, 2]
>>>

The potential for confusion (at least, as I said, for newbie Pythonistas) lies in these apparent points:

The variable a is set to 1.
Then it is put into the list lis, along with the constant 2.
Then lis is changed to be [a, 2, "abc", False].
One might now think that the variable a is stored in the list lis.
The next line prints its value, which shows it is 1.
All fine so far.
Then b is set to 3.
Then a is set to b, i.e. to the value of b.
So now a is 3.
But when we print lis again, it still shows 1 for the first item, not 3, as some might expect (since a is now set to 3).
Only when we run the next line:
lis = [a, 2]
and then print lis again, do we see that the first item in lis is now 3.

This has to do with the concept of naming and binding in Python.

When a Python statement like:
a = 1
is run, naming and binding happen. The name on the left is first created, and then bound to the (value of the) object on the right of the equals sign (the assignment operator). The value can be any expression which, when evaluated, results in a value (a Python object [1]) of some kind. In this case it is the int object with value 1.

[1] Almost everything in Python is an object, like almost everything in Unix is a file. [Conditions apply :)]

When that name, a, is used in an expression, Python looks up the value of the object that the name is bound to, and uses that value in the expression, in place of the name.

So when the name a was used inside any of the lists that were bound to the name lis, it was actually the object bound to the name a that was used instead. The first time, that was 1, so the first item of the list became 1, and stayed 1 until another (list) object was bound to the name lis.

But by this time, the name a had been rebound to another object, the int 3, the same one that the name b had been bound to just before. So the next time the name lis was bound to a list, that list included the value of the object that the name a was now bound to, which was 3.

This is the reason why the code snippet works as it does.
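A short way to see the binding directly (a small addition to the session above) is with the built-in id() function, which returns an object's identity:

>>> a = 1
>>> lis = [a, 2]
>>> id(a) == id(lis[0])   # the list slot and the name a are bound to the same object
True
>>> a = 3                 # rebinds the name a; the list is untouched
>>> lis[0]
1
>>> lis = [a, 2]          # a new list built from the current binding of a
>>> lis[0]
3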

On a related note (also about Python language features, syntax and semantics), I was playing around with the pprint module (Python's pretty-printer) and the Python is operator, and came up with this other snippet:

>>> import pprint
>>> lis = []
>>> for i in range(10):
...     lis.append(lis)
...
>>> print lis
[[...], [...], [...], [...], [...], [...], [...], [...], [...], [...]]

>>> pprint.pprint(lis)
[<recursion on list with id=32809968>,
<recursion on list with id=32809968>,
<recursion on list with id=32809968>,
<recursion on list with id=32809968>,
<recursion on list with id=32809968>,
<recursion on list with id=32809968>,
<recursion on list with id=32809968>,
<recursion on list with id=32809968>,
<recursion on list with id=32809968>,
<recursion on list with id=32809968>]

>>> len(lis)
10

>>> lis is lis[0]
True

>>> lis is lis[0] is lis[0][0]
True

>>> lis is lis[0] is lis[0][0] is lis[0][0][0]
True

in which I created a list, appended it to itself ten times, and then used pprint.pprint on it. I also used the Python is operator between the list and its 0th item, recursively, and was interested to see that the is operator can be used in a chain. I need to look that up (pun intended).
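(For the record: this is ordinary comparison chaining. Python evaluates x is y is z as (x is y) and (y is z), with each operand evaluated at most once, just like a < b < c. A quick check:)

>>> x = y = z = []
>>> x is y is z
True
>>> (x is y) and (y is z)
True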

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Subscribe to my blog by email

My ActiveState Code recipes

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Email marketing for professional bloggers

Share |



Vasudev Ram
Categories: FLOSS Project Planets

Thomas Guest: From bytes to strings in Python and back again

Planet Python - Thu, 2017-03-23 20:00

Low level languages like C have little opinion about what goes in a string, which is simply a null-terminated sequence of bytes. Those bytes could be ASCII or UTF-8 encoded text, or they could be raw data — object code, for example. It’s quite possible and legal to have a C string with mixed content.

char const * mixed =
    "EURO SIGN "           // ASCII
    "UTF-8 \xE2\x82\xAC "  // UTF-8 encoded EURO SIGN
    "Latin-9 \xA4";        // Latin-9 encoded EURO SIGN

This might seem indisciplined and risky but it can be useful. Environment variables are notionally text but actually C strings, for example, meaning they can hold whatever data you want. Similarly filenames and command line parameters are only loosely text.

A higher level language like Python makes a strict distinction between bytes and strings. Bytes objects contain raw data — a sequence of octets — whereas strings are Unicode sequences. Conversion between the two types is explicit: you encode a string to get bytes, specifying an encoding (which defaults to UTF-8); and you decode bytes to get a string. Clients of these functions should be aware that such conversions may fail, and should consider how failures are handled.

Simply put, a string in Python is a valid Unicode sequence. Real world text data may not be. Programmers need to take charge of reconciling any discrepancies.

We faced such problems recently at work. We’re in the business of extracting meaning from clinical narratives — text data stored on medical records systems in hospitals, for example. These documents may well have passed through a variety of systems. They may be unclear about their text encoding. They may not be encoded as they claim. So what? They can and do contain abbreviations, misspellings, jargon and colloquialisms. Refining the signal from such noise is our core business: if we can correctly interpret positional and temporal aspects of a sentence such as:

Previous fracture of left neck of femur

then we can surely deal with text which claims to be UTF-8 encoded but isn’t really.

Our application stack is server-based: a REST API to a Python application handles document ingest; lower down, a C++ engine does the actual document processing. The problem we faced was supporting a modern API capable of handling real world data.

It’s both undesirable and unnecessary to require clients to clean their text before submitting it. We want to make the ingest direct and idiomatic. Also, we shouldn’t penalise clients whose data is clean. Thus document upload is an HTTP POST request, and the document content is a JSON string — rather than, say, base64 encoded binary data. Our server, however, will be permissive about the contents of this string.

So far so good. Postel’s prescription advises:

Be liberal in what you accept, and conservative in what you send.

This would suggest accepting messy text data but presenting it in a cleaned up form. In our case, we do normalise the input data — a process which includes detecting and standardising date/time information, expanding abbreviations, fixing typos and so on — but this normalised form links back to a faithful copy of the original data. What gets presented to the user is their own text annotated with our findings. That is, we subscribe to a more primitive prescription than Postel’s:

Garbage in, garbage out

with the caveat that the garbage shouldn’t be damaged in transit.

Happily, there is a simple way to pass dodgy strings through Python. It’s used in the standard library to handle text data which isn’t guaranteed to be clean — those environment variables, command line parameters, and filenames for example.

The surrogateescape error handler smuggles non-decodable bytes into the (Unicode) Python string in such a way that the original bytes can be recovered on encode, as described in PEP 383:

On POSIX systems, Python currently applies the locale’s encoding to convert the byte data to Unicode, failing for characters that cannot be decoded. With this PEP, non-decodable bytes >= 128 will be represented as lone surrogate codes U+DC80..U+DCFF.

This workaround is possible because Unicode surrogates are intended for use in pairs. Quoting the Unicode specification, they “have no interpretation on their own”. The lone trailing surrogate code — the half-a-pair — can only be the result of a surrogateescape error handler being invoked, and the original bytes can be recovered by using the same error handler on encode.
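The standard library wraps this pattern for filenames in os.fsencode() and os.fsdecode(). A small illustration (assuming a UTF-8 filesystem encoding):

>>> import os
>>> raw = b'caf\xe9'          # Latin-1 bytes, not valid UTF-8
>>> name = os.fsdecode(raw)   # the stray byte becomes a lone surrogate
>>> name
'caf\udce9'
>>> os.fsencode(name) == raw  # the original bytes round-trip losslessly
True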

In conclusion, text data is handled differently in C++ and Python, posing a problem for layered applications. The surrogateescape error handler provides a standard and robust way of closing the gap.

Unicode Surrogate Pairs

Code Listing

>>> mixed = b"EURO SIGN \xE2\x82\xAC \xA4"
>>> mixed
b'EURO SIGN \xe2\x82\xac \xa4'
>>> mixed.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa4 in position 14: invalid start byte
>>> help(mixed.decode)
Help on built-in function decode:

decode(encoding='utf-8', errors='strict') method of builtins.bytes instance
    Decode the bytes using the codec registered for encoding.

    encoding
      The encoding with which to decode the bytes.
    errors
      The error handling scheme to use for the handling of decoding errors.
      The default is 'strict' meaning that decoding errors raise a
      UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
      as well as any other name registered with codecs.register_error that
      can handle UnicodeDecodeErrors.

>>> mixed.decode(errors='surrogateescape')
'EURO SIGN € \udca4'
>>> s = mixed.decode(errors='surrogateescape')
>>> s.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udca4' in position 12: surrogates not allowed
>>> s.encode(errors='surrogateescape')
b'EURO SIGN \xe2\x82\xac \xa4'
Categories: FLOSS Project Planets

Justin Mason: Links for 2017-03-23

Planet Apache - Thu, 2017-03-23 19:58
Categories: FLOSS Project Planets

Kubuntu 17.04 Beta 2 released for testers

Planet KDE - Thu, 2017-03-23 19:46

Today the Kubuntu team is happy to announce that Kubuntu Zesty Zapus (17.04) Beta 2 is released. With this Beta 2 pre-release, you can see and test what we are preparing for 17.04, which we will release on April 13, 2017.

Kubuntu 17.04 Beta 2

 

NOTE: This is a Beta 2 release. Kubuntu Beta releases are NOT recommended for:

* Regular users who are not aware of pre-release issues
* Anyone who needs a stable system
* Anyone uncomfortable running a possibly frequently broken system
* Anyone in a production environment with data or work-flows that need to be reliable

Getting Kubuntu 17.04 Beta 2:
* Upgrade from 16.10: run `do-release-upgrade -d` from a command line.
* Download a bootable image (ISO) and put it onto a DVD or USB Drive : http://cdimage.ubuntu.com/kubuntu/releases/zesty/beta-2/

Release notes: https://wiki.ubuntu.com/ZestyZapus/Beta2/Kubuntu

Categories: FLOSS Project Planets

Carl Chenet: Feed2tweet 1.0, tool to post RSS feeds to Twitter, released

Planet Python - Thu, 2017-03-23 19:00

Feed2tweet 1.0, a self-hosted Python app to automatically post RSS feeds to the Twitter social network, was released on March 23rd, 2017.

The main new feature of this release allows you to create filters for each RSS feed; before, you could only define global filters. Thanks to a contribution from Antoine Beaupré, Feed2tweet is also able to use syslog, starting from this release.

What’s the purpose of Feed2tweet?

Some online services offer to convert your RSS entries into Twitter posts. These services are usually unreliable, slow, and don’t respect your privacy. Feed2tweet is a self-hosted Python app; the source code is easy to read, and you can enjoy the official documentation online with lots of examples.
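For the curious, the general idea behind such tools can be sketched in a few lines of Python using the feedparser and tweepy libraries. This is an illustrative sketch with placeholder credentials and feed URL, not Feed2tweet’s actual code:

import feedparser
import tweepy

# Placeholder values: substitute your own feed URL and Twitter API credentials.
feed = feedparser.parse("https://example.org/feed.rss")
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

for entry in feed.entries:
    # Build one tweet per RSS entry from its title and link.
    api.update_status("%s %s" % (entry.title, entry.link))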

Twitter Out Of The Browser

Have a look at my Github account for my other Twitter automation tools:

  • Retweet, retweet all tweets (or a filtered subset) from one Twitter account to another to spread content.
  • db2twitter, get data from an SQL database (several are supported), build tweets and send them to Twitter.
  • Twitterwatch, monitor the activity of your Twitter timeline and warn you if no new tweet appears.

What about you? Do you use tools to automate the management of your Twitter account? Feel free to give me feedback in the comments below.

… and finally

You can help Feed2tweet by donating anything through Liberapay (also possible with cryptocurrencies). That’s a big motivation factor.

Categories: FLOSS Project Planets

Drupal Association blog: A Statement from the Executive Director

Planet Drupal - Thu, 2017-03-23 18:49

We understand that there is uncertainty and concern in the Drupal community about project founder Dries Buytaert asking Larry Garfield to leave the Drupal community, and about the Drupal Association removing Larry's DrupalCon sessions and ending his term as track chair.

We want to be clear that the decision to remove Larry's DrupalCon sessions and track chair role was not because of his private life or personal beliefs. The Drupal Association stands by our values of inclusivity. Our decision was based on confidential information conveyed in private by many sources. Due to the confidential nature of the situation we cannot and will not disclose any information that may harm any members of our community, including Larry.

This decision followed our established process. As the Executive Director, charged with safekeeping the goodwill of the organization, I made this decision after considering input from various sources including the Community Working Group (CWG) and Drupal Project Lead, Dries Buytaert. Upon Larry’s request for an appeal, the full board reviewed the situation, all the evidence, and statements provided by Larry. After reviewing the entirety of the information available (including information not in the public view) the decision was upheld.

In order to protect everyone involved we cannot comment further, and we trust that the community will be understanding.

We do see that there are many feelings and questions around this DrupalCon decision and we empathize with those community members. We will continue to monitor comments. We are listening.

Categories: FLOSS Project Planets

NumFOCUS: PyData Atlanta Meetup Celebrates 1 Year and over 1,000 members

Planet Python - Thu, 2017-03-23 17:23
PyData Atlanta holds a meetup at MailChimp, where Jim Crozier spoke about analyzing NFL data with PySpark.

Atlanta tells a new story about data
by Rob Clewley

In late 2015, the three of us (Tony Fast, Neel Shivdasani, and myself) had been regularly nerding out about data over beers and becoming fast friends. We were eager to see Atlanta's data community shift toward being more welcoming and encouraging to beginners, self-starters, and generalists. We were about to find out that we were not alone.

We had met at local data science-related events earlier in the year and had discovered that we had lots of opinions—and weren’t afraid to advocate for them. But we also found that we listened to reason (data-driven learning!), appreciated the art in doing good science, and cared about people and the community. Open science, open data, free-and-open-source software, and creative forms of technical communication and learning were all recurring themes in our conversations. We also all agreed that Python is a great language for working with data.

Invitations were extended to like-minded friends, and the informal hangout was soon known as “Data Beers”. The consistent good buzz that Data Beers generated helped us realize an opportunity to contribute more widely to the Atlanta community. At the time, Atlanta was beginning its emergence as a new hub in the tech world and startup culture.

Some of the existing data-oriented meetups around Atlanta have a more formal business atmosphere, or are highly focused on specific tools or tech opinions. Such environments seem to intimidate newcomers and those less formally educated in math or computer science. This inspired us to take a new perspective through an informal and eclectic approach. So, in January 2016, with the support of not-for-profit organization NumFOCUS, we set up the Atlanta chapter of PyData.

The mission of NumFOCUS is to promote sustainable high-level programming languages, open code development, and reproducible scientific research. NumFOCUS sponsors PyData conferences and local meetups internationally. The PyData community gathers to discuss how best to apply tools using Python, R, Stan, and Julia to meet evolving challenges in data management, processing, analytics, and visualization. In all, PyData counts over 28,000 members across 52 international meetups. The Python language and the data-focused ecosystem that has grown around it has been remarkably successful in attracting an inclusive mindset centered around free and open-source software and science. Our Atlanta chapter aims to be even more neutral about specific technologies so long as the underlying spirit resonates with our mission.

The three of us, with the help of friend and colleague Lizzy Rolando, began sourcing great speakers who have a distinctive approach to using data that resonated with the local tech culture. We hosted our first meetup in early April. From the beginning, we encouraged a do-it-yourself, interactive vibe at our meetings, supporting shorter-format 30-minute presentations with 20-minute question and answer sessions.

Regardless of the technical focus, we try to bring in speakers who are applying their data-driven work to something of general interest. Our programming balances technical and more qualitative talks. Our meetings have covered a diverse range of applications, addressing computer literacy and education, human rights, neuroscience, journalism, and civics.

A crowd favorite is the inclusion of 3-4 audience-submitted lightning talks at the end of the main Q&A. The strictly five-minute talks add energy to the mix and give a wider platform to the local community. They’re an opportunity for students to practice presentation skills, to generate conversations around projects needing collaborators, to discuss new tools, or just to have fun looking at interesting data sets.

Students, career changers, and professionals have come together as members of PyData to learn and share. Our network has generated new friends, collaborators, and even new jobs. Local organizations that share our community spirit provide generous sponsorship and refreshments for our meetings.

We believe we were in the right place at the right time to meet a need. It’s evident in the positive response and rapid growth we’ve seen, having acquired over 1,000 members in one year and hosted over 120 attendees at our last event. It has been a whirlwind experience, and we are delighted that our community has shared our spirit and become involved with us so strongly. Here’s to healthy, productive, data-driven outcomes for all of us in 2017!
Categories: FLOSS Project Planets

Stefan Bodewig: XMLUnit.NET 2.3.1 Released

Planet Apache - Thu, 2017-03-23 16:05
This release adds XML docs to the binary distribution and the NuGet package; it doesn't contain any functional changes.
Categories: FLOSS Project Planets

San Francisco Open Source Voting System Project Continues On

Open Source Initiative - Thu, 2017-03-23 15:13

This update on San Francisco's project to develop and certify the country's first open source voting system was submitted by OSI Individual Member Chris Jerdonek. While Chris is a member (and President) of the San Francisco Elections Commission, he is providing this update as an individual and not in his official capacity as a Commissioner. Chris's e-mail is chris@sfopenvoting.org, and a website with that domain is expected soon.

Some "Action Items"

Below are some things you can do to help the effort:

  • Show support by following @SFOpenVoting on Twitter: https://twitter.com/SFOpenVoting
  • Retweet the following tweet about a recent front-page news story to help spread the word: https://twitter.com/SFOpenVoting/status/834136200663310336
  • Reply to Chris with the words "keep me posted" if you'd like me to notify you sooner if something interesting happens related to the project (e.g. an RFP or job posting getting posted, an important Commission meeting coming up, the publishing of a news piece about SF open source voting, etc).
  • Reply to Chris with the words "might want to help" if you might like to help organize or be part of a core group of activists to help build more support and otherwise help the project succeed. This would likely start off with a small organizing meeting.
  • Show your support and come watch the next Elections Commission meeting on Wed, April 19 at 6pm in Room 408 of San Francisco City Hall: http://sfgov.org/electionscommission/
San Francisco Examiner Article

At its February 15 meeting, the Elections Commission voted unanimously to ask the Mayor's Office to allocate $4 million towards the open source voting project for the 2018-19 fiscal year (from Aug. 2018 - July 2019). This would go towards initial development once the planning phase is complete.

The San Francisco Examiner wrote a good article about this development here (with the headline appearing on the front page): http://www.sfexaminer.com/sfs-elections-commission-asks-mayor-put-4m-toward-open-source-voting-system/

Latest Project Updates

The open source voting project is starting to gain some definition. For the latest, read the March 2017 Director's Report in the agenda packet of last week's March 15 Elections Commission meeting: http://sfgov.org/electionscommission/commission-agenda-packet-march-15-2017

A few highlights from the report:

  • Very soon (perhaps within the next few days), the Department will be posting a job opening for a senior staff position to assist with the project.
  • In addition, by the end of March or so, the Department will be issuing an RFP for an outside contractor to help plan and create a "business case" for the project.
  • Software for the project will be released under version 3 of the GNU General Public License where possible. (GPL-3.0 is a copyleft license, which means that future changes would also be guaranteed to remain open source.)
  • The software is projected to be released to the public as it is written, which would be great for increased public visibility and transparency.
Citizen's Advisory Committee

Also at last week's Elections Commission meeting, the Commission started discussing the idea of forming a Citizen's Advisory Committee to help guide the open source voting project. This is an idea that was raised at the February meeting, as well as previously.

At the meeting, it was suggested that the committee help advise primarily on technical matters -- things like agile procurement approaches, project management, open source issues, and engineering / architecture issues.

The Commission will be taking this up again at its April meeting on April 19 (and possibly voting on it).

If you might be interested in serving on the committee, you are encouraged to listen to the discussion from last week's meeting to get an idea of what it might be like (starting at 11 mins, 18 sec): https://www.youtube.com/watch?v=k2-DX8UNqY0&t=11m18s

(And if you know someone who might be good to serve on such a committee, please forward them this info!)

FairVote California house party (recap)

In early March, Chris spoke about San Francisco's open source voting project at a house party organized by FairVote California. Thank you to FairVote California for having him!

If anyone would like Chris to speak to a group of people about the project, just shoot him an e-mail. It would only require 5 minutes or so of a group's time.

GET Summit (upcoming: May 17-18)

On May 17-18 in San Francisco, there will be a conference on open source and election technology issues called the GET Summit: https://www.getsummit.org

The conference is being organized by Startup Policy Lab (SPL) and has an amazing line-up of many speakers, including people like California Secretary of State Alex Padilla and former Federal Elections Commissioner (FEC) Ann Ravel. In September 2016, Alex Padilla said on television that "open source [voting] is the ultimate in transparency and accountability for all." And Ann Ravel made headlines last month with her very public resignation from the FEC. See the conference website for the latest speaker list.

So that's all folks! Please follow @SFOpenVoting on Twitter if you haven't yet, and thank you for all your continued interest and support!

We thank Chris Jerdonek for his work in raising awareness of San Francisco's efforts to develop an open source voting system and sharing these updates with us here at the OSI and the larger open source software community.

Categories: FLOSS Research

d7One: 50 ways to slide your images

Planet Drupal - Thu, 2017-03-23 14:34

Recently, I had to create a slideshow for a project. Nothing unusual about that, you say. Indeed, everywhere you look you see slideshows. If there's one thing that's common to 99% of all websites today, it's a slideshow. Almost boring. That is, until you actually start to implement one. There must be 50 ways to slide your images - and I don't mean the transition effects. Picking the right module and library is almost a burden.

There are various scenarios where you would want to display a slideshow. The most common is the home page. I've used the venerable Views Slideshow module (2007) in the past for this purpose. It's simple enough to implement and is available for D7, D8 as well as Backdrop.

Categories: FLOSS Project Planets

Simon McVittie: GTK hackfest 2017: D-Bus communication with containers

Planet Debian - Thu, 2017-03-23 14:07

At the GTK hackfest in London (which accidentally became mostly a Flatpak hackfest) I've mainly been looking into how to make D-Bus work better for app container technologies like Flatpak and Snap.

The initial motivating use cases are:

  • Portals: Portal authors need to be able to identify whether the portal is being contacted by an uncontained process (running with the user's full privileges), or by a contained process (in a container created by Flatpak or Snap).

  • dconf: Currently, a contained app either has full read/write access to dconf, or no access. It should have read/write access to its own subtree of dconf configuration space, and no access to the rest.

At the moment, Flatpak runs a D-Bus proxy for each app instance that has access to D-Bus, connects to the appropriate bus on the app's behalf, and passes messages through. That proxy is in a container similar to the actual app instance, but not actually the same container; it is trusted to not pass messages through that it shouldn't pass through. The app-identification mechanism works in practice, but is Flatpak-specific, and has a known race condition due to process ID reuse and limitations in the metadata that the Linux kernel maintains for AF_UNIX sockets. In practice the use of X11 rather than Wayland in current systems is a much larger loophole in the container than this race condition, but we want to do better in future.

Meanwhile, Snap does its sandboxing with AppArmor, on kernels where it is enabled both at compile-time (Ubuntu, openSUSE, Debian, Debian derivatives like Tails) and at runtime (Ubuntu, openSUSE and Tails, but not Debian by default). Ubuntu's kernel has extra AppArmor features that haven't yet gone upstream, some of which provide reliable app identification via LSM labels, which dbus-daemon can learn by querying its AF_UNIX socket. However, other kernels like the ones in openSUSE and Debian don't have those. The access-control (AppArmor mediation) is implemented in upstream dbus-daemon, but again doesn't work portably, and is not sufficiently fine-grained or flexible to do some of the things we'll likely want to do, particularly in dconf.

After a lot of discussion with dconf maintainer Allison Lortie and Flatpak maintainer Alexander Larsson, I think I have a plan for fixing this.

This is all subject to change: see fd.o #100344 for the latest ideas.

Identity model

Each user (uid) has some uncontained processes, plus 0 or more containers.

The uncontained processes include dbus-daemon itself, desktop environment components such as gnome-session and gnome-shell, the container managers like Flatpak and Snap, and so on. They have the user's full privileges, and in particular they are allowed to do privileged things on the user's session bus (like running dbus-monitor), and act with the user's full privileges on the system bus. In generic information security jargon, they are the trusted computing base; in AppArmor jargon, they are unconfined.

The containers are Flatpak apps, or Snap apps, or other app-container technologies like Firejail and AppImage (if they adopt this mechanism, which I hope they will), or even a mixture (different app-container technologies can coexist on a single system). They are containers (or container instances) and not "apps", because in principle, you could install com.example.MyApp 1.0, run it, and while it's still running, upgrade to com.example.MyApp 2.0 and run that; you'd have two containers for the same app, perhaps with different permissions.

Each container has a container type, which is a reversed DNS name like org.flatpak or io.snapcraft representing the container technology, and an app identifier, an arbitrary non-empty string whose meaning is defined by the container technology. For Flatpak, that string would be another reversed DNS name like com.example.MyGreatApp; for Snap, as far as I can tell it would look like example-my-great-app.

The container technology can also put arbitrary metadata on the D-Bus representation of a container, again defined and namespaced by the container technology. For instance, Flatpak would use some serialization of the same fields that go in the Flatpak metadata file at the moment.

Finally, the container has an opaque container identifier identifying a particular container instance. For example, launching com.example.MyApp twice (maybe different versions or with different command-line options to flatpak run) might result in two containers with different privileges, so they need to have different container identifiers.

Contained server sockets

App-container managers like Flatpak and Snap would create an AF_UNIX socket inside the container, bind() it to an address that will be made available to the contained processes, and listen(), but not accept() any new connections. Instead, they would fd-pass the new socket to the dbus-daemon by calling a new method, and the dbus-daemon would proceed to accept() connections after the app-container manager has signalled that it has called both bind() and listen(). (See fd.o #100344 for full details.)
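In rough Python terms, the manager's side of that sequence would look like the sketch below; the socket path is hypothetical, and the final fd-passing step is the proposed new method, not an existing API:

import socket

# Create the per-container endpoint, bind() and listen(), but never accept():
# dbus-daemon will accept() connections once it has received the listening fd.
endpoint = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
endpoint.bind('/run/user/1000/.flatpak/com.example.MyApp/bus')  # hypothetical path
endpoint.listen(8)
# ... fd-pass endpoint.fileno() to dbus-daemon via the proposed new method,
# together with the container's type, app identifier and metadata ...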

Processes inside the container must not be allowed to contact the AF_UNIX socket used by the wider, uncontained system - if they could, the dbus-daemon wouldn't be able to distinguish between them and uncontained processes and we'd be back where we started. Instead, they should have the new socket bind-mounted into their container's XDG_RUNTIME_DIR and connect to that, or have the new socket set as their DBUS_SESSION_BUS_ADDRESS and be prevented from connecting to the uncontained socket in some other way. Those familiar with the kdbus proposals a while ago might recognise this as being quite similar to kdbus' concept of endpoints, and I'm considering reusing that name.

Along with the socket, the container manager would pass in the container's identity and metadata, and the method would return a unique, opaque identifier for this particular container instance. The basic fields (container technology, technology-specific app ID, container ID) should probably be added to the result of GetConnectionCredentials(), and there should be a new API call to get all of those plus the arbitrary technology-specific metadata.
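To illustrate where these fields would surface, here is a sketch using PyGObject/Gio to call the existing GetConnectionCredentials method; under this proposal, the container fields would simply appear as extra keys in the returned dictionary (the exact key names are not final):

from gi.repository import Gio, GLib

bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)
reply = bus.call_sync(
    'org.freedesktop.DBus', '/org/freedesktop/DBus',
    'org.freedesktop.DBus', 'GetConnectionCredentials',
    GLib.Variant('(s)', (bus.get_unique_name(),)),
    GLib.VariantType('(a{sv})'),
    Gio.DBusCallFlags.NONE, -1, None)
creds = reply.unpack()[0]
# Today this contains e.g. UnixUserID and ProcessID; the proposal would add
# fields for the container type, app identifier and container instance ID.
print(creds)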

When a process from a container connects to the contained server socket, every message that it sends should also have the container instance ID in a new header field. This is OK even though dbus-daemon does not (in general) forbid sender-specified future header fields, because any dbus-daemon that supported this new feature would guarantee to set that header field correctly, the existing Flatpak D-Bus proxy already filters out unknown header fields, and adding this header field is only ever a reduction in privilege.

The reasoning for using the sender's container instance ID (as opposed to the sender's unique name) is for services like dconf to be able to treat multiple unique bus names as belonging to the same equivalence class of contained processes: instead of having to look up the container metadata once per unique name, dconf can look it up once per container instance the first time it sees a new identifier in a header field. For the second and subsequent unique names in the container, dconf can know that the container metadata and permissions are identical to the one it already saw.

Access control

In principle, we could have the new identification feature without adding any new access control, by keeping Flatpak's proxies. However, in the short term that would mean we'd be adding new API to set up a socket for a container without any access control, and having to keep the proxies anyway, which doesn't seem great; in the longer term, I think we'd find ourselves adding a second new API to set up a socket for a container with new access control. So we might as well bite the bullet and go for the version with access control immediately.

In principle, we could also avoid the need for new access control by ensuring that each service that will serve contained clients does its own. However, that makes it really hard to send broadcasts and not have them unintentionally leak information to contained clients - we would need to do something more like kdbus' approach to multicast, where services know who has subscribed to their multicast signals, and that is just not how dbus-daemon works at the moment. If we're going to have access control for broadcasts, it might as well also cover unicast.

The plan is that messages from containers to the outside world will be mediated by a new access control mechanism, in parallel with dbus-daemon's current support for firewall-style rules in the XML bus configuration, AppArmor mediation, and SELinux mediation. A message would only be allowed through if the XML configuration, the new container access control mechanism, and the LSM (if any) all agree it should be allowed.

By default, processes in a container can send broadcast signals, and send method calls and unicast signals to other processes in the same container. They can also receive method calls from outside the container (so that interfaces like org.freedesktop.Application can work), and send exactly one reply to each of those method calls. They cannot own bus names, communicate with other containers, or send file descriptors (which reduces the scope for denial of service).

Obviously, that's not going to be enough for a lot of contained apps, so we need a way to add more access. I'm intending this to be purely additive (start by denying everything except what is always allowed, then add new rules), not a mixture of adding and removing access like the current XML policy language.

There are two ways we've identified for rules to be added:

  • The container manager can pass a list of rules into the dbus-daemon at the time it attaches the contained server socket, and they'll be allowed. The obvious example is that an org.freedesktop.Application needs to be allowed to own its own bus name. Flatpak apps' implicit permission to talk to portals, and Flatpak metadata like org.gnome.SessionManager=talk, could also be added this way.

  • System or session services that are specifically designed to be used by untrusted clients, like the version of dconf that Allison is working on, could opt-in to having contained apps allowed to talk to them (effectively making them a generalization of Flatpak portals). The simplest such request, for something like a portal, is "allow connections from any container to contact this service"; but for dconf, we want to go a bit finer-grained, with all containers allowed to contact a single well-known rendezvous object path, and each container allowed to contact an additional object path subtree that is allocated by dconf on-demand for that app.

Initially, many contained apps would work in the first way (and in particular sockets=session-bus would add a rule that allows almost everything), while over time we'll probably want to head towards recommending more use of the second.
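Purely to illustrate the first of those mechanisms, the rules might be passed alongside the socket as something like the following; the (verb, name) encoding is invented here and the real format is yet to be designed.

/* Invented rule encoding, for illustration only. These rules would let
 * a Flatpak app own its own bus name and talk to the portals,
 * mirroring Flatpak metadata like org.gnome.SessionManager=talk. */
GVariantBuilder rules;

g_variant_builder_init (&rules, G_VARIANT_TYPE ("a(ss)"));
g_variant_builder_add (&rules, "(ss)", "own",  "org.gnome.Recipes");
g_variant_builder_add (&rules, "(ss)", "talk", "org.freedesktop.portal.Desktop");
g_variant_builder_add (&rules, "(ss)", "talk", "org.gnome.SessionManager");
/* ...then handed to the dbus-daemon together with the contained server
 * socket, e.g. as an extra argument to the hypothetical AddServer call
 * sketched earlier. */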

Related topics

Access control on the system bus

We talked about the possibility of using a very similar ruleset to control access to the system bus, as an alternative to the XML rules found in /etc/dbus-1/system.d and /usr/share/dbus-1/system.d. We didn't really come to a conclusion here.

Allison had the useful insight that the XML rules are acting like a firewall: they're something that is placed in front of potentially-broken services, and not part of the services themselves (which, as with firewalls like ufw, makes it seem rather odd when the services themselves install rules). D-Bus system services already have total control over what requests they will accept from D-Bus peers, and if they rely on the XML rules to mediate that access, they're essentially rejecting that responsibility and hoping the dbus-daemon will protect them. The D-Bus maintainers would much prefer it if system services took responsibility for their own access control (with or without using polkit), because fundamentally the system service is always going to understand its domain and its intended security model better than the dbus-daemon can.

Analogously, when a network service listens on all addresses and accepts requests from elsewhere on the LAN, we sometimes work around that by protecting it with a firewall, but the optimal resolution is to get that network service fixed to do proper authentication and access control instead.

For system services, we continue to recommend essentially this "firewall" configuration, filling in the ${} variables as appropriate:

<busconfig>
  <policy user="${the daemon uid under which the service runs}">
    <allow own="${the service's bus name}"/>
  </policy>
  <policy context="default">
    <allow send_destination="${the service's bus name}"/>
  </policy>
</busconfig>

We discussed the possibility of moving towards a model where the daemon uid to be allowed is written in the .service file, together with an opt-in to "modern D-Bus access control" that makes the "firewall" unnecessary; after some flag day when all significant system services follow that pattern, dbus-daemon would even have the option of no longer applying the "firewall" (moving to an allow-by-default model) and just refusing to activate system services that have not opted in to being safe to use without it. However, the "firewall" also protects system bus clients, and services like Avahi that are not bus-activatable, against unintended access, which is harder to solve via that approach; so this is going to take more thought.

For system services' clients that follow the "agent" pattern (BlueZ, polkit, NetworkManager, Geoclue), the correct "firewall" configuration is more complicated. At some point I'll try to write up a best-practice for these.

New header fields for the system bus

At the moment, it's harder than it needs to be to provide non-trivial access control on the system bus, because on receiving a method call, a service has to remember what was in the method call, then call GetConnectionCredentials() to find out who sent it, then only process the actual request when it has the information necessary to do access control.
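Today that lookup is an extra round trip, roughly like the sketch below. GetConnectionCredentials() itself is real, existing API; the surrounding code is simplified and error handling is trimmed.

/* Sketch of today's pattern: before acting on a method call, ask the
 * dbus-daemon who sent it. */
static guint32
get_sender_uid (GDBusConnection *bus, const gchar *sender, GError **error)
{
  GVariant *reply, *credentials;
  guint32 uid = G_MAXUINT32;

  reply = g_dbus_connection_call_sync (bus,
      "org.freedesktop.DBus", "/org/freedesktop/DBus",
      "org.freedesktop.DBus", "GetConnectionCredentials",
      g_variant_new ("(s)", sender),
      G_VARIANT_TYPE ("(a{sv})"),
      G_DBUS_CALL_FLAGS_NONE, -1, NULL, error);

  if (reply != NULL)
    {
      credentials = g_variant_get_child_value (reply, 0);
      g_variant_lookup (credentials, "UnixUserID", "u", &uid);
      g_variant_unref (credentials);
      g_variant_unref (reply);
    }

  return uid;
}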

Allison and I had hoped to resolve this by adding new D-Bus message header fields with the user ID, the LSM label, and other interesting facts for access control. These could be "opt-in" to avoid increasing message sizes for no reason: in particular, it is not typically useful for session services to receive the user ID, because only one user ID is allowed to connect to the session bus anyway.

Unfortunately, the dbus-daemon currently lets unknown fields through without modification. With hindsight this seems an unwise design choice, because header fields are a finite resource (there are 255 possible header fields) and are defined by the D-Bus Specification. The only field that can currently be trusted is the sender's unique name, because the dbus-daemon sets that field, overwriting the value in the original message (if any).

To make it safe to rely on the new fields, we would have to make the dbus-daemon filter out all unknown header fields, and introduce a mechanism for the service to check (during connection to the bus) whether the dbus-daemon is sufficiently new that it does so. If connected to an older dbus-daemon, the service would not be able to rely on the new fields being true, so it would have to ignore the new fields and treat them as unset. The specification is sufficiently vague that making new dbus-daemons filter out unknown header fields is a valid change (it just says that "Header fields with an unknown or unexpected field code must be ignored", without specifying who must ignore them, so having the dbus-daemon delete those fields seems spec-compliant).

This all seemed fine when we discussed it in person; but GDBus already has accessors for arbitrary header fields by numeric ID, and I'm concerned that this might make it too easy for a system service to be accidentally insecure: it would be natural (but wrong!) for an implementor to assume that if g_dbus_message_get_header (message, G_DBUS_MESSAGE_HEADER_FIELD_SENDER_UID) returned non-NULL, then that was guaranteed to be the correct, valid sender uid. As a result, fd.o #100317 might have to be abandoned. I think more thought is needed on that one.
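To spell out the pitfall (with an invented field constant, since the proposed field does not exist):

/* The tempting-but-wrong pattern. Unless the service has verified that
 * the dbus-daemon strips unknown header fields, this value could have
 * been set by the (possibly malicious) sender itself. */
GVariant *uid_field = g_dbus_message_get_header (message,
    (GDBusMessageHeaderField) HYPOTHETICAL_FIELD_SENDER_UID);

if (uid_field != NULL)
  grant_access_for_uid (g_variant_get_uint32 (uid_field));  /* WRONG */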

Unrelated topics

As happens at any good meeting, we took the opportunity of high-bandwidth discussion to cover many useful things and several useless ones. Other discussions that I got into during the hackfest included, in no particular order:

  • .desktop file categories and how to adapt them for AppStream, perhaps involving using the .desktop vocabulary but relaxing some of the hierarchy restrictions so they behave more like "tags"
  • how to build a recommended/reference "app store" around Flatpak, aiming to host upstream-supported builds of major projects like LibreOffice
  • how Endless do their content-presenting and content-consuming apps in GTK, with a lot of "tile"-based UIs with automatic resizing and reflowing (similar to responsive design), and the applicability of similar widgets to GNOME and upstream GTK
  • whether and how to switch GNOME developer documentation to Hotdoc
  • whether pies, fish and chips or scotch eggs were the most British lunch available from Borough Market
  • the distinction between stout, mild and porter

More notes are available from the GNOME wiki.

Acknowledgements

The GTK hackfest was organised by GNOME and hosted by Red Hat and Endless. My attendance was sponsored by Collabora. Thanks to all the sponsors and organisers, and the developers and organisations who attended.
