Planet Apache


Justin Mason: Links for 2017-04-24

Mon, 2017-04-24 19:58
  • sold your data to Uber

    ‘Uber devoted teams to so-called competitive intelligence, purchasing data from Slice Intelligence, which collected customers’ emailed Lyft receipts and sold the data to Uber.’ Also: the service allegedly “kept a copy of every single email that you sent or received” in “poorly secured S3 buckets”; its CEO “felt bad to see that some of our users were upset to learn about how we monetise our free service”.

    (tags: uber gmail google privacy data-protection lyft scumbags slice-intelligence)

  • Capturing all the flags in BSidesSF CTF by pwning Kubernetes/Google Cloud

    good exploration of the issues with running a CTF challenge (or any other secure infrastructure!) atop Kubernetes and a cloud platform like GCE

    (tags: gce google-cloud kubernetes security docker containers gke ctf hacking exploits)

  • How To Add A Security Key To Your Gmail (Tech Solidarity)

    Excellent how-to guide for Yubikey usage on gmail

    (tags: gmail yubikey security authentication google)

  • Ethics – Lyrebird

    ‘Lyrebird is the first company to offer a technology to reproduce the voice of someone as accurately and with as little recorded audio. [..] Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries. Our technology questions the validity of such evidence as it allows to easily manipulate audio recordings. This could potentially have dangerous consequences such as misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else. By releasing our technology publicly and making it available to anyone, we want to ensure that there will be no such risks. We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible. More generally, we want to raise attention about the lack of evidence that audio recordings may represent in the near future.’

    (tags: lyrebird audio technology scary ethics)

Categories: FLOSS Project Planets

Bruce Snyder: Annual Spinal Cord Injury Re-evaluation

Mon, 2017-04-24 17:04
Recently I went back to Craig Hospital for an annual spinal cord injury re-evaluation and the results were very positive. It was really nice to see some familiar faces of the people for whom I have such deep admiration, like my doctors, physical therapists and administrative staff. My doctor and therapists were quite surprised to see how well I am doing, especially given that I'm still seeing improvements three years later; so many spinal cord injury patients have serious issues that persist for years. I am so lucky to no longer be taking any medications and to be walking again.
It has also been nearly one year since I was last back at Craig Hospital, and it seems like such a different place to me now. Being back there again feels odd for a couple of reasons. First, due to the extensive construction/remodel, the amount of change to the hospital makes it seem like a different place entirely. It used to be much smaller, which encouraged closer interaction between patients and staff. Now the place is so big (i.e., big hallways, larger individual rooms, etc.) that patients can have more privacy if they want, or even avoid some forms of interaction. Second, although I am comfortable being around so many folks who have been so severely injured (not everyone is), I have noticed that some folks are confused by me. I can tell by the way they look at me that they are wondering what I am doing there because, outwardly, I do not appear as someone who has experienced a spinal cord injury. I have been lucky enough to make it out of the wheelchair and to walk on my own. Though my feet are still paralyzed, I wear flexible, carbon fiber AFO braces on my legs and walk with one arm crutch; the braces are covered by my pants, so it's puzzling to many people.
The folks who I wish I could see more are the nurses and techs. These are the folks who helped me the most when I was so vulnerable and confused, and to whom I grew very attached. To understand just how attached I was: simply moving to a more independent room as I was getting better was upsetting to me. I learned that these people are cut from a unique cloth and possess very big hearts to do the work they do every day. Because they are so involved with the acute care of in-patients, they are very busy during the day and not available for much socializing as past patients come through. Luckily, I ran into one of my nurses and was able to spend some time speaking with him. I really enjoyed catching up with him and hearing about new adventures in his career. He was one of the folks I was attached to at the time and he really made a difference in my experience. I will be eternally thankful for having met these wonderful people during such a traumatic time in my life.
Today I am walking nearly 100% of the time with the leg braces and have been for over two years. I am working to rebuild my calves and my glutes, but this is a very, very long and slow process due to severe muscle atrophy after not being able to move my glutes for five months and my calves for two years. Although my feet are not responding yet, we will see what the future holds. I still feel so very lucky to be alive and continuing to make progress.
Although I cannot run at all or cycle the way I did previously, I am very thankful to be able to work out as much as I can. I am now riding the stationary bike regularly, using my Total Gym (yes, I have a Chuck Norris Total Gym) to build my calves, using a Bosu to work on balance and strength in my lower body, doing ab roller workouts and walking as much as I can both indoors on a treadmill and outside. I'd like to make time for swimming laps again, but all of this can be time consuming (and tiring!). I am not nearly as fit as I was at the time of my injury, but I continue to work hard and to see noticeable improvements for which I am truly thankful.
Thank you to everyone who continues to stay in touch and check in on me from time-to-time. You may not think it's much to send a quick message, but these messages have meant a lot to me through this process. The support from family and friends has been what has truly kept me going. The patience displayed by Bailey, Jade and Janene is pretty amazing.
Later this month, I will mark the three year anniversary of my injury. It seems so far away and yet it continues to affect my life every day. My life will never be the same but I do believe I have found peace with this entire ordeal.
Categories: FLOSS Project Planets

FeatherCast: Jean-Frédéric Clere, Barcamp Apache and Apachecon North America

Mon, 2017-04-24 09:45

An important event at Apachecon North America will be the #BarCampApache.  It is a free event where the content is organised by the attendees. Here we talk to Jean-Frédéric Clere about the barcamp and his role as facilitator.

Register for Apachecon at

Categories: FLOSS Project Planets

Shawn McKinney: Advice for the graduating computer science student (on finding their first professional job)

Sun, 2017-04-23 09:39

I’ve been mentoring a graduating senior in computer science.  Here’s what I told him…

First, read the 10 steps to becoming the developer everyone wants.  It contains some pretty good strategies for what to do after you’ve landed that first job, how to become indispensable.

But even if you get that first job straight away, it’s never too early to start building a public reputation.  If you’re not already a member, join the social media outlets like linkedin, twitter, and the like.  Where you can collaborate over concepts and ideas. Linkedin has some pretty good groups to join.  Once you become fluent in a topic, you can start your own group.  For example, here’s one that I manage: Linkedin Open Source IAM group.

Even more important, open a github account and start publishing work there.  Read about The impact github is having on your software career.

Also be sure to join tech groups in your hometown.  These will put you in the same room with like-minded professionals.  Here’s one that I’ve recently joined: Little Rock JUG.

Then publish articles about topics that interest you.  If they interest you they will likely interest others.  Write about the research that you have completed.  Yes, the nitty-gritty details.  People love technical details when well thought out.  Retweet articles (written by others) that you like or agree with. Follow people that have work that you admire rather than for personal (friendship) reasons.  If you see something you like, let the other person know, ask questions about it.  If you see something you disagree with, offer constructive criticisms.  Above all be respectful and positive in your communications with others.  This is healthy collaboration in action and will be an important part of your technical career, as it blossoms.

Forget about being the genius capable of writing superb software all by yourself.  That genius is a unicorn, at most 1% of the population.  If that’s you (and I don’t think that it is) congratulations and carry on!  You won’t need any more of my advice.  Otherwise, if like 99% of the population (the rest of us), you absorb knowledge by working around others. Surround yourself with the smartest people you can find.  Be humble.  Admit that you don’t understand how it works yet.  Keep your mouth (mostly) shut until you’ve learned from the people who came before, the current experts.  They will respect you for that and will encourage your ideas as they become viable.  Later, once you’ve mastered the basics, you may tell them how to improve, and they will listen.

Eventually, perhaps after many years (less if you are lucky), you’ll have earned a good public reputation, and with it, a large number of loyal followers.  These people will then help you communicate about software projects that you’re interested in.  The latest releases of your software, conferences that you’re speaking at, articles that you’ve written, etc…

Afterwards you need not worry about finding a job again.  They will find you.  A public reputation supersedes any single organizational boundary and gives you complete control over your career’s path.

Categories: FLOSS Project Planets

Justin Mason: Links for 2017-04-22

Sat, 2017-04-22 19:58
Categories: FLOSS Project Planets

Bryan Pendleton: Borussia

Sat, 2017-04-22 19:19


So it turns out that the horrible bombing attack on the Borussia Dortmund football team was in fact NOT Islamic terrorists at all.

Rather, it was something much more banal: One man’s greed behind Dortmund attack, after all

Not many (if any) had seen this coming… Something more than a week after the triple bomb attack that targeted Borussia Dortmund and led to their Champions League game against Monaco being delayed by 24 hours, police have announced that the motive behind the whole incident was pure financial greed. The accused bought 15,000 put options on the shares of Borussia Dortmund on April 11. Those options were running until June 17, 2017 and were bought with the ID of the hotel L'Arrivee (Dortmund's team hotel), a prosecutor made known through a written statement, after the police arrested a 28-year-old man.

The Beeb has (a bit) more: Borussia Dortmund bombs: 'Speculator' charged with bus attack

Rather than having links to radical Islamism, he was a market trader hoping to make money if the price of shares in the team fell, prosecutors say.

The suspect has been charged with attempted murder, triggering explosions and causing serious physical injury.

He has been identified only as Sergej W, and was staying in the team's hotel overlooking the scene of the attack.

There was, I should think, more than just greed involved, as clearly the man was quite mentally ill:

He was staying at the team's L'Arrivée hotel in Dortmund on the day of the attack and had moved to a room on the top floor, overlooking the street where it took place, prosecutors say.

The suspect placed the bet on 11 April using an IP address traced to the hotel, after taking out a loan for the money.

That's somewhere bordering on stalker-level obsession, I'd say.

Very sad.

But I'm glad the German police were level-headed and careful and thorough and dug down to the underlying facts of the matter.

And SHAME on all those trashy publications that threw horrid terror speculations out there.

Yes, I'm looking at you, The Sun, and The Express, and The NY Post, and Fox News and The Star, and ...

You know who you were. Shame on you all.

Categories: FLOSS Project Planets

Bryan Pendleton: Three Junes: a very short review.

Sat, 2017-04-22 19:05

Julia Glass's Three Junes tells the story of an (extended) Scottish family across multiple generations, mostly set during the later decades of the 20th century.

It is beautifully written and quite emotional at times.

Categories: FLOSS Project Planets

Justin Mason: Links for 2017-04-21

Fri, 2017-04-21 19:58
Categories: FLOSS Project Planets

Colm O hEigeartaigh: Securing Apache Hadoop Distributed File System (HDFS) - part III

Fri, 2017-04-21 05:54
This is the third in a series of posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. The second post looked at how to use Apache Ranger to authorize access to data stored in HDFS. In this post we will look at how Apache Ranger can create "tag" based authorization policies for HDFS using Apache Atlas. For information on how to create tag-based authorization policies for Apache Kafka, see a post I wrote earlier this year.

The Apache Ranger admin console allows you to create security policies for HDFS by associating a user/group with some permissions (read/write/execute) and a resource, such as a directory or file. This is called a "Resource based policy" in Apache Ranger. An alternative is to use a "Tag based policy", which instead associates the user/group + permissions with a "tag". You can create and manage tags in Apache Atlas, and Apache Ranger supports the ability to import tags from Apache Atlas via a tagsync service, something we will cover in this post.

1) Start Apache Atlas and create entities/tags for HDFS

First let's look at setting up Apache Atlas. Download the latest released version (0.8-incubating) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
  • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
The distribution will then be available in 'distro/target/apache-atlas-0.8-incubating-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
  • export MANAGE_LOCAL_HBASE=true
  • export MANAGE_LOCAL_SOLR=true
Now let's start Apache Atlas with 'bin/'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "Data".  Click on "Search" and the "Create new entity" link. Select an entity type of "hdfs_path" with the following values:
  • QualifiedName: data@cl1
  • Name: Data
  • Path: /data
Once the new entity has been created, then click on "+" beside "Tags" and associate the new entity with the "Data" tag.
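For reference, the entity created in the UI above corresponds to a small JSON payload. The endpoint named here is the standard Atlas v2 REST API, but treat this as an untested sketch of the same steps: POST the payload to 'http://localhost:21000/api/atlas/v2/entity', then associate the "Data" classification with the returned entity GUID via the classifications endpoint.

```json
{
  "entity": {
    "typeName": "hdfs_path",
    "attributes": {
      "qualifiedName": "data@cl1",
      "name": "Data",
      "path": "/data"
    }
  }
}
```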

2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

To create tag based policies in Apache Ranger, we have to import the entity + tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. First, start the Apache Ranger admin service and rename the HDFS service we created in the previous tutorial from "HDFSTest" to "cl1_hadoop". This is because the Tagsync service will sync tags into the Ranger service that corresponds to the suffix of the qualified name of the tag with "_hadoop". Also edit 'etc/hadoop/ranger-hdfs-security.xml' in your Hadoop distribution and change the "" to "cl1_hadoop". Also change the "ranger.plugin.hdfs.policy.cache.dir" along the same lines. Finally, make sure the directory '/etc/ranger/cl1_hadoop/policycache' exists and the user you are running Hadoop as can write and read from this directory.
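The renaming step is the part that is easiest to get wrong, so it may help to spell out the convention: the cluster suffix of the Atlas qualifiedName ("cl1" in "data@cl1"), plus "_hadoop", must equal the Ranger service name. The helper below is purely illustrative and is not part of Ranger's API:

```python
def ranger_hdfs_service_for(qualified_name):
    """Map an Atlas qualifiedName like 'data@cl1' to the Ranger HDFS
    service name that tagsync will target ('cl1_hadoop').
    Illustrative only; the real mapping lives inside Ranger tagsync."""
    if "@" not in qualified_name:
        raise ValueError("expected '<entity>@<cluster>' form: %r" % qualified_name)
    cluster = qualified_name.rsplit("@", 1)[1]  # 'data@cl1' -> 'cl1'
    return cluster + "_hadoop"

print(ranger_hdfs_service_for("data@cl1"))  # cl1_hadoop
```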

After building Apache Ranger, extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit '' as follows:
  • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
Save '' and install the tagsync service via "sudo ./". It can now be started via "sudo start".

3) Create Tag-based authorization policies in Apache Ranger

Now let's create a tag-based authorization policy in the Apache Ranger admin UI. Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "HDFSTagService". Create a new policy for this service called "DataPolicy". In the "TAG" field enter a capital "D" and the "Data" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "bob" with component permission of "HDFS" and "read" and "execute".

The last thing we need to do is to go back to the Resource based policies and edit "cl1_hadoop" and select the tag service we have created above.

4) Testing authorization in HDFS using our tag based policy

Wait until the Ranger authorization plugin syncs the new authorization policies from the Ranger Admin service and then we can test authorization. In the previous tutorial we showed that the file owner and user "alice" can read the data stored in '/data', but "bob" could not. Now we should be able to successfully read the data as "bob" due to the tag based authorization policy we have created:
  • sudo -u bob bin/hadoop fs -cat /data/LICENSE.txt
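Conceptually, the access check in this setup works in two hops: the path maps to the tags synced from Atlas, and the tag maps to the users and permissions in the Ranger tag policy. The toy model below uses the names from this tutorial ('/data', 'Data', 'bob') purely for illustration; it is not Ranger's actual evaluation logic.

```python
# Toy model of tag-based authorization: resources carry tags (synced
# from Atlas), and policies grant permissions on tags, not on paths.
RESOURCE_TAGS = {"/data": {"Data"}}                     # from the Atlas hdfs_path entity
TAG_POLICIES = {"Data": {"bob": {"read", "execute"}}}   # the "DataPolicy" above

def is_allowed(path, user, permission):
    """Allow if any tag on the resource has a policy granting the user
    the requested permission."""
    for tag in RESOURCE_TAGS.get(path, set()):
        if permission in TAG_POLICIES.get(tag, {}).get(user, set()):
            return True
    return False

print(is_allowed("/data", "bob", "read"))    # True
print(is_allowed("/data", "bob", "write"))   # False
```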
Categories: FLOSS Project Planets

Justin Mason: Links for 2017-04-20

Thu, 2017-04-20 19:58
  • Amazon DynamoDB Accelerator (DAX)

    Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second. DAX does all the heavy lifting required to add in-memory acceleration to your DynamoDB tables, without requiring developers to manage cache invalidation, data population, or cluster management. No latency percentile figures, unfortunately. Also still in preview.

    (tags: amazon dynamodb aws dax performance storage databases latency low-latency)

  • I Just Love This Juicero Story So Much

    When we signed up to pump money into this juice company, it was because we thought drinking the juice would be a lot harder and more expensive. That was the selling point, because Silicon Valley is a stupid libertarian dystopia where investor-class vampires are the consumers and a regular person’s money is what they go shopping for. Easily opened bags of juice do not give these awful nightmare trash parasites a good bargain on the disposable income of credulous wellness-fad suckers; therefore easily opened bags of juice are a worse investment than bags of juice that are harder to open.

    (tags: juicero juicebros techbros silicon-valley funny dystopia fruit bags juice)

  • Zeynep Tufekci: Machine intelligence makes human morals more important | TED Talk |

    Machine intelligence is here, and we’re already using it to make subjective decisions. But the complex way AI grows and improves makes it hard to understand and even harder to control. In this cautionary talk, techno-sociologist Zeynep Tufekci explains how intelligent machines can fail in ways that don’t fit human error patterns — and in ways we won’t expect or be prepared for. “We cannot outsource our responsibilities to machines,” she says. “We must hold on ever tighter to human values and human ethics.” More relevant now that nVidia are trialing ML-based self-driving cars in the US…

    (tags: nvidia ai ml machine-learning scary zeynep-tufekci via:maciej technology ted-talks)

  • ‘Mathwashing,’ Facebook and the zeitgeist of data worship

    Fred Benenson: Mathwashing can be thought of using math terms (algorithm, model, etc.) to paper over a more subjective reality. For example, a lot of people believed Facebook was using an unbiased algorithm to determine its trending topics, even if Facebook had previously admitted that humans were involved in the process.

    (tags: maths math mathwashing data big-data algorithms machine-learning bias facebook fred-benenson)

  • Build a Better Monster: Morality, Machine Learning, and Mass Surveillance

    We built the commercial internet by mastering techniques of persuasion and surveillance that we’ve extended to billions of people, including essentially the entire population of the Western democracies. But admitting that this tool of social control might be conducive to authoritarianism is not something we’re ready to face. After all, we’re good people. We like freedom. How could we have built tools that subvert it? As Upton Sinclair said, “It is difficult to get a man to understand something, when his salary depends on his not understanding it.” I contend that there are structural reasons to worry about the role of the tech industry in American political life, and that we have only a brief window of time in which to fix this.

    (tags: advertising facebook google internet politics surveillance democracy maciej-ceglowski talks morality machine-learning)

Categories: FLOSS Project Planets

Shawn McKinney: Secure Web Apps with JavaEE and Apache Fortress

Thu, 2017-04-20 16:18

ApacheCon is just a couple months away — coming up May 16-18 in Miami. We asked Shawn McKinney, Software Architect at Symas Corporation,  to share some details about his talk at ApacheCon. His presentation — “The Anatomy of a Secure Web Application Using Java EE, Spring Security, and Apache Fortress” will focus on an end-to-end application security architecture for an Apache Wicket Web app running in Tomcat. McKinney explains more in this interview.

Source: Secure Web Apps with JavaEE and Apache Fortress

Categories: FLOSS Project Planets

Steve Loughran: The interruption economy

Thu, 2017-04-20 14:19
With the untimely death of a laptop in Boston in February, I've rebuilt two laptops recently.

The first: a replacement for the dead one: a development macbook pro wired up to the various bits of work infra: MS office, VPN,  even hipchat. The second, a formerly dead 2009 macbook brought back to life with a 256GB SSD and a boost of its RAM to 8GB (!).

Doing this has brought home to me a harsh truth:

The majority of applications you install on an OSX laptop consider it not just a right, but a duty, to interrupt you while you are trying to work.

It's not just the apps where someone actually wants to talk to you (e.g. skype); it's pretty much everything you can install.

For example, iTunes wants to be able to interrupt me, including playing sounds. It's a music player application, and it also wants to make beeping noises? Same for spotify. Why should background music apps or foreground media playback apps think they need to be able to interrupt you when they are running in the background?

Dropbox. I didn't realise this was doing notifications until it suddenly popped up to tell me the good news that it was keeping itself up to date automatically.

Keeping your installation up to date is something we should expect all applications to do. It should not be so important that you pop up a dialog box saying "good news, you are only at risk from 0-day exploits we haven't found or patched yet!". Once I was aware that dropbox was happy to interrupt me, I went to its settings, only to discover that it also wants to interrupt me on "comments, shares and @mentions", and on synced files.

I hadn't noticed that a tool I used to sync files across machines had evolved into a groupware app where people could @mention me, but clearly it has, and in teams, interruptions whenever someone comments on things is clearly considered good. It also wants to interrupt me on files syncing. Think about that. We have an application whose primary purpose is "synchronising files across machines", and suddenly it wants to start popping up notifications when it is doing its job? What else should we have? Note taking applications sharing the good news that they haven't crashed yet?

Maybe, because amongst the apps which also consider interruption an inalienable right are: OneNote and the macOS Notes app. I have no idea what they want to interrupt me about: Notes doesn't specify what it wants to alert me about, only that it wants to notify me on locked screens and make a noise. OneNote? It lets you specify which notebooks can trigger interrupts, but again, the why is missing.

The list goes on. My password manager, text editor, IDE. Everything I install defaults to interrupting me.

Yes, you can turn the features off, but on a newly installed machine, that means that you have to go through every single app and disable every single interruption point. Miss out some small detail and, while you are trying to get some work done, something pops up to say "lucky you! Something has happened which Photos thinks is so important you should stop what you are doing and use it instead!". When you are building up two laptops, it means there's about 20+ times I've had to bring up the notifications preference pane, scroll down to whichever app last interrupted me, turn off all its notifications, then continue until something else chooses to break my concentration.

The web browsers want to let web pages interrupt you too.

In Firefox you can't disable it, at least not without delving into about:config.

You can block it in the OS notification settings, though, which implies it is at least integrated with the OS and the system-wide do-not-disturb feature.

Chrome: you can manage it in the browser (even though google don't want you to stop it), but it doesn't appear to be integrated with the OS.

Without the OS integration, OSX's do-not-disturb feature won't work here, so if you do let Chrome notify you, webapps gain the right to interrupt you during presentations, watching media content, etc.

Safari? Permitted, but OS-controlled, completely blockable. This doesn't mean that webapps shouldn't be able to interrupt you: google calendar is a good example; it's just that the easier we make this, the more sites will want to do it.

The OS isn't even consistent itself. There is no way to tell time machine not to annoy you with the fact that it hasn't backed up for 11 days. It's not part of the notification system, even though it came from the same building. What kind of example is that to set for others?

Because the default behaviour of every application is to interrupt, I have to go through every single installed app to disable it, or else my life is a constant noise of popups stating irrelevant facts. You may not notice this as you install one application at a time, turning off the settings individually, but when you build up a new box, the arrogance of all these applications becomes obvious, as it takes some time to actually stop your attention being attacked by the software you install.

Getting users to look at your app or your web site is lumped in as "the attention economy". That certainly applies to things like twitter, facebook, snapchat, etc. But how does that translate into dropbox trying to get my attention to tell me that it's keeping itself up to date? Or whatever itunes or photos want to interrupt me about? Why does OneNote need to tell me something about a saved workbook? This isn't "the attention economy". This is the "interruption economy": people terrified that users may not be making full use of their features, so apps keep popping up to encourage you to use them or whatever new feature they've just installed.

Interrupting people while they are trying to work is not a good use of the lives of people whose work depends on "getting things done without interruptions". As my colleagues should know, though some of them forget, I don't run with hipchat on, precisely because I hate getting popups "hey Steve, can i just ask...", where the ask is something that I'd google for the answer myself, so why somebody asks me instead of googling, I don't know. But even with the workflow interrupts off, things keep trying to stop me getting anything done.

Then there's the apps which interrupt without any warning at all. I got caught out by this at Dataworks Summit, where halfway through a presentation GPGMail popped up telling me there was a new version. This was a presentation where I'd explicitly set "do not disturb" and was running full screen, but GPGMail's update checks weren't honouring it. Lesson: turn off the wifi as well as setting everything to do-not-disturb/offline.

Those update prompts are important. But when everything keeps going "update me! now!", they end up being an irritant to ignore, just like the way the "service now!" alert pops up in our car when we use it. It's just another low-level hint, not something which matters like "low pressure in tyres".

What it does really highlight is that having an application keep itself up to date with security patches is still considered, on OSX, to be something worth interrupting the user about. All I can say is that it's a good thing Linux apps don't feel the same way, or apt-get upgrade would be unbearable.

Finally, there's the OS
  • It'd be good if the OS recognised when a full screen media/presentation app was underway and automatically went into silent mode at that point.
  • All the OS's own notifications "upgrade available", "no time machine backups" should be integrated with the same notification mechanisms for app viewers. That's to help the users, but also set an example for all others.

What to really do about it?

I'd really like to be able to tell the OS that the default setting for any newly installed app is "no notifications". Maybe now I've built up the laptops I won't have to go through the torment of disabling it across many apps, so it'll just be a case-by-case irritant. Even so, there's still the pain of being reminded of update options.

What I can do though, is promise not to personally write applications which interrupt people by default.

Here then, is my pledge:
  1. I pledge to give my users the opportunity to live a life free of interruptions, at least from my own code.
  2. I pledge not to write applications which bring up notification boxes to tell you that they have kept themselves up to date automatically, that someone has logged in to another machine, or that someone else is viewing a document a user has co-authored.
  3. Ideally, the update mechanism should integrate with the OS, and let it handle the notifications (or not).
  4. If I then add a notification in an application for what I consider to be relevant information, I pledge for the default state to be "don't".
  5. They will all go away when left alone.
  6. Furthermore, I pledge to use the OS supplied mechanism and integrate with any do- not-disturb mechanism the OS implements.
I know, I haven't done client-side code for a long time, but I can assure people, if I did, I'd try to be much less annoying than what we have today. Because I recognise how much pain this causes.
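A sketch of what that default-off behaviour looks like in code may make the pledge concrete. Everything here is hypothetical (no real OS notification framework is being used): notifications are opt-in per topic, and even opted-in topics are silenced by do-not-disturb.

```python
class NotificationPolicy:
    """Sketch of the pledge: notifications default to off, and even
    opted-in topics are suppressed while do-not-disturb is active.
    Hypothetical API, not any real OS framework."""

    def __init__(self):
        self.opted_in = set()       # default state is "don't"
        self.do_not_disturb = False

    def opt_in(self, topic):
        # Only an explicit user action enables a topic.
        self.opted_in.add(topic)

    def should_notify(self, topic):
        if self.do_not_disturb:     # honour the OS do-not-disturb setting
            return False
        return topic in self.opted_in

p = NotificationPolicy()
print(p.should_notify("self-update"))  # False: nothing is on by default
p.opt_in("mentions")
print(p.should_notify("mentions"))     # True, but only after explicit opt-in
```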
    Categories: FLOSS Project Planets

    Colm O hEigeartaigh: Securing Apache Hadoop Distributed File System (HDFS) - part II

    Thu, 2017-04-20 10:23
    This is the second in a series of posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. In this post we will look at how to use Apache Ranger to authorize access to data stored in HDFS. The Apache Ranger Admin console allows you to create policies which are retrieved and enforced by a HDFS authorization plugin. Apache Ranger allows us to create centralized authorization policies for HDFS, as well as an authorization audit trail stored in SOLR or HDFS.

    1) Install the Apache Ranger HDFS plugin

    First we will install the Apache Ranger HDFS plugin. Follow the steps in the previous tutorial to setup Apache Hadoop, if you have not done this already. Then download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
    • mvn clean package assembly:assembly -DskipTests
    • tar zxvf target/ranger-1.0.0-SNAPSHOT-hdfs-plugin.tar.gz
    • mv ranger-1.0.0-SNAPSHOT-hdfs-plugin.tar.gz ${ranger.hdfs.home}
    Now go to ${ranger.hdfs.home} and edit "". You need to specify the following properties:
    • POLICY_MGR_URL: Set this to "http://localhost:6080"
    • REPOSITORY_NAME: Set this to "HDFSTest".
    • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hadoop installation
    Save "" and install the plugin as root via "sudo ./". The Apache Ranger HDFS plugin should now be successfully installed. Start HDFS with:
    • sbin/
    2) Create authorization policies in the Apache Ranger Admin console

    Next we will use the Apache Ranger admin console to create authorization policies for our data in HDFS. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Apache Ranger admin service with "sudo ranger-admin start", open a browser at "http://localhost:6080/", and log on with "admin/admin". Add a new HDFS service with the following configuration values:
    • Service Name: HDFSTest
    • Username: admin
    • Password: admin
    • Namenode URL: hdfs://localhost:9000
    Click on "Test Connection" to verify that we can connect successfully to HDFS, then save the new service. Now click on the "HDFSTest" service that we have created. Add a new policy for the "/data" resource path for the user "alice" (create this user if you have not done so already under "Settings, Users/Groups"), with permissions of "read" and "execute".

    3) Testing authorization in HDFS

    Now let's test the Ranger authorization policy we created above in action. Note that by default the HDFS authorization plugin checks for a Ranger authorization policy that grants access first, and if this fails it falls back to the default POSIX permissions. The Ranger authorization plugin will pull policies from the Admin service every 30 seconds by default. For the "HDFSTest" example above, they are stored in "/etc/ranger/HDFSTest/policycache/" by default. Make sure that the user you are running Hadoop as can access this directory.

    Now let's test to see if I can read the data file as follows:
    • bin/hadoop fs -cat /data/LICENSE* (this should work via the underlying POSIX permissions)
    • sudo -u alice bin/hadoop fs -cat /data/LICENSE* (this should work via the Ranger authorization policy)
    • sudo -u bob bin/hadoop fs -cat /data/LICENSE* (this should fail as we don't have an authorization policy for "bob").

    Categories: FLOSS Project Planets

    Steve Loughran: Fear of Dependencies

    Thu, 2017-04-20 09:39
    There are some things to be scared of; some things to view as a challenge and embrace anyway.

    Here, Hardknott Pass falls into the challenge category —at least in summertime. You know you'll get up, the only question is "cycling" or "walking".

    Hardknott in Winter is a different game; it's a "should I be trying to get up here at all" kind of issue. Where, for reference, the answer is usually: no. Find another way around.

    Upgrading dependencies to Hadoop jitters between the two, depending on what upgrade is being proposed.

    And, as the nominal assignee of HADOOP-9991, "upgrade dependencies", I get to see this.

    We regularly get people submitting one-line patches, "upgrade your dependency so you can work with my project" —and they are such tiny diffs that people think "what a simple patch, it's easy to apply".

    The problem is they are one line patches that can lead to the HBase, Hive or Spark people cornering you and saying things like "why do you make my life so hard?"

    Before making the leap to Java 9, we're trapped whatever we do. Upgrade: things downstream break. Don't upgrade: things downstream break when they update something else, or pull in a dependency which has itself updated.

    While Hadoop has been fairly good at keeping its own services stable, where it causes problems is in applications that pull in the Hadoop classpath for their own purposes: HBase, Hive, Accumulo, Spark, Flink, ...

    Here's my personal view on the risk factor of various updates.

    Critical :

    We know things will be trouble —and upgrades are full cross-project epics

    • protobuf. This will probably never be updated during the lifespan of Hadoop 2, given how Google broke its ability to link to previously generated code.
    • Guava. Google cut things. Hadoop ships with Guava 11 but has moved off all deleted classes so runs happily against Guava 16+. I think it should be time just to move up, on the basis of Java 8 compatibility alone.
    • Jackson. The last time we updated, everything worked in Hadoop, but broke HBase. This makes everyone very sad.
    • In Hive and Spark: Kryo. Hadoop core avoids that problem; I did suggest adding it purely for the pain it would cause the Hive team (HADOOP-12281) —they knew it wasn't serious but as you can see, others got a bit worried. I suspect it was experience with my other POM patches that made them worry.
    I think a Jackson update is probably due, but will need conversations with the core downstream projects. And perhaps bump up Guava, given how old it is.

    High Risk

    Failures are traumatic enough we're just scared of upgrading unless there's a good reason.
    • jetty/servlets. Jetty has been painful (threads in the Datanodes to perform liveness monitoring of Jetty are an example of the workarounds), but it was a known and managed problem. The plan is to move off Jetty entirely and over to Jersey + Grizzly.
    • Servlet API.
    • jersey. HADOOP-9613 shows how hard that's been
    • Tomcat. Part of the big webapp set
    • Netty —again, a long standing sore point (HADOOP-12928, HADOOP-12927)
    • httpclient. There's a plan to move off Httpclient completely, stalled on hadoop-openstack. I'd estimate 2-3 days there, more testing than anything else. Removing a dependency entirely frees downstream projects from having to worry about the version Hadoop comes with.
    • Anything which has JNI bindings. Examples: leveldb, the codecs
    • Java. Areas of trauma: Kerberos, SASL.

    With the move of trunk to Java 8, those servlet/webapp versions all need to be rolled.

    Medium Risk

    These are things where we have to be very cautious about upgrading, either because of a history of brittleness, or because failures would be traumatic
    • Jets3t. Every upgrade of Jets3t moved the bugs around. It's effectively frozen as "trouble, but a stable trouble", with S3a being the future.
    • Curator 2.x ( see HADOOP-11612 ; HADOOP-11102) I had to do a test rebuild of curator 2.7 with guava downgraded to Hadoop's version to be confident that there were no codepaths that would fail. That doesn't mean I'm excited by Curator 3, as it's an unknown.
    • Maven itself
    • Zookeeper -for its use of guava.
    Here I'm for leaving Jets3t alone; and, once Guava is updated, Curator and ZK should be aligned.

    Low risk:

    Generally happy to upgrade these as later versions come out.
    • SLF4J yes, repeatedly
    • log4j 1.x (2.x is out as it doesn't handle files)
    • avro as long as you don't propose picking up a pre-release.
      (No: Avro 1.7 to 1.8 update is incompatible with generated compiled classes, same as protobuf.)
    • Apache commons-lang (minor: yes; major: no)
    • Junit

    I don't know which category the AWS SDK and azure SDKs fall into. Their jackson SDK dependency flags them as a transitive troublespot.

    Life would be much easier if (a) the Guava team stopped taking things away and (b) either Jackson stopped breaking things or someone else produced a good JSON library. I don't know of any; I have only encountered worse.

    2016-05-31 Update: ZK doesn't use Guava. That's curator I'm thinking of.  Correction by Chris Naroth.
    Categories: FLOSS Project Planets

    Bryan Pendleton: A break in the rain

    Wed, 2017-04-19 20:50

    It was a beautiful day in the city, so I wandered over to the border between Chinatown and North Beach and hooked up with some old friends for a wonderful lunch.

    Thanks, all!

    Categories: FLOSS Project Planets

    Bryan Pendleton: Cop stories

    Wed, 2017-04-19 20:46

    I'll read almost everything; I'm pretty voracious that way.

    But certainly a good police procedural is always right up my alley.

    So, two recommendations, one old, and one new:

    • The Fairy Gunmother

      Pennac's novel is set in a post-imperial Paris of the mid-1980's, rich with the complexities that entails, and benefits from a truly superb translation by Ian Monk. The result is laugh-out-loud funny while still being atmospheric and compelling.

    • Leviathan Wakes

      Although you'll find this on your Science Fiction shelves at the local bookstore (hah! is there such a thing?), it's really a police procedural set in the future, in space, as more-than-haggard Detective Miller is trying to unravel why a simple missing persons case appears to be much, much deeper than it first seemed.

    Each of these is "Book 1 of a series".

    And I'll be reading more of each series, straightaway.

    Categories: FLOSS Project Planets

    Colm O hEigeartaigh: Securing Apache Hadoop Distributed File System (HDFS) - part I

    Wed, 2017-04-19 11:49
    Last year, I wrote a series of articles on securing Apache Kafka using Apache Ranger and Apache Sentry. In this series of posts I will look at how to secure the Apache Hadoop Distributed File System (HDFS) using Ranger and Sentry, such that only authorized users can access data stored in it. In this post we will look at a very basic way of installing Apache Hadoop and accessing some data stored in HDFS. Then we will look at how to authorize access to the data stored in HDFS using POSIX permissions and ACLs.

    1) Installing Apache Hadoop

    The first step is to download and extract Apache Hadoop. This tutorial uses version 2.7.3. The next step is to configure Apache Hadoop as a single-node cluster so that we can easily get it up and running on a local machine. You will need to follow the steps outlined in the previous link to install ssh and pdsh. If you can't log in to localhost without a password ("ssh localhost"), then you need to follow the instructions given in the link about setting up passphraseless ssh.

    In addition, we want to run Apache Hadoop in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process. Edit 'etc/hadoop/core-site.xml' and add:
    Next edit 'etc/hadoop/hdfs-site.xml' and add:
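    The XML snippets themselves did not survive syndication. For a standard single-node, pseudo-distributed setup, the Hadoop 2.7 documentation uses the following properties (note the NameNode URL matches the hdfs://localhost:9000 used later in this series):

```xml
<!-- etc/hadoop/core-site.xml -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```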

    Make sure that the JAVA_HOME variable in 'etc/hadoop/' is correct, and then format the filesystem and start Hadoop via:
    • bin/hdfs namenode -format
    • sbin/
    To confirm that everything is working correctly, you can open "http://localhost:50090" and check on the status of the cluster there. Once Hadoop has started, upload and then access some data in HDFS:
    • bin/hadoop fs -mkdir /data
    • bin/hadoop fs -put LICENSE.txt /data
    • bin/hadoop fs -ls /data
    • bin/hadoop fs -cat /data/*
    2) Securing HDFS using POSIX Permissions

    We've seen how to access some data stored in HDFS via the command line. Now how can we create some authorization policies to restrict access to this data? The simplest way is to use standard POSIX permissions. If we look at the LICENSE.txt file stored in the /data directory, we see that it has the permissions "-rw-r--r--", which means other users can read it. Remove access for users other than the owner via:
    • bin/hadoop fs -chmod og-r /data
    Now create a test user called "alice" on your system and try to access the LICENSE we uploaded above via:
    • sudo -u alice bin/hadoop fs -cat /data/*
    You will see an error that says "cat: Permission denied: user=alice, access=READ_EXECUTE".

    3) Securing HDFS using ACLs

    Securing access to data stored in HDFS via POSIX permissions works fine; however, it does not, for example, allow you to specify fine-grained permissions for users other than the file owner. What if we want to allow "alice" from the previous section to read the file, but not "bob"? We can achieve this via Hadoop ACLs. To enable ACLs, we will need to add a property called "dfs.namenode.acls.enabled" with value "true" to 'etc/hadoop/hdfs-site.xml' and restart HDFS.

    We can grant read access to 'alice' via:
    • bin/hadoop fs -setfacl -m user:alice:r-- /data/*
    • bin/hadoop fs -setfacl -m user:alice:r-x /data
    To check to see the new ACLs associated with LICENSE.txt do:
    • bin/hadoop fs -getfacl /data/LICENSE.txt
    In addition to the owner, we now have the ACL "user:alice:r--". Now we can read the data as "alice". However another user "bob" cannot read the data. To avoid confusion with future blog posts on securing HDFS, we will now remove the ACLs we added via:
    • bin/hadoop fs -setfacl -b /data
    • bin/hadoop fs -setfacl -b /data/LICENSE.txt
    Categories: FLOSS Project Planets

    Jan Materne: How to teach Sonar to find new bugs?

    Wed, 2017-04-19 06:53

    A few days ago my Jenkins instance gave me the regular hint for updates, so I checked the changelog for changes which are interesting to me. One of them caught my eye: Jenkins 2.53 – „GC Performance: Avoid using FileInputStream and FileOutputStream in the core codebase.“ I read the two tickets (for Jenkins and the JDK itself) and was surprised. I hadn't known that.

    Some days later I noticed a longer article about that by CloudBees on DZone. Also interesting.

    While I was thinking about changing the „new FIS/FOS“ pattern to something better in the open source projects I am working on (Apache Ant, Apache Commons), Stefan was faster.
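    The change the Jenkins and JDK tickets describe is mechanical: replace the FileInputStream/FileOutputStream constructors with the java.nio.file.Files factory methods, which avoids the per-instance finalizer those stream classes register (the GC cost discussed above). Here is a minimal before/after sketch; the class and method names are mine, not from Jenkins, Ant, or Commons:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FisFosDemo {

    // Old style: new FileInputStream(file) — each instance registers a
    // finalizer, which delays reclamation and adds GC overhead.
    // New style: Files.newInputStream(path) — no finalizer, and a drop-in
    // replacement in most read/write code.
    static byte[] readAll(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) { // was: new FileInputStream(path.toFile())
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        }
    }

    static void write(Path path, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(path)) { // was: new FileOutputStream(path.toFile())
            out.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fisfos", ".txt");
        write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readAll(tmp), StandardCharsets.UTF_8)); // prints "hello"
        Files.delete(tmp);
    }
}
```

    Since the factory methods take a Path rather than a File, call sites holding a File need a `.toPath()` too, which is most of the diff in practice.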

    Categories: FLOSS Project Planets

    Colm O hEigeartaigh: Apache CXF 3.1.11 released

    Tue, 2017-04-18 08:11
    Apache CXF 3.1.11 (and 3.0.13) has been released. This release fixes a large number of bugs (there are over 100 issues fixed in the CXF JIRA for this release). From a security POV, here are some of the more notable bug fixes and changes:
    • CXF-7315 - Abstract the STS client token caching behaviour to allow the user to plug in a custom implementation
    • CXF-7296 - Add support to enable revocation for TLS via configuration (see here). 
    • CXF-7314 - Custom BinarySecurityTokens are not used to set up the security context
    • CXF-4692 - Allow customization of Request Security Token Response
    • CXF-7252 - TLSParameterJaxBUtils.getTrustManagers getting password from wrong system property
    In addition, two new security advisories have been issued for bugs fixed in this release:
    • CVE-2017-5653 - Apache CXF JAX-RS XML Security streaming clients do not validate that the service response was signed or encrypted.
    • CVE-2017-5656 - Apache CXF's STSClient uses a flawed way of caching tokens that are associated with delegation tokens.
    Please update to the latest releases if you are affected by either of these issues.
    Categories: FLOSS Project Planets

    Ortwin Glück: [Code] Gentoo updates perl from 5.22 to 5.24

    Tue, 2017-04-18 03:43
    On desktop systems emerge usually complains that there are packages requiring 5.22 and refuses to update:

    !!! Multiple package instances within a single package slot have been pulled
    !!! into the dependency graph, resulting in a slot conflict:

    dev-lang/perl:0

      (dev-lang/perl-5.24.1-r1:0/5.24::gentoo, ebuild scheduled for merge) pulled in by
        =dev-lang/perl-5.24* required by (virtual/perl-MIME-Base64-3.150.0-r2:0/0::gentoo, installed)
        (and 8 more with the same problem)

      (dev-lang/perl-5.22.3_rc4:0/5.22::gentoo, installed) pulled in by
        dev-lang/perl:0/5.22=[-build(-)] required by (dev-perl/Digest-HMAC-1.30.0-r1:0/0::gentoo, installed)
        (and 13 more with the same problem)

    To resolve that:

    Forcibly update perl (-O), then clean up:
    • # emerge -1uavO perl
    • # perl-cleaner --all (repeat perl-cleaner if emerge fails)
    There may still be perl virtuals that need reinstalling:
    • # emerge -1av $(qlist -IC 'virtual/perl-*')
    This should leave you with a consistent perl build and emerge should no longer suggest a downgrade.
    Categories: FLOSS Project Planets