FLOSS Project Planets

Dirk Eddelbuettel: Rblpapi 0.3.6

Planet Debian - Thu, 2017-04-20 21:36

Time for a new release of Rblpapi -- version 0.3.6 is now on CRAN. Rblpapi provides a direct interface between R and the Bloomberg Terminal via the C++ API provided by Bloomberg Labs (but note that a valid Bloomberg license and installation are required).

This is the seventh release since the package first appeared on CRAN last year. This release brings a very nice new function lookupSecurity() contributed by Kevin Jin as well as a number of small fixes and enhancements. Details below:

Changes in Rblpapi version 0.3.6 (2017-04-20)
  • bdh can now store in double preventing overflow (Whit and John in #205 closing #163)

  • bdp documentation has another override example

  • A new function lookupSecurity can search for securities, optionally filtered by yellow key (Kevin Jin and Dirk in #216 and #217 closing #215)

  • Added file init.c with calls to R_registerRoutines() and R_useDynamicSymbols(); also use .registration=TRUE in useDynLib in NAMESPACE (Dirk in #220)

  • getBars and getTicks can now return data.table objects (Dirk in #221)

  • bds has improved internal protect logic via Rcpp::Shield (Dirk in #222)

Courtesy of CRANberries, there is also a diffstat report for this release. As always, more detailed information is on the Rblpapi page. Questions, comments etc. should go to the issue tickets system at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

KDE PIM update for Zesty available for testers

Planet KDE - Thu, 2017-04-20 21:31

Since we missed by a whisker getting updated PIM (kontact, kmail, akregator, kgpg etc.) into Zesty for release day, and we believe it is important that our users have access to this significant update, packages are now available for testers in the Kubuntu backports landing ppa.

While we believe these packages should be relatively issue-free, please bear in mind that they have not been tested as comprehensively as those in the main Ubuntu archive.

Testers should be prepared to troubleshoot and hopefully report issues that may occur. Please provide feedback on our mailing list [1], IRC [2], or optionally via social media.

After a period of testing and verification, we hope to move this update to the main backports ppa.

You should have some command line knowledge before testing.
Reading about how to use ppa-purge is also advisable.

How to test KDE PIM 16.12.3 for Zesty:

Testing packages are currently in the Kubuntu Backports Landing PPA.

sudo add-apt-repository ppa:kubuntu-ppa/backports-landing
sudo apt-get update
sudo apt-get dist-upgrade

1. Kubuntu-devel mailing list: https://lists.ubuntu.com/mailman/listinfo/kubuntu-devel
2. Kubuntu IRC channels: #kubuntu & #kubuntu-devel on irc.freenode.net


Chapter Three: Installing Drupal 8 from configuration

Planet Drupal - Thu, 2017-04-20 20:46
Wouldn't it be great if???

Configuration management is one of the most useful site development features in Drupal 8. It makes a site's configuration exportable, importable and manageable in git. Whilst building the configuration management feature, a thought that often occurred was "Wouldn't it be great if you could take an existing set of configuration and install a new site from it?". Every Drupal developer has turned up to a new project and had to learn a different way to build a development site. Do you get the code from GitHub? Download a database from production or some other special location? And is that database sanitised?


Justin Mason: Links for 2017-04-20

Planet Apache - Thu, 2017-04-20 19:58
  • Amazon DynamoDB Accelerator (DAX)

    Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second. DAX does all the heavy lifting required to add in-memory acceleration to your DynamoDB tables, without requiring developers to manage cache invalidation, data population, or cluster management. No latency percentile figures, unfortunately. Also still in preview.

    (tags: amazon dynamodb aws dax performance storage databases latency low-latency)

  • I Just Love This Juicero Story So Much

    When we signed up to pump money into this juice company, it was because we thought drinking the juice would be a lot harder and more expensive. That was the selling point, because Silicon Valley is a stupid libertarian dystopia where investor-class vampires are the consumers and a regular person’s money is what they go shopping for. Easily opened bags of juice do not give these awful nightmare trash parasites a good bargain on the disposable income of credulous wellness-fad suckers; therefore easily opened bags of juice are a worse investment than bags of juice that are harder to open.

    (tags: juicero juicebros techbros silicon-valley funny dystopia fruit bags juice)

  • Zeynep Tufekci: Machine intelligence makes human morals more important | TED Talk | TED.com

    Machine intelligence is here, and we’re already using it to make subjective decisions. But the complex way AI grows and improves makes it hard to understand and even harder to control. In this cautionary talk, techno-sociologist Zeynep Tufekci explains how intelligent machines can fail in ways that don’t fit human error patterns — and in ways we won’t expect or be prepared for. “We cannot outsource our responsibilities to machines,” she says. “We must hold on ever tighter to human values and human ethics.” More relevant now that nVidia are trialing ML-based self-driving cars in the US…

    (tags: nvidia ai ml machine-learning scary zeynep-tufekci via:maciej technology ted-talks)

  • ‘Mathwashing,’ Facebook and the zeitgeist of data worship

    Fred Benenson: Mathwashing can be thought of using math terms (algorithm, model, etc.) to paper over a more subjective reality. For example, a lot of people believed Facebook was using an unbiased algorithm to determine its trending topics, even if Facebook had previously admitted that humans were involved in the process.

    (tags: maths math mathwashing data big-data algorithms machine-learning bias facebook fred-benenson)

  • Build a Better Monster: Morality, Machine Learning, and Mass Surveillance

    We built the commercial internet by mastering techniques of persuasion and surveillance that we’ve extended to billions of people, including essentially the entire population of the Western democracies. But admitting that this tool of social control might be conducive to authoritarianism is not something we’re ready to face. After all, we’re good people. We like freedom. How could we have built tools that subvert it? As Upton Sinclair said, “It is difficult to get a man to understand something, when his salary depends on his not understanding it.” I contend that there are structural reasons to worry about the role of the tech industry in American political life, and that we have only a brief window of time in which to fix this.

    (tags: advertising facebook google internet politics surveillance democracy maciej-ceglowski talks morality machine-learning)


Carl Chenet: Only retweet if a specific hashtag is in a tweet with the Retweet Bot

Planet Python - Thu, 2017-04-20 18:00

Need to retweet tweets with a specific hashtag from another Twitter account to your own account? Just use the Retweet bot!

Configuration to retweet all tweets with a specific hashtag

Here is a complete example of a Retweet configuration to retweet all tweets with a specific hashtag. In this example every tweet with the hashtag #rt from the carl_chenet Twitter account will be retweeted to the Twitter account with the credentials below:

[twitter]
screen_name_of_the_user_to_retweet=carl_chenet
consumer_key=ml9jaiBnf3pmU9uIrKNIxAr3v
consumer_secret=8Cmljklzerkhfer4hlj3ljl2hfvc123rezrfsdctpokaelzerp
access_token=213416590-jgJnrJG5gz132nzerl5zerwi0ahmnwkfJFN9nr3j
access_token_secret=3janlPMqDKlunJ4Hnr90k2bnfk3jfnwkFjeriFZERj32Z

[retweet]
only_if_hashtags=rt,

[sqlite]
sqlitepath=/var/lib/retweet/retweet.db

The [twitter] section uses the screen_name_of_the_user_to_retweet parameter to define which Twitter account should be listened to. The other parameters of this section are the Twitter credentials you need to automatically tweet (see the Configure chapter of the official Retweet bot documentation).

The [retweet] section uses only one parameter here, only_if_hashtags. You just need a comma-separated string of your hashtags (without the hash).
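As a rough illustration of how a comma-separated value like this could be interpreted, here is a minimal Python sketch. It is not taken from the Retweet bot's source: the helper names parse_hashtags and should_retweet are hypothetical, and the case-insensitive matching is an assumption that the post does not confirm.

```python
import configparser
import re

# Trimmed copy of the example configuration (credentials left out on purpose).
SAMPLE_CONFIG = """
[twitter]
screen_name_of_the_user_to_retweet=carl_chenet

[retweet]
only_if_hashtags=rt,
"""


def parse_hashtags(raw):
    """Split a comma-separated string into clean hashtag names (hypothetical helper)."""
    # Empty items, such as the one after a trailing comma, are dropped.
    return [part.strip().lstrip("#").lower() for part in raw.split(",") if part.strip()]


def should_retweet(text, hashtags):
    """Return True if the tweet text carries at least one wanted hashtag.

    Assumption: matching is case-insensitive and only whole hashtags count.
    """
    found = {tag.lower() for tag in re.findall(r"#(\w+)", text)}
    return any(tag in found for tag in hashtags)


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)
hashtags = parse_hashtags(parser["retweet"]["only_if_hashtags"])

print(hashtags)
print(should_retweet("New blog post #rt", hashtags))
print(should_retweet("Just a normal tweet", hashtags))
```

In this sketch the trailing comma in "rt," is simply ignored, so only the hashtag rt is kept.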

The [sqlite] section only needs the sqlitepath parameter with a path to the SQLite database storing the ids of the already-retweeted tweets.
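To make the idea of "storing the ids of the already-retweeted tweets" concrete, here is a small Python sketch of such a store. The table name, schema and function names are assumptions for illustration only; the real layout of the Retweet bot's database is not described in this post.

```python
import sqlite3


def open_store(path):
    """Open (or create) a database that remembers retweeted tweet ids.

    The table name and schema are hypothetical, not the bot's real layout.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS retweeted (tweet_id INTEGER PRIMARY KEY)")
    return conn


def mark_if_new(conn, tweet_id):
    """Record tweet_id and return True only the first time it is seen."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO retweeted (tweet_id) VALUES (?)", (tweet_id,)
        )
    # rowcount is 1 when a row was inserted and 0 when the insert was ignored.
    return cursor.rowcount == 1


# An in-memory database keeps the example self-contained; a real setup
# would pass the sqlitepath value from the configuration instead.
conn = open_store(":memory:")
for tweet_id in (1001, 1002, 1001):
    action = "retweet" if mark_if_new(conn, tweet_id) else "skip"
    print(tweet_id, action)
```

Using the tweet id as the primary key means a second attempt to record the same tweet is ignored by the database itself, so no separate lookup is needed before inserting.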

Install and set up the Retweet bot

OK, now we have a complete Retweet configuration to retweet only if the hashtag #rt appears in the tweets. Now let's install Retweet:

# pip3 install retweet

Then let's create a retweet user with the home /var/lib/retweet:

# adduser --home /var/lib/retweet --gecos "" retweet

The proper way to store our configuration file is to put it in the /etc/retweet directory:

# mkdir -p /etc/retweet /var/lib/retweet
# chown -R retweet:root /etc/retweet /var/lib/retweet

Now let's write our configuration file with the sections and the parameters defined above. You also have a full example available in the Configure section of the official Retweet documentation.

# vi /etc/retweet/retweet.ini

Almost ready! Last step: write a line in the crontab in order to launch Retweet on a regular basis:

*/10 * * * * retweet retweet /etc/retweet/retweet.ini

And voila! Your Retweet bot is ready.

More information about the Retweet bot … and finally

You can help the Retweet Bot by donating anything through Liberapay (also possible with cryptocurrencies). That’s a big motivation factor.


Ben's SEO Blog: Don’t Miss This Drupal 8 SEO Session at DrupalCon!

Planet Drupal - Thu, 2017-04-20 17:58

I hope you will be attending DrupalCon 2017 next week in Baltimore. This is a great opportunity to update your Drupal knowledge and network with others. It’s also your chance to sign up for a special, two-hour training session on Drupal 8 SEO which is free to DrupalCon attendees.

I will be holding a Drupal 8 SEO Hands-On Seminar beginning at 15:45 on April 25 in room 321 at the Baltimore Convention Center. We will do the most important on-page optimizations that I’d execute for a Volacci SEO client. We’ll cover specific details that marketers should know to achieve SEO results with Drupal 8 with minimal need for developer help.

In addition, everyone who attends will receive a free electronic copy of my latest book, Drupal 8 SEO. This book is a step-by-step guide for ranking high in search engines with professional tips, modules, and best practices for Drupal 8 web sites.

Search Engine Optimization is a key part of the success of any Drupal website. With recent releases, Drupal 8 is ready for the SEO prime-time, but it can be difficult to know which modules to use and exactly how to configure them. This course will take the mystery out of Drupal 8 SEO.

In the hands-on portion of the class, you can optimize your very own website. Following Volacci’s Drupal SEO guidelines, the end-result will be a website that ranks better in search engine results, creates more leads and drives more revenue. If you want to do the hands-on portion of this class, you must bring your own dev environment. It can be your own Drupal website or a test website. Get the details here.

See you at DrupalCon!

2 Hours of Drupal SEO Training and a Free Book, Too!
Tags: drupalcon, drupal 8 seo book, Planet Drupal

Okular 1.1 released!

Planet KDE - Thu, 2017-04-20 17:08

Today KDE Applications 17.04 was released.

It includes Okular 1.1, which contains a nice set of features:
* Add annotation resize functionality
* Add support for auto-calculation of form contents via JavaScript
* Allow to rotate the page view using two-finger pinches on a touchscreen
* Change pages in presentation mode by swiping on touch screen
* Added support for Links that change the Optional Content visibility status
* Allow to disable automatic search while typing
* Allow to create bookmarks from the Table Of Contents

This release was brought to you by Albert Astals Cid, Oliver Sander, Luigi Toscano, Martin T. H. Sandsmark, Tobias Deiminger, Antonio Rojas, Burkhard Lück, Christoph Feck, Elvis Angelaccio, Gilbert Assaf, Heiko Becker, Hrvoje Senjan, Marco Scarpetta, Miklós Máté, Pino Toscano, Yuri Chornoivan.


Drupal Modules: The One Percent: Drupal Modules: The One Percent — Toolbar Menu (video tutorial)

Planet Drupal - Thu, 2017-04-20 17:05
NonProfit, Thu, 04/20/2017 - 16:05. Episode 26

Here is where we bring awareness to Drupal modules running on less than 1% of reporting sites. Today we'll investigate Toolbar menu, a module which allows you to add menus to your toolbar.


Shawn McKinney: Secure Web Apps with JavaEE and Apache Fortress

Planet Apache - Thu, 2017-04-20 16:18

ApacheCon is just a couple months away, coming up May 16-18 in Miami. We asked Shawn McKinney, Software Architect at Symas Corporation, to share some details about his talk at ApacheCon. His presentation, “The Anatomy of a Secure Web Application Using Java EE, Spring Security, and Apache Fortress”, will focus on an end-to-end application security architecture for an Apache Wicket Web app running in Tomcat. McKinney explains more in this interview.

Source: Secure Web Apps with JavaEE and Apache Fortress


Drupal governance announcements: Community Discussion

Planet Drupal - Thu, 2017-04-20 16:08

Open conversation is essential to the wellbeing of any community. It is especially important now, as we collaboratively determine how to evolve our governance.

This discussion thread is being posted as a place for ongoing conversation about recent community issues and the governance improvements that the community is collaborating on together.

For background information on additional conversations being held, we direct you to:


...which has links to open community discussion sessions to be held at DrupalCon Baltimore, and to sessions that are being held virtually. After those sessions are completed, minutes will be posted to the page above.

We encourage you to join us at those community discussions if you are able, and/or to continue the discussion here.


Lullabot: Drupalcon Baltimore Edition

Planet Drupal - Thu, 2017-04-20 16:00
Matt and Mike sit down with several Lullabots who are presenting at Drupalcon Baltimore. We talk about our sessions, sessions that we're excited to see, and speaking tips for first-time presenters.

CU Boulder - Webcentral: Upgrading A Drupal 7 Module to Drupal 8: Configuration Forms

Planet Drupal - Thu, 2017-04-20 15:41

In the last post in this series, we set up some routing for our module for three paths. One of those paths is to the module's main configuration form. Since this module has a Drupal 7 version, I am going to go by the old tried and true method of CDD, a.k.a. Copy Driven Development. Copy, paste, cry, try to copy something else.


Acquia Developer Center Blog: Get the most out of your first DrupalCon!

Planet Drupal - Thu, 2017-04-20 14:59

To me, meeting and building relationships in person is the glue that holds us together and makes Drupal a community. If this is your first DrupalCon or first Drupal community event, it’ll be your first taste of this crazy, smart bunch of people scattered around the globe most of the rest of the year. Welcome! I’d like to help you get the most out of your first DrupalCon!

Tags: acquia drupal planet, community, first timers, event, drupalcon

Steve Loughran: The interruption economy

Planet Apache - Thu, 2017-04-20 14:19
With the untimely death of a laptop in Boston in February, I've rebuilt two laptops recently.

The first is a replacement for the dead one: a development macbook pro wired up to the various bits of work infra: MS office, VPN, even hipchat. The second is a formerly dead 2009 macbook brought back to life with a 256GB SSD and a boost of its RAM to 8GB (!).

Doing this has brought home to me a harsh truth:

The majority of applications you install on an OSX laptop consider it not just a right, but a duty, to interrupt you while you are trying to work.

It's not just the things where someone actually wants to talk to you (e.g. skype); it's pretty much everything you can install.

For example, iTunes wants to be able to interrupt me, including playing sounds. It's a music player application, and it also wants to make beeping noises? Same for spotify. Why should background music apps or foreground media playback apps think they need to be able to interrupt you when they are running in the background?

Dropbox. I didn't realise this was doing notifications until it suddenly popped up to tell me the good news that it was keeping itself up to date automatically.

Keeping your installation up to date is something we should expect all applications to do. It should not be so important that you should pop up a dialog box "good news, you are only at risk from 0-day exploits we haven't found or patched yet!". Once I was aware that dropbox was happy to interrupt me, I went to its settings, only to discover that it also wants to interrupt me on "comments, shares and @mentions", and on synced files.

I hadn't noticed that a tool I used to sync files across machines had evolved into a groupware app where people could @mention me, but clearly it has, and in teams, interruptions whenever someone comments on things are clearly considered good. It also wants to interrupt me on files syncing. Think about that. We have an application whose primary purpose is "synchronising files across machines", and suddenly it wants to start popping up notifications when it is doing its job? What else should we have? Note taking applications sharing the good news that they haven't crashed yet?

Maybe, because amongst the apps which also consider interruption an inalienable right are OneNote and the macOS notes app. I have no idea what they want to interrupt me about: Notes doesn't specify what it wants to alert me about, only that it wants to notify me on locked screens and make a noise. OneNote? Lets you spec which notebooks can trigger interrupts, but again, the why is missing.

The list goes on. My password manager, text editor, IDE. Everything I install defaults to interrupting me.

Yes, you can turn the features off, but on a newly installed machine, that means that you have to go through every single app and disable every single interruption point. Miss out some small detail and while you are trying to get some work done, something pops up to say "lucky you! Something has happened which Photos thinks is so important you should stop what you are doing and use it instead!". When you are building up two laptops, it means there's about 20+ times I've had to bring up the notifications preference pane, scroll down to whichever app last interrupted me, turn off all its notifications, then continue until something else chooses to break my concentration.

The web browsers want to let web pages interrupt you too.

Firefox: you can't disable it, at least not without delving into about:config.

You can block it in the OS notifications settings, which implies it is at least integrated with the OS and the system-wide do-not-disturb feature.

Chrome: you can manage it in the browser, even though google don't want you to stop it, but it doesn't appear to be integrated with the OS.

Without the OS integration, OSX's do-not-disturb feature won't work here, so if you do let Chrome notify you, webapps gain the right to interrupt you during presentations, watching media content, etc.

Safari? Permitted, but OS controlled, completely blockable. This doesn't mean that webapps shouldn't be able to interrupt you: google calendar is a good example. It's just that the easier we make it to do this, the more sites will want to.

The OS isn't even consistent itself. There is no way to tell time machine to not annoy you with the fact that it hasn't updated for 11 days. It's not part of the notification system, even though it came from the same building. What kind of example is that to set for others?

Because the default behaviour of every application is to interrupt, I have to go through every single installed app to disable it else my life is a constant noise of popups stating irrelevant facts. You may not notice that as you install one application at a time, turning off the settings individually, but when you build up a new box, the arrogance of all these applications becomes obvious, as it takes some time to actually stop your attention being attacked by the software you install.

Getting users to look at your app, your web site, is roped in as "The attention economy". That certainly applies to things like twitter, facebook, snapchat, etc. But how does that translate into dropbox trying to get my attention to tell me that it's keeping itself up to date? Or whatever itunes or photos wants to interrupt me on? Why does OneNote need to tell me something about a saved workbook? This isn't "the attention economy". This is the "interruption economy": people terrified that users may not be making full use of their features, so trying to keep popping up to encourage you to use the app or whatever new feature they've just installed.

Interrupting people while they are trying to work is not a good use of the life of people whose work depends on "getting things done without interruptions". As my colleagues should know, though some of them forget, I don't run with hipchat on precisely because I hate getting popups "hey Steve, can i just ask...", where the ask is something that I'd google for the answer myself, so why somebody asks me to google for them, I don't know. But even with the workflow interrupts off, things keep trying to stop me getting anything done.

Then there's the apps which interrupt without any warning at all. I got caught out by this at Dataworks summit, where halfway through a presentation GPGMail popped up telling me there was a new version. This was a presentation where I'd explicitly set "do not disturb" on and was running full screen, but the GPGMail checks weren't using it. Lesson: turn off the wifi as well as setting everything to do-not-disturb/offline.

Those update prompts, they are important. But when everything keeps going "update me! now!" they end up being an irritant to ignore, just like the way the "service now!" alert pops up in our car when we use it. It's just another low-level hint, not something which matters like "low pressure in tyres".

What it does really highlight is that having an application keep itself up to date with security patches is still considered, on OSX, to be something worth interrupting the user to let them know about. All I can say is it's a good thing that Linux apps don't feel the same way, or apt-get upgrade would be unbearable.

Finally, there's the OS:
  • It'd be good if the OS recognised when a full screen media/presentation app was underway and automatically went into silent mode at that point.
  • All the OS's own notifications "upgrade available", "no time machine backups" should be integrated with the same notification mechanisms for app viewers. That's to help the users, but also set an example for all others.

What to really do about it?

I'd really like to be able to tell the OS that the default setting for any newly installed app is "no notifications". Maybe now I've built up the laptops I won't have to go through the torment of disabling it across many apps, so it'll just be that case by case irritant. Even so, there's still the pain of being reminded of update options even

What I can do though, is promise not to personally write applications which interrupt people by default.

Here then, is my pledge:
  1. I pledge to give my users the opportunity to live a life free of interruptions, at least from my own code.
  2. I pledge not to write applications which bring up notification boxes to tell you that they have kept themselves up to date automatically, that someone has logged in to another machine, or that someone else is viewing a document a user has co-authored.
  3. Ideally, the update mech should integrate with that of the OS, so that the OS can handle the notifications (or not).
  4. If I then add notifications in an application for what I consider to be relevant information, I pledge for the default state to be "don't".
  5. They will all go away when left alone.
  6. Furthermore, I pledge to use the OS supplied mechanism and integrate with any do-not-disturb mechanism the OS implements.
I know, I haven't done client-side code for a long time, but I can assure people, if I did: I'd try to be much less annoying than what we have today. Because I recognise how much pain this causes.

    Mediacurrent: Dropcast: Episode 31 - DRUPALCON

    Planet Drupal - Thu, 2017-04-20 12:58

    Recorded April 12th, 2017


    The Accidental Coder: The State-of-Drupal Poll

    Planet Drupal - Thu, 2017-04-20 11:25

    Speak out about your feelings on several topics that are swirling in the Drupalsphere. The results of the poll will be published here during Drupalcon Baltimore. 

    Take the Poll!

    Tags: Drupal Planet

    MTech, LLC: Ultimate guide to migrating data into Drupal 8

    Planet Drupal - Thu, 2017-04-20 11:16

    I give special greetings to all the people who read us and particularly to Baltimore at the headquarters of Drupalcon 2017.

    Charlotte León Thu, 04/20/2017 - 09:16

    Continuum Analytics News: Two Peas in a Pod: Anaconda + IBM Cognitive Systems

    Planet Python - Thu, 2017-04-20 11:05
    Company Blog, Thursday, April 20, 2017. Travis Oliphant, President, Chief Data Scientist & Co-Founder, Continuum Analytics


    There is no question that deep learning has come out to play across a wide range of sectors—finance, marketing, pharma, legal...the list goes on. What’s more, from now until 2022, the deep learning market is expected to grow more than 65 percent. Clearly, companies are increasingly looking deeply at this popular machine learning approach to help fulfill business needs. Deep learning makes it possible to process giant datasets with billions of elements and extract useful predictive models. Deep learning is transforming the businesses of leading consumer Web and mobile app companies and is also being adopted by more traditional business enterprises. 

    That’s why this week we are pleased to announce the availability of Anaconda on IBM’s Cognitive Systems, the company’s high performance deep learning platform, highlighting the fact that Anaconda is regarded as an important capability for developers building cognitive solutions. The platform empowers these developers and data scientists to build and deploy deep learning applications that are ready to scale. Anaconda is also integrating with the IBM PowerAI software distribution that makes it simpler for companies to take advantage of Power performance and GPU optimization for data intensive cognitive workloads. 

    At Anaconda, we’re helping leading businesses across the world, like IBM, solve the world’s most challenging problems—from improving medical treatments to discovering planets to predicting effects of public policy—by handing them tools to identify patterns in data, uncover key insights and transform basic data into a goldmine of intelligence. This news reiterates the importance of Open Data Science in all factors of business. 

    Want to learn more about this news? Read the press release here


    GNUnet News: gnURL 7.54.0 released

    GNU Planet! - Thu, 2017-04-20 10:41

    Today the microfork of cURL, gnURL, has been released in version 7.54.0 following the latest release of cURL. This fixes CVE-2017-7468 (switch off SSL session id when client cert is used), among other issues (see https://curl.haxx.se/changes.html for the full Changelog).

    You have to run "./buildconf" before compiling gnURL.

    The download is available as usual at https://gnunet.org/gnurl


    Colm O hEigeartaigh: Securing Apache Hadoop Distributed File System (HDFS) - part II

    Planet Apache - Thu, 2017-04-20 10:23
    This is the second in a series of posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. In this post we will look at how to use Apache Ranger to authorize access to data stored in HDFS. The Apache Ranger Admin console allows you to create policies which are retrieved and enforced by an HDFS authorization plugin. Apache Ranger allows us to create centralized authorization policies for HDFS, as well as an authorization audit trail stored in Solr or HDFS.

    1) Install the Apache Ranger HDFS plugin

    First we will install the Apache Ranger HDFS plugin. Follow the steps in the previous tutorial to set up Apache Hadoop, if you have not done this already. Then download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
    • mvn clean package assembly:assembly -DskipTests
    • tar zxvf target/ranger-1.0.0-SNAPSHOT-hdfs-plugin.tar.gz
    • mv ranger-1.0.0-SNAPSHOT-hdfs-plugin ${ranger.hdfs.home}
    Now go to ${ranger.hdfs.home} and edit "install.properties". You need to specify the following properties:
    • POLICY_MGR_URL: Set this to "http://localhost:6080"
    • REPOSITORY_NAME: Set this to "HDFSTest".
    • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hadoop installation
    Save "install.properties" and install the plugin as root via "sudo ./enable-hdfs-plugin.sh". The Apache Ranger HDFS plugin should now be successfully installed. Start HDFS with:
    • sbin/start-dfs.sh
    2) Create authorization policies in the Apache Ranger Admin console

    Next we will use the Apache Ranger admin console to create authorization policies for our data in HDFS. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Apache Ranger admin service with "sudo ranger-admin start" and open a browser and go to "http://localhost:6080/" and log on with "admin/admin". Add a new HDFS service with the following configuration values:
    • Service Name: HDFSTest
    • Username: admin
    • Password: admin
    • Namenode URL: hdfs://localhost:9000
    Click on "Test Connection" to verify that we can connect successfully to HDFS, and then save the new service. Now click on the "HDFSTest" service that we have created. Add a new policy for the "/data" resource path for the user "alice" (create this user if you have not done so already under "Settings, Users/Groups"), with permissions of "read" and "execute".

    3) Testing authorization in HDFS

    Now let's see the Ranger authorization policy we created above in action. Note that by default the HDFS authorization plugin checks for a Ranger authorization policy that grants access first, and if this fails it falls back to the default POSIX permissions. The Ranger authorization plugin will pull policies from the Admin service every 30 seconds by default. For the "HDFSTest" example above, they are stored in "/etc/ranger/HDFSTest/policycache/" by default. Make sure that the user you are running Hadoop as can access this directory.

    Now let's test to see if I can read the data file as follows:
    • bin/hadoop fs -cat /data/LICENSE* (this should work via the underlying POSIX permissions)
    • sudo -u alice bin/hadoop fs -cat /data/LICENSE* (this should work via the Ranger authorization policy)
    • sudo -u bob bin/hadoop fs -cat /data/LICENSE* (this should fail as we don't have an authorization policy for "bob").
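    The three outcomes above follow from the "Ranger policy first, then POSIX fallback" order described earlier. The Python sketch below models only that decision order; it is not Ranger's or Hadoop's actual code. The class and function names are hypothetical, and it assumes the data file is owned by a user called "hadoop" and is readable only by its owner, which the post does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangerPolicy:
    """Simplified stand-in for a Ranger policy (hypothetical model)."""
    resource_path: str
    user: str
    permissions: frozenset


@dataclass(frozen=True)
class PosixInfo:
    """Simplified POSIX read bits for one file (hypothetical model)."""
    owner: str
    owner_read: bool
    other_read: bool


def ranger_grants(policies, user, path, access):
    """True if any policy covers the path and grants the access to the user."""
    for policy in policies:
        prefix = policy.resource_path.rstrip("/") + "/"
        covers = path == policy.resource_path or path.startswith(prefix)
        if covers and policy.user == user and access in policy.permissions:
            return True
    return False


def posix_grants(info, user, access):
    """Fallback check; only the read bit is modelled in this sketch."""
    if access != "read":
        return False
    return info.owner_read if user == info.owner else info.other_read


def is_authorized(policies, info, user, path, access):
    """Check Ranger policies first, then fall back to POSIX permissions."""
    return ranger_grants(policies, user, path, access) or posix_grants(info, user, access)


# Policy from the post: "alice" gets read and execute on "/data".
policies = [RangerPolicy("/data", "alice", frozenset({"read", "execute"}))]
# Assumption: the file is owned by "hadoop" and readable by its owner only.
license_file = PosixInfo(owner="hadoop", owner_read=True, other_read=False)

for user in ("hadoop", "alice", "bob"):
    print(user, is_authorized(policies, license_file, user, "/data/LICENSE.txt", "read"))
```

    Under those assumptions the owner is allowed through the POSIX fallback, "alice" through the Ranger policy, and "bob" is refused because neither check grants access.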
