
    I’ve been mentoring a graduating senior in computer science.  Here’s what I told him…

    First, read the 10 steps to becoming the developer everyone wants.  It contains some pretty good strategies for what to do after you’ve landed that first job and how to become indispensable.

    But even if you get that first job straight away, it’s never too early to start building a public reputation.  If you’re not already a member, join social media outlets like LinkedIn and Twitter, where you can collaborate over concepts and ideas.  LinkedIn has some pretty good groups to join.  Once you become fluent in a topic, you can start your own group.  For example, here’s one that I manage: the LinkedIn Open Source IAM group.

    Even more important, open a GitHub account and start publishing work there.  Read about the impact GitHub is having on your software career.

    Also be sure to join tech groups in your hometown.  These will put you in the same room with like-minded professionals.  Here’s one that I’ve recently joined: Little Rock JUG.

    Then publish articles about topics that interest you.  If they interest you, they will likely interest others.  Write about the research you have completed.  Yes, the nitty-gritty details.  People love technical details when they are well thought out.  Retweet articles (written by others) that you like or agree with.  Follow people whose work you admire rather than for personal (friendship) reasons.  If you see something you like, let the other person know and ask questions about it.  If you see something you disagree with, offer constructive criticism.  Above all, be respectful and positive in your communications with others.  This is healthy collaboration in action, and it will be an important part of your technical career as it blossoms.

    Forget about being the genius capable of writing superb software all by yourself.  That genius is a unicorn, at most 1% of the population.  If that’s you (and I don’t think that it is), congratulations and carry on!  You won’t need any more of my advice.  Otherwise, like 99% of the population (the rest of us), you absorb knowledge by working around others.  Surround yourself with the smartest people you can find.  Be humble.  Admit that you don’t understand how it works yet.  Keep your mouth (mostly) shut until you’ve learned from the people who came before you, the current experts.  They will respect you for that and will encourage your ideas as they become viable.  Later, once you’ve mastered the basics, you may tell them how to improve, and they will listen.

    Eventually, perhaps after many years (less if you are lucky), you’ll have earned a good public reputation, and with it, a large number of loyal followers.  These people will then help you spread the word about software projects that you’re interested in: the latest releases of your software, conferences you’re speaking at, articles you’ve written, and so on.

    Afterwards you need not worry about finding a job again.  They will find you.  A public reputation transcends any single organizational boundary and gives you complete control over your career’s path.


    Next week I am giving a talk at Red Hat Summit 2017 in Boston.

    S104668 - Developing cloud-ready Camel microservices

    It’s a talk about how to get started as a Java developer building Java-based microservices that run on Kubernetes or OpenShift. The talk is a mix of slides and live demos, where everything is coded and running locally on my old laptop.  I am using three of my favorite Java stacks in the demos: Apache Camel, Spring Boot, and WildFly Swarm.

    The talk is on Tuesday, May 2nd, before lunch (11:30am to 12:15pm), so you can come and build up an appetite.

    On Wednesday from 3-5pm I am on booth duty at Red Hat Community Central, so that's a chance for you to come find me and have a chat. I am actually not aware of where that is located, but I would assume it's in the exhibition hall.

    I will be in Boston all week and attend Summit from Tuesday till Thursday. On Friday evening I am flying back home.

    I have been running for the last year or so, so I am also signed up for the 5km Summit run, which happens at 6am on Wednesday. In my time zone that would be noon, so I will be up and awake already.

    Other Camel Talks

    There are a number of other talks that I plan to attend.

    For example, Rajith's talk with TD Bank, where they discuss how they are using Camel. I love hearing what the real world does with Camel, and Red Hat Summit is a conference where customer stories are also present.

    S103873 - Migrating TD Bank's monolithic Java EE application to a microservices architecture

    There is also the stuff I have been working on lately: a new iPaaS product that uses Camel under the hood. I am really looking forward to Keith and Hiram presenting this.

    S101856 - Red Hat iPaaS—integration made easy

    Christian Posta is always an inspiration. I always learn something when he gives talks about microservices. He is out there in the field and sees first-hand what our customers want to do, and what they are doing and can do.

    S101993 - The hardest part of microservices is your data

    I formerly worked in the health care industry, so I am glad to have a chance to hear Quinn Stevenson's talk. Quinn has been fantastic in the Camel community, where he has contributed code patches and components, and helped Camel work better with OSGi and HL7.

    S103149 - Deploying Red Hat JBoss Fuse in healthcare—notes from the field

    I don't want to miss the chance to hear about how to migrate a monolithic 10-year-old system to a modern microservices-based architecture with Camel, Vert.x, and other cool technologies.

    S99785 - How to handle the complexity of migrating to microservices from 10 years of monolithic code

    There are more talks about Camel that I will try to attend. There are 12 sessions listed in the Red Hat Summit agenda.

    If you are attending Red Hat Summit, then I hope we get a chance to meet and say hi. I love the hallway conversations at conferences, and hearing the good, the bad, and the ugly. Apache Camel is not perfect, likely far from it. But it's adaptable and very flexible, and we have a very active, vibrant, and open community.

    And by the way, Camel 2.19 is on the way. We are building the release candidate this week. So hopefully it's released by Summit, so I can tell Jim Whitehurst to announce it at the keynote ;)


    We are proud to announce that Apache ManifoldCF 2.7 was released just a few days ago.


    PGP is not broken.  It has long been the best framework most of us have for digital identity, and a secure means of communication.

    Sadly the same cannot be said for certain popular PGP tools, nor for vast numbers of tutorials out there.  The usage we enjoyed and became accustomed to for a quarter century will now lead at best to confusion, and at worst to mistakes that could defeat the entire purpose of PGP and leave users wide open to spoofing.  That applies both to longstanding users who understand it well, and to the newbie who has read and understood a tutorial.

    The underlying problem is that 32-bit (8 hex character) key IDs are comprehensively broken.  The story of that is told at, by (I think) the people who originally demonstrated the issue.  It’s developed further since I last paid attention to it (and drew my colleagues’ attention to the need to stop using those 32-bit key IDs), in that an entire ‘shadow strong set’ has now been uploaded to the keyservers.  Those imposters were revoked by the evil32 folks, but with the idea being out there, anyone could now repeat that exercise and generate their own fake identities and fake Web of Trust.  And when a real malefactor does that, they’ll have the private keys, so there’ll be no-one to revoke them.

    Let’s take a look at a recent sequence of events, when I rolled a release candidate for an Apache software package, and PGP-signed it.  Bear in mind, this is all happening in a techie community: people who have been happily using PGP for years.

    [me] Signs a software bundle, uploads it with the signature to web space.
    [colleague] Checks the software, comes back with a number of comments.  Among them:

    - Key B87F79A9 is listed as "revoked: 2016-08-16" in key server

    Where does that come from?  I take great care of my PGP keys, and I certainly don’t recollect revoking that one.  To have revoked it, someone needs to have had access to both my private key and my passphrase, which is kind-of equivalent to having both the chip and the PIN to use my bank card (and that’s ignoring risks like someone tampering with my post on its way from the bank).  This is … impossible … alarming!

    Yet this is exactly what happens if you RTFM:

    % gpg --verify bundle.asc
    gpg: Signature made Sun 16 Apr 2017 00:00:14 BST using RSA key ID B87F79A9

    We don’t have the release manager’s public key (B87F79A9) in our local system. You now need to retrieve the public key from a key server.

    % gpg --recv-key B87F79A9
    gpg: requesting key B87F79A9
    gpg: key B87F79A9: public key "Nick Kew <me>" imported
    gpg: Total number processed: 1
    gpg: imported: 1

    That’s a paraphrased extract from a real tutorial (which I intend to update, if no one else gets there first).  It was fine when it was written, but now imports not one but two keys.  Here they are:

    $ gpg --list-keys B87F79A9
     pub 4096R/B87F79A9 2011-01-30
     uid Nick Kew <niq@apache...>
     uid Nick Kew (4096-bit key) <nick@webthing...>
     sub 4096R/862BA082 2011-01-30
    pub 4096R/B87F79A9 2014-06-16 [revoked: 2016-08-16]
     uid Nick Kew <niq@apache...>

    Both appear to be me; one is really me, the other an imposter from the evil32 set.  It’s easy to see when we know what we’re looking for, but could be confusing if unexpected!

    The problem goes away if we use 64-bit Key IDs, or (nowadays strongly recommended) the full 160-bit (40 character) fingerprint.  It is computationally infeasible that anyone could impersonate that, and indeed, no one has.

    $ gpg --fingerprint B87F79A9
     pub 4096R/B87F79A9 2011-01-30
     Key fingerprint = 3CE3 BAC2 EB7B BC62 4D1D 22D8 F3B9 D88C B87F 79A9
     uid Nick Kew <niq@apache...>
     uid Nick Kew (4096-bit key) <nick@webthing...>
     sub 4096R/862BA082 2011-01-30
    pub 4096R/B87F79A9 2014-06-16 [revoked: 2016-08-16]
     Key fingerprint = C74C 8AA5 91CB 3766 9D6F 73C0 2DF2 C6E4 B87F 79A9
     uid Nick Kew <niq@apache...>

    The imposter’s fingerprint is completely different from mine.  It’s not PGP that’s broken, it’s the use of 32-bit/8-character key IDs in our tools, our tutorials, and our minds, that’s at fault.

    However, the problem is a whole lot worse than that.  It’s not just my key (and everyone else in the Strong Set at the time of the evil32 demo) that has an imposter, it’s the entire WoT.  Let’s see if WordPress will let me present these side-by-side if I truncate the lines a bit.  The commandline used here is

    $ gpg --list-sigs [fingerprint] |egrep ^sig|cut -c14-50|sort|uniq|head -5

    which lists me:

    My key:

    010D6F3A 2012-04-11  dirk astrath (mo
    02D1BC65 2011-02-07  Peter Van Eynde 
    0AA3BF0E 2011-02-06  Christophe De Wo
    16879738 2011-02-07  Markus Reichelt 
    1DFBA164 2011-02-07  Bernhard Wiedema

    Imposter:

    010D6F3A 2014-08-05  dirk astrath (mo
    02D1BC65 2014-08-05  Peter Van Eynde 
    0AA3BF0E 2014-08-05  Christophe De Wo
    16879738 2014-08-05  Markus Reichelt 
    1DFBA164 2014-08-05  Bernhard Wiedema

    The first field there is the culprit 8-hex-char Key IDs for my signatories and their evil32 doppelgangers.  The only clue is in those dates, which would be easy to overlook.  Otherwise we have a complete imposter WoT.   Those IDs offer no more security than a checksum (such as MD5 or SHA) if used without due care, and without a chain of trust right back to the user’s own signature (which is something you probably don’t have if you’re not a geek).

    There are a lot of tools and tutorials out there that need updating to prevent this becoming yet another phisher’s playground.  Tools should not merely stop displaying 8-character key IDs, they shouldn’t even accept them.  I don’t think mere disambiguation is enough when an innocent user might thoughtlessly just select, say, the first of competing options.
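    Until the tools themselves change, individual users can at least fix their own defaults. Here is a minimal sketch of a ~/.gnupg/gpg.conf fragment that does this; both options are standard GnuPG settings, though exact behavior varies slightly across GnuPG versions:

```
# ~/.gnupg/gpg.conf
# Display long (64-bit) key IDs instead of the broken 32-bit short form
keyid-format 0xlong
# Show full 160-bit fingerprints when listing keys
with-fingerprint
```

    This only changes what gpg displays; the real fix is to quote and compare full fingerprints whenever you fetch or verify a key.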

    I’ve already been diving into some of those tutorials where I have write access to update them, but the task is complicated by having to work in the context of a document that deals with more than just the one thing, and without adding too much complexity for readers.  So I decided to work through the story here first!


    Starting with 2.0.3, Apache Syncope provides native integration with Flowable BPM, the rising star of open source workflow management systems.


    Microsoft logo

    I'm proud to share with all of you that Microsoft has invited me to join their Technical Advisory Group, contributing in the open source area to improve interoperability and the Cloud SDK.


    I started creating Angular 2 applications when it was in beta (back in March). To keep up with Angular 2's changes, I wrote a tutorial about developing with RC1 in June. Earlier this month, RC5 was released and many things changed once again. I think Scott Davis sums it up nicely in a tweet.

    To keep up with the rapid pace of change in Angular 2, I decided to write another tutorial, this time using Angular CLI. The biggest change I found since writing the last tutorial is in the testing infrastructure. Since Angular's Testing documentation hasn't been updated recently, hopefully this tutorial will help.

    Below is a table of contents in case you want to skip right to a particular section.

    What you'll build

    You'll build a simple web application with Angular CLI, a new tool for Angular 2 development. You'll create an application with search and edit features.

    What you'll need

    The latest release of Angular CLI (beta 10) uses Angular 2 RC4. Because of this, I used the master branch of Angular CLI to create this tutorial. To do this, clone angular-cli and run npm link in the directory you cloned it into. If you have issues, see #1733.

    Angular Augury is a Google Chrome Dev Tools extension for debugging Angular 2 applications. I haven't needed it much myself, but I can see how it might come in handy.

    Create your project

    Create a new project using the ng new command:

    ng new ng2-demo

    This will create a ng2-demo project and run npm install in it. It takes about a minute to complete, but will vary based on your internet connection speed.

    [mraible:~/dev] 45s $ ng new ng2-demo
    installing ng2
      create .editorconfig
      create src/app/app.component.css
      create src/app/app.component.html
      create src/app/app.component.spec.ts
      create src/app/app.component.ts
      create src/app/environment.ts
      create src/app/index.ts
      create src/app/shared/index.ts
      create src/favicon.ico
      create src/index.html
      create src/main.ts
      create src/system-config.ts
      create src/tsconfig.json
      create src/typings.d.ts
      create angular-cli-build.js
      create angular-cli.json
      create config/
      create config/environment.js
      create config/
      create config/karma-test-shim.js
      create config/karma.conf.js
      create config/protractor.conf.js
      create e2e/app.e2e-spec.ts
      create e2e/app.po.ts
      create e2e/tsconfig.json
      create e2e/typings.d.ts
      create .gitignore
      create package.json
      create public/.npmignore
      create tslint.json
      create typings.json
    Successfully initialized git.
    - Installing packages for tooling via npm
      -- es6-shim (global)
      -- angular-protractor (global dev)
      -- jasmine (global dev)
      -- selenium-webdriver (global dev)
    Installed packages for tooling via npm.
    [mraible:~/dev] 1m5s $

    You can see what version of Angular CLI you're using with ng --version.

    $ ng --version
    angular-cli: local (v1.0.0-beta.11-webpack.2, branch: master)
    node: 4.4.7
    os: darwin x64

    Run the application

    The project is configured with a simple web server for development. To start it, run:

    ng serve

    You should see a screen like the one below at http://localhost:4200.

    Default Homepage

    To make sure your new project's tests pass, run ng test:

    $ ng test
    Built project successfully. Stored in "dist/".
    Chrome 52.0.2743 (Mac OS X 10.11.6): Executed 2 of 2 SUCCESS (0.039 secs / 0.012 secs)

    Add a search feature

    To add a search feature, open the project in an IDE or your favorite text editor. For IntelliJ IDEA, use File > New Project > Static Web and point to the ng2-demo directory.

    The Basics

    In a terminal window, cd into your project's directory and run the following command. This will create a search component.

    $ ng g component search
    installing component
      create src/app/search/search.component.css
      create src/app/search/search.component.html
      create src/app/search/search.component.spec.ts
      create src/app/search/search.component.ts
      create src/app/search/index.ts

    Adding a Search Route
    In previous versions of Angular CLI, you could generate a route and a component. However, route generation has been disabled since beta 8. It will likely be re-enabled in a future release.

    The Router documentation for Angular 2 RC5 provides the information you need to set up a route to the SearchComponent you just generated.  Here's a quick summary:

    Create src/app/app.routing.ts to define your routes.

    import { Routes, RouterModule } from '@angular/router';
    import { SearchComponent } from './search/index';

    const appRoutes: Routes = [
      { path: 'search', component: SearchComponent },
      { path: '', redirectTo: '/search', pathMatch: 'full' }
    ];

    export const appRoutingProviders: any[] = [];
    export const routing = RouterModule.forRoot(appRoutes);

    Without the last path to redirect, there's a Cannot match any routes: '' console error.

    In src/app/app.module.ts, import the two constants you exported and configure them in @NgModule:

    import { routing, appRoutingProviders } from './app.routing';
    import { SearchComponent } from './search/search.component';

    @NgModule({
      // ...existing entries stay; add SearchComponent, routing, and the providers:
      declarations: [AppComponent, SearchComponent],
      imports: [BrowserModule, FormsModule, HttpModule, routing],
      providers: [appRoutingProviders],
      bootstrap: [AppComponent]
    })
    export class AppModule { }

    In src/app/app.component.html, add a RouterOutlet to display routes.

    <!-- Routed views go here -->
    <router-outlet></router-outlet>

    Now that you have routing set up, you can continue writing the search feature.

    To allow navigation to the SearchComponent, you can add a link in src/app/app.component.html.

      <a routerLink="/search" routerLinkActive="active">Search</a>

    Open src/app/search/search.component.html and replace its default HTML with the following:

      <input type="search" name="query" [(ngModel)]="query" (keyup.enter)="search()">
      <button type="button" (click)="search()">Search</button>
    <pre>{{searchResults | json}}</pre>

    If you still have ng serve running, your browser should refresh automatically. If not, navigate to http://localhost:4200, and you should see the search form.

    Search component

    If you want to add CSS for this component, open src/app/search/search.component.css and add some CSS. For example:

    :host {
      display: block;
      padding: 0 20px;
    }
    This section has shown you how to generate a new component in a basic Angular 2 application with Angular CLI. The next section shows you how to use a JSON file and localStorage to create a fake API.

    The Backend

    To get search results, create a SearchService that makes HTTP requests to a JSON file. Start by generating a new service.

    ng g service search

    Move the generated search.service.ts and its test to app/shared/search. You will likely need to create this directory.

    Then, create src/app/shared/search/data/people.json to hold your data.

        "id": 1,
        "name": "Peyton Manning",
        "phone": "(303) 567-8910",
        "address": {
          "street": "1234 Main Street",
          "city": "Greenwood Village",
          "state": "CO",
          "zip": "80111"
        "id": 2,
        "name": "Demaryius Thomas",
        "phone": "(720) 213-9876",
        "address": {
          "street": "5555 Marion Street",
          "city": "Denver",
          "state": "CO",
          "zip": "80202"
        "id": 3,
        "name": "Von Miller",
        "phone": "(917) 323-2333",
        "address": {
          "street": "14 Mountain Way",
          "city": "Vail",
          "state": "CO",
          "zip": "81657"

    Modify src/app/shared/search/search.service.ts and provide Http as a dependency in its constructor. In this same file, create a getAll() method to gather all the people. Also, define the Address and Person classes that JSON will be marshalled to.

    import { Injectable } from '@angular/core';
    import { Http, Response } from '@angular/http';
    import 'rxjs/add/operator/map';

    @Injectable()
    export class SearchService {
      constructor(private http: Http) {}

      getAll() {
        return this.http.get('app/shared/search/data/people.json')
          .map((res: Response) => res.json());
      }
    }

    export class Address {
      street: string;
      city: string;
      state: string;
      zip: string;

      constructor(obj?: any) {
        this.street = obj && obj.street || null; = obj && || null;
        this.state = obj && obj.state || null; = obj && || null;
      }
    }

    export class Person {
      id: number;
      name: string;
      phone: string;
      address: Address;

      constructor(obj?: any) { = obj && Number( || null; = obj && || null; = obj && || null;
        this.address = obj && obj.address || null;
      }
    }

    To make these classes available for consumption by your components, edit src/app/shared/index.ts and add the following:

    export * from './search/search.service';

    In search.component.ts, add imports for these classes.

    import { Person, SearchService } from '../shared/index';

    You can now add query and searchResults variables. While you're there, modify the constructor to inject the SearchService.

    export class SearchComponent implements OnInit {
      query: string;
      searchResults: Array<Person>;
      constructor(private searchService: SearchService) {}

    Then implement the search() method to call the service's getAll() method.

    search(): void {
      this.searchService.getAll().subscribe(
        data => { this.searchResults = data; },
        error => console.log(error)
      );
    }

    At this point, you'll likely see the following message in your browser's console.

    ORIGINAL EXCEPTION: No provider for SearchService!

    To fix the "No provider" error from above, update app.component.ts to import the SearchService and add the service to the list of providers.

    import { SearchService } from './shared/index';

    @Component({
      // ...existing selector and templateUrl stay the same...
      styleUrls: ['app.component.css'],
      viewProviders: [SearchService]
    })

    Now clicking the search button should work. To make the results look better, remove the <pre> tag and replace it with a <table>.

    <table *ngIf="searchResults">
      <tr *ngFor="let person of searchResults; let i=index">
          {{}}, {{person.address.state}} {{}}

    Then add some additional CSS to improve its table layout.

    table {
      margin-top: 10px;
      border-collapse: collapse;
    }

    th {
      text-align: left;
      border-bottom: 2px solid #ddd;
      padding: 8px;
    }

    td {
      border-top: 1px solid #ddd;
      padding: 8px;
    }

    Now the search results look better.

    Search Results

    But wait, we still don't have search functionality! To add a search feature, add a search() method to SearchService.

    search(q: string) {
      if (!q || q === '*') {
        q = '';
      } else {
        q = q.toLowerCase();
      }
      return this.getAll().map(data => {
        let results: any = []; => {
          if (JSON.stringify(item).toLowerCase().includes(q)) {
            results.push(item);
          }
        });
        return results;
      });
    }

    Then refactor SearchComponent to call this method with its query variable.

    search(): void { => { this.searchResults = data; },
        error => console.log(error)
      );
    }

    Now search results will be filtered by the query value you type in.

    This section showed you how to fetch and display search results. The next section builds on this and shows how to edit and save a record.

    Add an edit feature

    Modify search.component.html to add a click handler for editing a person.

    <td><a (click)="onSelect(person)">{{}}</a></td>

    In previous versions of Angular 2, you could embed a link with parameters directly into the HTML. For example:

    <a [routerLink]="['/edit',]">

    Unfortunately, this doesn't work with RC5. Another issue is that adding href="" causes the page to refresh, but without href, the link doesn't look like a link. If you know of a solution to this problem, please send me a pull request.

    Then add onSelect(person) to search.component.ts. You'll need to import Router and set it as a local variable to make this work.

    import { Router } from '@angular/router';

    export class SearchComponent implements OnInit {
      constructor(private searchService: SearchService, private router: Router) { }

      onSelect(person: Person) {
        this.router.navigate(['/edit',]);
      }
    }

    Run the following command to generate an EditComponent.

    $ ng g component edit
    installing component
      create src/app/edit/edit.component.css
      create src/app/edit/edit.component.html
      create src/app/edit/edit.component.spec.ts
      create src/app/edit/edit.component.ts
      create src/app/edit/index.ts

    Add a route for this component in app.routing.ts:

    import { EditComponent } from './edit/index';

    const appRoutes: Routes = [
      { path: 'search', component: SearchComponent },
      { path: 'edit/:id', component: EditComponent },
      { path: '', redirectTo: '/search', pathMatch: 'full' }
    ];

    Update src/app/edit/edit.component.html to display an editable form. You might notice I've added id attributes to most elements. This is to make things easier when writing integration tests with Protractor.

    <div *ngIf="person">
        <input [(ngModel)]="editName" name="name" id="name" placeholder="name"/>
        <input [(ngModel)]="editPhone" name="phone" id="phone" placeholder="Phone"/>
          <input [(ngModel)]="editAddress.street" id="street"><br/>
          <input [(ngModel)]="" id="city">,
          <input [(ngModel)]="editAddress.state" id="state" size="2">
          <input [(ngModel)]="" id="zip" size="5">
      <button (click)="save()" id="save">Save</button>
      <button (click)="cancel()" id="cancel">Cancel</button>

    Modify EditComponent to import model and service classes and to use the SearchService to get data.

    import { Component, OnInit, OnDestroy } from '@angular/core';
    import { Address, Person, SearchService } from '../shared/index';
    import { Subscription } from 'rxjs';
    import { ActivatedRoute, Router } from '@angular/router';

    @Component({
      selector: 'app-edit',
      templateUrl: 'edit.component.html',
      styleUrls: ['edit.component.css']
    })
    export class EditComponent implements OnInit, OnDestroy {
      person: Person;
      editName: string;
      editPhone: string;
      editAddress: Address;
      sub: Subscription;

      constructor(private route: ActivatedRoute,
                  private router: Router,
                  private service: SearchService) {
      }

      ngOnInit() {
        this.sub = this.route.params.subscribe(params => {
          let id = + params['id']; // (+) converts string 'id' to a number
          this.service.get(id).subscribe(person => {
            if (person) {
              this.editName =;
              this.editPhone =;
              this.editAddress = person.address;
              this.person = person;
            } else {
              this.gotoList();
            }
          });
        });
      }

      ngOnDestroy() {
        this.sub.unsubscribe();
      }

      cancel() {
        this.gotoList();
      }

      save() { = this.editName; = this.editPhone;
        this.person.address = this.editAddress;;
        this.gotoList();
      }

      gotoList() {
        if (this.person) {
          this.router.navigate(['/search', {term:} ]);
        } else {
          this.router.navigate(['/search']);
        }
      }
    }
    Modify SearchService to contain functions for finding a person by their id, and saving them. While you're in there, modify the search() method to be aware of updated objects in localStorage.

    search(q: string) {
      if (!q || q === '*') {
        q = '';
      } else {
        q = q.toLowerCase();
      }
      return this.getAll().map(data => {
        let results: any = []; => {
          // check for item in localStorage
          if (localStorage['person' +]) {
            item = JSON.parse(localStorage['person' +]);
          }
          if (JSON.stringify(item).toLowerCase().includes(q)) {
            results.push(item);
          }
        });
        return results;
      });
    }

    get(id: number) {
      return this.getAll().map(all => {
        if (localStorage['person' + id]) {
          return JSON.parse(localStorage['person' + id]);
        }
        return all.find(e => === id);
      });
    }

    save(person: Person) {
      localStorage['person' +] = JSON.stringify(person);
    }

    You can add CSS to src/app/edit/edit.component.css if you want to make the form look a bit better.

    :host {
      display: block;
      padding: 0 20px;
    }

    button {
      margin-top: 10px;
    }

    At this point, you should be able to search for a person and update their information.

    Edit form

    The <form> in src/app/edit/edit.component.html calls a save() function to update a person's data. You already implemented this above. The function calls a gotoList() function that appends the person's name to the URL when sending the user back to the search screen.

    gotoList() {
      if (this.person) {
        this.router.navigate(['/search', {term:} ]);
      } else {
        this.router.navigate(['/search']);
      }
    }

    Since the SearchComponent doesn't execute a search automatically when you visit this URL, add the following logic to do so in its constructor.

    import { Router, ActivatedRoute } from '@angular/router';
    import { Subscription } from 'rxjs';

    export class SearchComponent implements OnInit {
      sub: Subscription;

      constructor(private searchService: SearchService, private router: Router, private route: ActivatedRoute) {
        this.sub = this.route.params.subscribe(params => {
          if (params['term']) {
            this.query = decodeURIComponent(params['term']);
          }
        });
      }

    You'll want to implement OnDestroy and define the ngOnDestroy method to clean up this subscription.

    import { Component, OnInit, OnDestroy } from '@angular/core';

    export class SearchComponent implements OnInit, OnDestroy {
      // ...

      ngOnDestroy() {
        this.sub.unsubscribe();
      }
    }

    After making all these changes, you should be able to search/edit/update a person's information. If it works - nice job!


    Now that you've built an application, it's important to test it to ensure it works. The best reason for writing tests is to automate your testing: without them, you'll be testing manually, and that manual testing takes longer and longer as your application grows.

    In this section, you'll learn to use Jasmine for unit testing and Protractor for integration testing. Angular's testing documentation lists good reasons to test, but doesn't currently have many examples.

    Unit test the SearchService

    Modify src/app/shared/search/search.service.spec.ts and set up the test's infrastructure using MockBackend and BaseRequestOptions.

    import { MockBackend } from '@angular/http/testing';
    import { Http, ConnectionBackend, BaseRequestOptions, Response, ResponseOptions } from '@angular/http';
    import { SearchService } from './search.service';
    import { tick, fakeAsync } from '@angular/core/testing/fake_async';
    import { inject, TestBed } from '@angular/core/testing/test_bed';

    describe('SearchService', () => {
      beforeEach(() => {
        TestBed.configureTestingModule({
          providers: [
            {
              provide: Http, useFactory: (backend: ConnectionBackend, defaultOptions: BaseRequestOptions) => {
                return new Http(backend, defaultOptions);
              }, deps: [MockBackend, BaseRequestOptions]
            },
            {provide: SearchService, useClass: SearchService},
            {provide: MockBackend, useClass: MockBackend},
            {provide: BaseRequestOptions, useClass: BaseRequestOptions}
          ]
        });
      });

    If you run ng test, you will likely see some errors about the test stubs that Angular CLI created for you. You can ignore these for now.

    ERROR in [default] /Users/mraible/ng2-demo/src/app/edit/edit.component.spec.ts:10:20
    Supplied parameters do not match any signature of call target.
    ERROR in [default] /Users/mraible/ng2-demo/src/app/search/search.component.spec.ts:10:20
    Supplied parameters do not match any signature of call target.

    Add the first test of getAll() to search.service.spec.ts. This test shows how MockBackend can be used to mock results and set the response.

    TIP: When you are testing code that returns either a Promise or an RxJS Observable, you can use the fakeAsync helper to test that code as if it were synchronous. Promises are fulfilled and Observables are notified immediately after you call tick().
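To make the fakeAsync/tick idea concrete, here is a framework-free sketch of the underlying model (hypothetical `fakeAsyncCall` and `tick` helpers for illustration — not Angular's implementation): async callbacks are queued instead of scheduled, and `tick()` drains the queue synchronously so the test can assert right afterward.

```typescript
// Simplified model of fakeAsync/tick: "async" work is queued,
// and tick() flushes the queue synchronously.
type Task = () => void;
const queue: Task[] = [];

// Stand-in for an async API: instead of using the event loop, it queues work.
function fakeAsyncCall(cb: Task): void {
  queue.push(cb);
}

// tick() drains the queue, making the queued work appear synchronous.
function tick(): void {
  while (queue.length) {
    const t = queue.shift();
    if (t) { t(); }
  }
}

let result: string | undefined;
fakeAsyncCall(() => { result = 'John Elway'; });
// Nothing has run yet -- the callback is only queued.
console.log(result);        // undefined
tick();
console.log(result);        // John Elway
```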

    The test below should be on the same level as beforeEach.

    it('should retrieve all search results',
      inject([SearchService, MockBackend], fakeAsync((searchService: SearchService, mockBackend: MockBackend) => {
        let res: Response;
        mockBackend.connections.subscribe(c => {
          let response = new ResponseOptions({body: '[{"name": "John Elway"}, {"name": "Gary Kubiak"}]'});
          c.mockRespond(new Response(response));
        });
        searchService.getAll().subscribe((response) => {
          res = response;
        });
        tick();
        expect(res[0].name).toBe('John Elway');
      }))
    );

    Notice that tests continually run as you add them when using ng test. You can run tests once by using ng test --watch=false. You will likely see "Executed 5 of 5 (1 FAILED)" in your terminal. Add a couple more tests for filtering by search term and fetching by id.

    it('should filter by search term',
      inject([SearchService, MockBackend], fakeAsync((searchService: SearchService, mockBackend: MockBackend) => {
        let res;
        mockBackend.connections.subscribe(c => {
          let response = new ResponseOptions({body: '[{"name": "John Elway"}, {"name": "Gary Kubiak"}]'});
          c.mockRespond(new Response(response));
        });
        searchService.search('john').subscribe((response) => {
          res = response;
        });
        tick();
        expect(res[0].name).toBe('John Elway');
      }))
    );

    it('should fetch by id',
      inject([SearchService, MockBackend], fakeAsync((searchService: SearchService, mockBackend: MockBackend) => {
        let res;
        mockBackend.connections.subscribe(c => {
          let response = new ResponseOptions({body: '[{"id": 1, "name": "John Elway"}, {"id": 2, "name": "Gary Kubiak"}]'});
          c.mockRespond(new Response(response));
        });
        searchService.get('2').subscribe((response) => {
          res = response;
        });
        tick();
        expect(res[0].name).toBe('Gary Kubiak');
      }))
    );

    Unit test the SearchComponent

    To unit test the SearchComponent, create a MockSearchService that has spies. Spies allow you to verify whether functions were called, and with which arguments.
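If spies are new to you, here's a minimal from-scratch sketch in the spirit of jasmine.createSpy (a hypothetical `createSpy` helper for illustration, not Jasmine's actual implementation): a spy is just a function that records every call and returns a canned value.

```typescript
// A spy is a callable that records its calls and returns a stubbed value.
interface Spy {
  (...args: any[]): any;
  calls: any[][];      // every invocation's arguments, for later inspection
  returnValue: any;    // canned value handed back to the code under test
}

function createSpy(name: string, returnValue?: any): Spy {
  const spy = ((...args: any[]) => {
    spy.calls.push(args);          // record the call
    return spy.returnValue;        // return the canned response
  }) as Spy;
  spy.calls = [];
  spy.returnValue = returnValue;
  return spy;
}

// A component under test would call search() without knowing it's a fake.
const search = createSpy('search', [{ name: 'John Elway' }]);
const results = search('john');

console.log(search.calls.length);    // 1
console.log(search.calls[0][0]);     // john
console.log(results[0].name);        // John Elway
```

Jasmine's real spies add richer matchers (toHaveBeenCalledWith, and.callFake, etc.), but this is the core mechanism the MockSearchService below relies on.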

    Create src/app/shared/search/mocks/search.service.ts and populate it with spies for each method, as well as methods to set the response and subscribe to results.

    import { SpyObject } from './helper';
    import { SearchService } from '../search.service';
    import Spy = jasmine.Spy;
    export class MockSearchService extends SpyObject {
      getAllSpy: Spy;
      getByIdSpy: Spy;
      searchSpy: Spy;
      saveSpy: Spy;
      fakeResponse: any;

      constructor() {
        super( SearchService );

        this.fakeResponse = null;
        this.getAllSpy = this.spy('getAll').andReturn(this);
        this.getByIdSpy = this.spy('get').andReturn(this);
        this.searchSpy = this.spy('search').andReturn(this);
        this.saveSpy = this.spy('save').andReturn(this);
      }

      subscribe(callback: any) {
        callback(this.fakeResponse);
      }

      setResponse(json: any): void {
        this.fakeResponse = json;
      }
    }

    In this same directory, create a helper.ts class to implement the SpyObject that MockSearchService extends.

    import { StringMapWrapper } from '@angular/core/src/facade/collection';

    export interface GuinessCompatibleSpy extends jasmine.Spy {
      /** By chaining the spy with and.returnValue, all calls to the function will return a specific
       * value. */
      andReturn(val: any): void;
      /** By chaining the spy with and.callFake, all calls to the spy will delegate to the supplied
       * function. */
      andCallFake(fn: Function): GuinessCompatibleSpy;
      /** removes all recorded calls */
      reset();
    }

    export class SpyObject {
      static stub(object = null, config = null, overrides = null) {
        if (!(object instanceof SpyObject)) {
          overrides = config;
          config = object;
          object = new SpyObject();
        }

        let m = StringMapWrapper.merge(config, overrides);
        StringMapWrapper.forEach(m, (value, key) => { object.spy(key).andReturn(value); });
        return object;
      }

      constructor(type = null) {
        if (type) {
          for (let prop in type.prototype) {
            let m = null;
            try {
              m = type.prototype[prop];
            } catch (e) {
              // As we are creating spys for abstract classes,
              // these classes might have getters that throw when they are accessed.
              // As we are only auto creating spys for methods, this
              // should not matter.
            }
            if (typeof m === 'function') {
              this.spy(prop);
            }
          }
        }
      }

      spy(name) {
        if (!this[name]) {
          this[name] = this._createGuinnessCompatibleSpy(name);
        }
        return this[name];
      }

      prop(name, value) { this[name] = value; }

      /** @internal */
      _createGuinnessCompatibleSpy(name): GuinessCompatibleSpy {
        let newSpy: GuinessCompatibleSpy = <any>jasmine.createSpy(name);
        newSpy.andCallFake = <any>newSpy.and.callFake;
        newSpy.andReturn = <any>newSpy.and.returnValue;
        newSpy.reset = <any>newSpy.calls.reset;
        // revisit return null here (previously needed for rtts_assert).
        return newSpy;
      }
    }

    Alongside, create routes.ts to mock Angular's Router and ActivatedRoute.

    import { ActivatedRoute, Params } from '@angular/router';
    import { Observable } from 'rxjs';

    export class MockActivatedRoute extends ActivatedRoute {
      params: Observable<Params>;

      constructor(parameters?: { [key: string]: any; }) {
        super();
        this.params = Observable.of(parameters);
      }
    }

    export class MockRouter {
      navigate = jasmine.createSpy('navigate');
    }

    With mocks in place, you can use TestBed.configureTestingModule() to set up SearchComponent to use these as providers.

    import { ActivatedRoute, Router } from '@angular/router';
    import { MockActivatedRoute, MockRouter } from '../shared/search/mocks/routes';
    import { MockSearchService } from '../shared/search/mocks/search.service';
    import { SearchComponent } from './search.component';
    import { TestBed } from '@angular/core/testing/test_bed';
    import { FormsModule } from '@angular/forms';
    import { SearchService } from '../shared/search/search.service';
    describe('Component: Search', () => {
      let mockSearchService: MockSearchService;
      let mockActivatedRoute: MockActivatedRoute;
      let mockRouter: MockRouter;
      beforeEach(() => {
        mockSearchService = new MockSearchService();
        mockActivatedRoute = new MockActivatedRoute({'term': 'peyton'});
        mockRouter = new MockRouter();

        TestBed.configureTestingModule({
          declarations: [SearchComponent],
          providers: [
            {provide: SearchService, useValue: mockSearchService},
            {provide: ActivatedRoute, useValue: mockActivatedRoute},
            {provide: Router, useValue: mockRouter}
          ],
          imports: [FormsModule]
        });
      });
    });
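If you're wondering what the providers array actually does, here's a framework-free sketch of the idea (a hypothetical `Injector` class for illustration, not Angular's implementation): a token-to-value map that lets the test hand the component a mock wherever it asks for the real service.

```typescript
// Token-to-instance map, the essence of dependency-injection providers.
class Injector {
  private providers = new Map<any, any>();

  provide(token: any, useValue: any): void {
    this.providers.set(token, useValue);
  }

  get(token: any): any {
    return this.providers.get(token);
  }
}

// The real service (never constructed in the test)...
class SearchService {
  search(q: string): any { /* would make an HTTP call */ }
}
// ...and the mock registered under the same token.
const mockSearchService = { search: (q: string) => ['stubbed result'] };

const injector = new Injector();
injector.provide(SearchService, mockSearchService);

// Code under test looks up SearchService and transparently gets the mock.
const svc = injector.get(SearchService);
console.log(svc.search('peyton'));   // [ 'stubbed result' ]
```

This is why `{provide: SearchService, useValue: mockSearchService}` works: the component keeps asking for SearchService by token and never knows it received a fake.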

    Add two tests, one to verify a search term is used when it's set on the component, and a second to verify search is called when a term is passed in as a route parameter.

    it('should search when a term is set and search() is called', () => {
      let fixture = TestBed.createComponent(SearchComponent);
      let searchComponent = fixture.debugElement.componentInstance;
      searchComponent.query = 'M';
      searchComponent.search();
      expect(mockSearchService.searchSpy).toHaveBeenCalledWith('M');
    });

    it('should search automatically when a term is on the URL', () => {
      let fixture = TestBed.createComponent(SearchComponent);
      fixture.detectChanges();
      expect(mockSearchService.searchSpy).toHaveBeenCalledWith('peyton');
    });

    After adding these tests, you should see the first instance of all tests passing (Executed 8 of 8 SUCCESS).

    Update the test for EditComponent, verifying fetching a single record works. Notice how you can access the component directly with fixture.debugElement.componentInstance, or its rendered version with fixture.debugElement.nativeElement.

    import { MockSearchService } from '../shared/search/mocks/search.service';
    import { EditComponent } from './edit.component';
    import { TestBed } from '@angular/core/testing/test_bed';
    import { SearchService } from '../shared/search/search.service';
    import { MockRouter, MockActivatedRoute } from '../shared/search/mocks/routes';
    import { ActivatedRoute, Router } from '@angular/router';
    import { FormsModule } from '@angular/forms';
    describe('Component: Edit', () => {
      let mockSearchService: MockSearchService;
      let mockActivatedRoute: MockActivatedRoute;
      let mockRouter: MockRouter;
      beforeEach(() => {
        mockSearchService = new MockSearchService();
        mockActivatedRoute = new MockActivatedRoute({'id': 1});
        mockRouter = new MockRouter();

        TestBed.configureTestingModule({
          declarations: [EditComponent],
          providers: [
            {provide: SearchService, useValue: mockSearchService},
            {provide: ActivatedRoute, useValue: mockActivatedRoute},
            {provide: Router, useValue: mockRouter}
          ],
          imports: [FormsModule]
        });
      });

      it('should fetch a single record', () => {
        const fixture = TestBed.createComponent(EditComponent);

        let person = {name: 'Emmanuel Sanders', address: {city: 'Denver'}};
        mockSearchService.setResponse(person);

        fixture.detectChanges();

        // verify service was called
        expect(mockSearchService.getByIdSpy).toHaveBeenCalledWith(1);

        // verify data was set on component when initialized
        let editComponent = fixture.debugElement.componentInstance;
        expect(editComponent.editAddress.city).toBe('Denver');

        // verify HTML renders as expected
        let compiled = fixture.debugElement.nativeElement;
        expect(compiled.querySelector('h3').innerHTML).toBe('Emmanuel Sanders');
      });
    });

    You should see "Executed 8 of 8 SUCCESS (0.238 secs / 0.259 secs)" in the shell window that's running ng test. If you don't, try cancelling the command and restarting.

    Integration test the search UI

    To test if the application works end-to-end, you can write tests with Protractor. These are also known as integration tests, since they test the integration between all layers of your application.

    To verify end-to-end tests work in the project before you begin, run the following commands in separate console windows.

    ng serve
    ng e2e

    All tests should pass.

    $ ng e2e
    > ng2-demo@0.0.0 pree2e /Users/mraible/dev/ng2-demo
    > webdriver-manager update
    Updating selenium standalone to version 2.52.0
    Updating chromedriver to version 2.21
    chromedriver downloaded to /Users/mraible/dev/ng2-demo/node_modules/protractor/selenium/
    selenium-server-standalone-2.52.0.jar downloaded to /Users/mraible/dev/ng2-demo/node_modules/protractor/selenium/selenium-server-standalone-2.52.0.jar
    > ng2-demo@0.0.0 e2e /Users/mraible/dev/ng2-demo
    > protractor "config/protractor.conf.js"
    [00:01:07] I/direct - Using ChromeDriver directly...
    [00:01:07] I/launcher - Running 1 instances of WebDriver
    Spec started
      ng2-demo App
        ✔ should display message saying app works
    Executed 1 of 1 spec SUCCESS in 0.684 sec.
    [00:01:09] I/launcher - 0 instance(s) of WebDriver still running
    [00:01:09] I/launcher - chrome #01 passed
    All end-to-end tests pass.

    Testing the search feature

    Create end-to-end tests in e2e/search.e2e-spec.ts to verify the search feature works. Populate it with the following code:

    describe('Search', () => {
      beforeEach(() => {
        browser.get('/search');
      });

      it('should have an input and search button', () => {
        expect(element(by.css('app-root app-search form input')).isPresent()).toEqual(true);
        expect(element(by.css('app-root app-search form button')).isPresent()).toEqual(true);
      });

      it('should allow searching', () => {
        let searchButton = element(by.css('button'));
        let searchBox = element(by.css('input'));
        searchBox.sendKeys('M');
        searchButton.click().then(() => {
          var list = element.all(by.css('app-search table tbody tr'));
          expect(list.count()).toBeGreaterThan(0);
        });
      });
    });

    Testing the edit feature

    Create an e2e/edit.e2e-spec.ts test to verify the EditComponent renders a person's information and that their information can be updated.

    describe('Edit', () => {
      beforeEach(() => {
        browser.get('/edit/1');
      });

      let name = element(by.name('name'));
      let street = element(by.name('street'));
      let city = element(by.name('city'));

      it('should allow viewing a person', () => {
        expect(element(by.css('h3')).getText()).toEqual('Peyton Manning');
        expect(name.getAttribute('value')).toEqual('Peyton Manning');
        expect(street.getAttribute('value')).toEqual('1234 Main Street');
        expect(city.getAttribute('value')).toEqual('Greenwood Village');
      });

      it('should allow updating a name', function () {
        let save = element(by.id('save'));
        // send individual characters since sendKeys passes partial values sometimes
        ' Won!'.split('').forEach((c) => name.sendKeys(c));
        save.click();
        // verify one element matched this change
        var list = element.all(by.css('app-search table tbody tr'));
        expect(list.count()).toBe(1);
      });
    });

    Run ng e2e to verify all your end-to-end tests pass. You should see a success message similar to the one below in your terminal window.

    Protractor success

    If you made it this far and have all your specs passing - congratulations! You're well on your way to writing quality code with Angular 2 and verifying it works.

    You can see the test coverage of your project by opening coverage/index.html in your browser. You might notice that the new components and service could use some additional coverage. If you feel the need to improve this coverage, please send me a pull request!

    Test coverage

    Continuous Integration

    At the time of this writing, Angular CLI did not have any continuous integration support. However, it's easy to add with Travis CI. If you've checked in your project to GitHub, simply log in to Travis CI and enable builds for the GitHub repo you created the project in. Then add the following .travis.yml in your root directory and git push. This will trigger the first build.

    language: node_js
    sudo: true
    cache:
      directories:
        - node
        - node_modules
    dist: trusty
    node_js:
      - '5.6.0'
    branches:
      only:
        - master
    before_install:
      - npm install -g angular-cli
    before_script:
      - export CHROME_BIN=/usr/bin/google-chrome
      - export DISPLAY=:99.0
      - sh -e /etc/init.d/xvfb start
      - sudo apt-get update
      - sudo apt-get install -y libappindicator1 fonts-liberation
      - wget
      - sudo dpkg -i google-chrome*.deb
    script:
      - ng test --watch false
      - ng serve &
      - ng e2e
    notifications:
      email:
        on_success: change  # options: [always|never|change] default: always
        on_failure: always  # options: [always|never|change] default: always
        on_start: false     # default: false

    Here is a build showing all unit and integration tests passing.

    Source code

    A completed project with this code in it is available on GitHub at If you have ideas for improvements, please leave a comment or send a pull request.

    This tutorial was originally written using Asciidoctor. This means you can read it using DocGist if you like.


    I hope you've enjoyed this in-depth tutorial on how to get started with Angular 2 and Angular CLI. Angular CLI takes much of the pain out of setting up an Angular 2 project and using TypeScript. I expect great things from Angular CLI, mostly because the Angular 2 setup process can be tedious and the CLI greatly simplifies things.

    0 0

    One thing about having several computers, and about never having quite enough time to work on them, is that whenever I turn on a particular computer, it's almost certain that I'll have updates to perform:

    • Windows updates
    • Java updates
    • nVidia driver updates
    • Steam updates
    • etc.

    In fact, I'll usually have at least 2 or 3 updates that run whenever I switch one of my computers on.

    At least the updates are mostly self-sufficient, though I can never really get the hang of which updates just run automatically, and which require me to baby-sit them at least to the point where they put up a confirmation prompt requesting me to authorize them to update their own software.


    Meanwhile, in the world of updates, I'm trying to figure out if Windows Subsystem for Linux has matured to the point where I can run Java 8 on it.

    As best I can understand from poking around on duh Netz, it seems that:

    • Oracle's Java 8 distribution has made a number of fixes, and now can be successfully installed and run on Windows Subsystem for Linux, at least according to this StackOverflow answer
    • But Java 8 in general really seems to prefer Ubuntu 16 over Ubuntu 14,
    • And Microsoft themselves suggest that both Java 8 and Ubuntu 16 are able to be used once I have upgraded to Windows 10 Creators Update (see this MSDN blog article)

    So it seems like the bottom line is that for the time being, I should continue to do my Java work using either the vanilla Windows JDK, or using my full Linux installation on my VirtualBox instance(s).

    But hopefully Windows 10 Creators Update will reach my machine soon (if I get really impatient, Microsoft says I can possibly hurry the process along using the Update Assistant).

    And then I can start a whole new round of updates!

    0 0

    Devoxx France is one of my favorite conferences. As you might know from my post about Jfokus, I thrive on a sense of community and the memories created by conferences. Last week in Paris, I experienced a passionate community and created several memories, with many good people and friends.

    I had two speaking events at the conference:

    For the workshop, I intro'd Angular, had the class create an Angular application, then talked about testing Angular. In addition, I showed them a number of demos:

    NOTE: Videos of my past performances about Angular can be found on YouTube:

    Unfortunately, we ran out of time before folks could complete the testing Angular exercise, but it was a fun session nevertheless. I hope the students enjoyed it as much as I did!

    Speaking about Cloud Native PWAs was a fantastic experience, mostly because of my good friend Josh Long. For those of you that have watched a @starbuxman talk, you know it's a great experience. Josh's well-timed jokes and stage presence are a source of envy for me. Sharing the stage with him was truly an honor.

    We had a fine time creating a resilient craft beer service that was consumable by an Angular UI that works offline. The fancy name for this type of UI is a progressive web app, but I like to call it an installable webapp. It's a cool concept that leverages service workers to allow webapps to work offline. Besides service workers, all you need is TLS (HTTPS) and a bunch of icons (referenced in a linked manifest) to give an app installability. Unfortunately, service workers are not present in all browsers, so this works best for Firefox/Chrome users.
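As a rough illustration, a minimal web app manifest might look something like the following (the app name and icon paths here are made up for the example, not taken from the talk's code):

```json
{
  "name": "Craft Beer Finder",
  "short_name": "Beers",
  "start_url": "/",
  "display": "standalone",
  "icons": [
    { "src": "assets/icon-192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "assets/icon-512.png", "sizes": "512x512", "type": "image/png" }
  ]
}
```

The page points at it with a link tag (rel="manifest"), and together with HTTPS and a service worker, that's what makes the browser offer installation.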

    You can find the code we developed (from scratch!) in our talks on GitHub. The slide deck we used can be found on Speaker Deck.

    Thanks to the organizers of Devoxx France for creating such a wonderful conference experience! I sure had a great time.

    Update: Videos of Josh's and my Cloud Native PWAs talks have been published to YouTube. Hope you enjoy!

    0 0

    The fabric8 project has been at the forefront of innovation to enable software professionals to develop and deploy faster to cloud native environments. With the announcement of at Red Hat Summit 2017, the ecosystem of fabric8 has expanded to incorporate all (except the IDE, which is based on Che) of the technologies that make up the developer platform:

    The fabric8 project will continue to innovate and increase its scope to ensure it becomes the best environment for developers to continue to accelerate from idea to production. The fabric8 platform consists of over a hundred repositories under the GitHub fabric8io organization - all built on top of the fabric8 platform itself, keeping all releases in sync, and allowing us to continually improve our delivery.

    0 0

    Red Hat at Red Hat Summit 2017.

    It's a SaaS-based development environment that provides everything you need to get up and running and start producing code.  The marketing speak actually says it is "A free, end to end, cloud native development environment".

    This obviously caused a lot of questions for developers - it's sometimes (well, 99.99% of the time) difficult to join the dots between marketing and reality. This prompted some questions on Hacker News, but luckily Tyler Jewell, CEO of Codenvy, was on hand to provide more clarity - it's worth a read.

    Just to reiterate - what is going to provide for developers is a really convenient hosted development environment to start their journey to developing apps for the cloud.

    1. A deployment environment for your cloud native apps: You can use GitHub as your source repo, but deploy onto OpenShift to run your apps, for free (within limits). Red Hat wants to give developers the opportunity to run reasonable applications at no cost, but they don't have bottomless pockets, so there have to be some reasonable constraints - obviously you can chip in if you need more space/cpu.

    2. A continuous deployment pipeline that will enable you to take any code changes through test and stage to run (it uses fabric8 and OpenShift pipelines under the covers).

    3. A web based IDE, based on Eclipse Che, so you can edit your code in situ, without leaving This is a great feature for development, and allows you to develop and test the application you are building. However, if you don't want to leave the comfort blanket of your favourite IDE running on your laptop, and just push code to GitHub for to pick up and deploy, that's OK too.

    4. Analytics built in (using fabric8 analytics): to identify security risks in dependencies you may be using, and also to identify other dependencies that might be a better fit (e.g. flag that you're using a really old version of commons-math).

    5. Agile management, to allow you to plan, and track development items for your code. This is really useful for collaborative development.

    The fabric8 ecosystem also provides lots of developer tooling and examples to help developers get started.

    0 0
  • 05/03/17--01:14: Ian Boston: Fouling
  • Fouling for any boat, large or small eats boatspeed, fuel and satisfaction. Most boats haul out every year, pressure wash off or scrape off the marine flora and fauna that have taken up residence. Owners then repaint the hull with a toxic antifouling paint. Toxic to marine life, and unpleasant to human. In small boats the toxicity has been greatly reduced over the years, some would argue so has the effectiveness. The problem is the paint erodes and the toxins leak. High concentrations of boats lead to high concentrations of toxins.

    For larger ships the cost of antifouling is significant: interruption to service and the cost of a period in dry dock. Antifouling on large ships is considerably more toxic than what's available to the pleasure boat industry, in order to extend the periods between maintenance to several years.

    About 10 years ago I coated my boat with copper particles embedded in epoxy. The exposed copper surface of the particles reacts with sea water to create a non-soluble copper compound. This doesn’t leach, but, like the solid copper sheets on clipper ships, it discourages marine fouling. I have not painted since. Until a few years ago I did need to scrub off. I would see a few barnacles and some marine growth, but no more than would be normal with toxic paint.

    A few years ago I added 2 ultrasonic antifouling units. These send low power ultrasonic pulses into the hull and the water. According to research performed at Singapore University, barnacle larvae use antennae to feel the surface they are about to attach to. Once attached, the antennae stick to the surface and convert into the shell. Once attached, they never fall off. Ultrasound at various frequencies excites the antennae, which disrupts the sensing process. The larvae swim past to attach elsewhere. There is also some evidence that ultrasound reduces algae growth. The phenomenon was first discovered testing submarine sonar over 50 years ago. My uncle did submarine research in various Scottish lochs during the war.

    The ultrasonic antifouling I have fitted currently was a punt: 2 low cost units from Jaycar, first published in an Australian electronics magazine, that you put together yourself. Those are driven by an 8-pin PIC driving 2 MOSFETs. I think it’s made a difference. After a year in the water I have no barnacles and a bit of soft slime. There are commercial units available at more cost, but the companies selling them seem to come and go. I am not convinced enough to spend that sort of money, but I am curious and prepared to design and build a unit.

    The new unit (board above), for the new boat, is a bit more sophisticated.  It's a 6-channel unit controlled by an Arduino Mega with custom code, driving a MOSFET signal generator and 6 pairs of MOSFETs. It outputs 15-150 kHz at up to 60W per channel. A prototype on the bench works and has my kids running out of the house (they can hear the harmonics). My ears are a little older so can’t, but still ring a bit. I won’t run it at those levels as that will likely cavitate the water and kill things as well as eat power. It remains to be seen if the production board works; I have just ordered a batch of 5 from an offshore fabrication shop.

    0 0

    Recently I went back to Craig Hospital for an annual spinal cord injury re-evaluation and the results were very positive. It was really nice to see some familiar faces of the people for whom I have such deep admiration like my doctors, physical therapists and administrative staff. My doctor and therapists were quite surprised to see how well I am doing, especially given that I'm still seeing improvements three years later. Mainly because so many spinal cord injury patients have serious issues even years later. I am so lucky to no longer be taking any medications and to be walking again.

    It has also been nearly one year since I have been back to Craig Hospital and it seems like such a different place to me now. Being back there again feels odd for a couple of reasons. First, due to the extensive construction/remodel, the amount of change to the hospital makes it seem like a different place entirely. It used to be much smaller, which encouraged more close interaction between patients and staff. Now the place is so big (i.e., big hallways, larger individual rooms, etc.) that patients can have more privacy if they want or even avoid some forms of interaction. Second, although I am comfortable being around so many folks who have been so severely injured (not everyone is), I have noticed that some folks are confused by me. I can tell by the way they look at me that they are wondering what I am doing there because, outwardly, I do not appear as someone who has experienced a spinal cord injury. I have been lucky enough to make it out of the wheelchair and to walk on my own. Though my feet are still paralyzed, I wear flexible, carbon fiber AFO braces on my legs and walk with one arm crutch; the braces are covered by my pants, so it's puzzling to many people.

    The folks who I wish I could see more are the nurses and techs. These are the folks who helped me the most when I was so vulnerable and confused and to whom I grew very attached. To understand just how attached I was, simply moving to a more independent room as I was getting better was upsetting to me because I was so emotionally attached to them. I learned that these people are cut from a unique cloth and possess very big hearts to do the work they do every day. Because they are so involved with the acute care of in-patients, they are very busy during the day and not available for much socializing as past patients come through. Luckily, I ran into one of my nurses and was able to spend some time speaking with him. I really enjoyed catching up with him and hearing about new adventures in his career. He was one of the folks I was attached to at the time and he really made a difference in my experience. I will be eternally thankful for having met these wonderful people during such a traumatic time in my life.

    Today I am walking nearly 100% of the time with the leg braces and have been for over two years. I am working to rebuild my calves and my glutes, but this is a very, very long and slow process due to severe muscle atrophy after not being able to move my glutes for five months and my calves for two years. Although my feet are not responding yet, we will see what the future holds. I still feel so very lucky to be alive and continuing to make progress.

    Although I cannot run at all or cycle the way I did previously, I am very thankful to be able to work out as much as I can. I am now riding the stationary bike regularly, using my Total Gym (yes, I have a Chuck Norris Total Gym) to build my calves, using a Bosu to work on balance and strength in my lower body, doing ab roller workouts and walking as much as I can both indoors on a treadmill and outside. I'd like to make time for swimming laps again, but all of this can be time consuming (and tiring!). I am not nearly as fit as I was at the time of my injury, but I continue to work hard and to see noticeable improvements for which I am truly thankful.

    Thank you to everyone who continues to stay in touch and check in on me from time-to-time. You may not think it's much to send a quick message, but these messages have meant a lot to me through this process. The support from family and friends has been what has truly kept me going. The patience displayed by Bailey, Jade and Janene is pretty amazing.

    Later this month, I will mark the three year anniversary of my injury. It seems so far away and yet it continues to affect my life every day. My life will never be the same but I do believe I have found peace with this entire ordeal.

    0 0

    This is the fifth in a series of blog posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. The second post looked at how to use Apache Ranger to authorize access to data stored in HDFS. The third post looked at how Apache Ranger can create "tag" based authorization policies for HDFS using Apache Atlas. The fourth post looked at how to implement transparent encryption for HDFS using Apache Ranger. Up to now, we have not shown how to authenticate users, concentrating only on authorizing local access to HDFS. In this post we will show how to configure HDFS to authenticate users via Kerberos.

    1) Set up a KDC using Apache Kerby

    If we are going to configure Apache Hadoop to use Kerberos to authenticate users, then we need a Kerberos Key Distribution Center (KDC). Typically most documentation revolves around installing the MIT Kerberos server, adding principals, and creating keytabs etc. However, in this post we will show a simpler way of getting started by using a pre-configured maven project that uses Apache Kerby. Apache Kerby is a subproject of the Apache Directory project, and is a complete open-source KDC written entirely in Java.

    A github project that uses Apache Kerby to start up a KDC is available here:

• bigdata-kerberos-deployment: This project contains some tests which can be used to test Kerberos with various big data deployments, such as Apache Hadoop.
The KDC is a simple JUnit test that is available here. To run it, just comment out the "org.junit.Ignore" annotation on the test method. It uses Apache Kerby to define the following principals:
• hdfs/localhost
• HTTP/localhost
Keytabs are created in the "target" folder for "alice", "bob" and "hdfs" (where the latter has both the hdfs/localhost + HTTP/localhost principals included). Kerby is configured to use a random port to launch the KDC each time, and it will create a "krb5.conf" file containing the random port number in the target directory. So all we need to do is point Hadoop at the generated keytabs and the krb5.conf, and it should be able to communicate correctly with the Kerby-based KDC.
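For reference, the generated krb5.conf will look roughly like the following sketch (the realm name "TEST.COM" and the port number are assumptions for illustration; Kerby writes the actual random port into the file on each run):

```ini
# Sketch of the krb5.conf that Kerby generates in the target directory.
# Realm name and KDC port are illustrative placeholders.
[libdefaults]
    default_realm = TEST.COM

[realms]
    TEST.COM = {
        kdc = localhost:12345
    }
```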

    2) Configure Hadoop to authenticate users via Kerberos

    Download and configure Apache Hadoop as per the first tutorial. For now, we will not enable the Ranger authorization plugin, but rather secure access to the "/data" directory using ACLs, as described in section (3) of the first tutorial, such that "alice" has permission to read the file stored in "/data" but "bob" does not. The next step is to configure Hadoop to authenticate users via Kerberos.

Edit 'etc/hadoop/core-site.xml' and add the following property name/values:
• hadoop.security.authentication: kerberos
• dfs.block.access.token.enable: true 
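In XML form, the core-site.xml entries would look something like this sketch (assuming the standard hadoop.security.authentication property is the one intended for the "kerberos" value):

```xml
<!-- Sketch of etc/hadoop/core-site.xml additions; property names assumed standard -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
```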
    Next edit 'etc/hadoop/hdfs-site.xml' and add the following property name/values to configure Kerberos for the namenode:
• dfs.namenode.keytab.file: Path to Kerby hdfs.keytab (see above).
• dfs.namenode.kerberos.principal: hdfs/localhost
• dfs.namenode.kerberos.internal.spnego.principal: HTTP/localhost
    Add the exact same property name/values for the secondary namenode, except using the property name "secondary.namenode" instead of "namenode". We also need to configure Kerberos for the datanode:
• dfs.datanode.data.dir.perm: 700
• dfs.datanode.address:
• dfs.datanode.http.address:
• dfs.web.authentication.kerberos.principal: HTTP/localhost
• dfs.datanode.keytab.file: Path to Kerby hdfs.keytab (see above).
• dfs.datanode.kerberos.principal: hdfs/localhost
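Put together, the keytab and principal entries in hdfs-site.xml would look roughly like this sketch (paths are placeholders, and the realm suffix on the principals is omitted as in the text above):

```xml
<!-- Sketch of etc/hadoop/hdfs-site.xml additions; values are placeholders -->
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/path/to/kerby/target/hdfs.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/localhost</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/path/to/kerby/target/hdfs.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/localhost</value>
</property>
```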
As we are not using SASL to secure the data transfer protocol (see here), we need to download and configure JSVC into JSVC_HOME. Then edit 'etc/hadoop/' and add the following properties:
• export HADOOP_SECURE_DN_USER=(the user you are running HDFS as)
• export JSVC_HOME=(path to JSVC as above)
• export HADOOP_OPTS="-Djava.security.krb5.conf=<path to Kerby target/krb5.conf>"
    You also need to make sure that you can ssh to localhost as "root" without specifying a password.

    3) Launch Kerby and HDFS and test authorization

Now that we have hopefully configured everything correctly, it's time to launch the Kerby-based KDC and HDFS. Start Kerby by running the JUnit test as described in the first section. Now start HDFS via:
    • sbin/
    • sudo sbin/
    Now let's try to read the file in "/data" using "bin/hadoop fs -cat /data/LICENSE.txt". You should see an exception as we have no credentials. Let's try to read as "alice" now:
    • export KRB5_CONFIG=/pathtokerby/target/krb5.conf
    • kinit -t -k /pathtokerby/target/alice.keytab alice
    • bin/hadoop fs -cat /data/LICENSE.txt
    This should be successful. However the following should result in a "Permission denied" message:
    • kdestroy
    • kinit -t -k /pathtokerby/target/bob.keytab bob
    • bin/hadoop fs -cat /data/LICENSE.txt


    Lost Crew WiP

    Guava problems have surfaced again.

Hadoop 2.x has long shipped Guava 14, though we have worked to ensure it runs against later versions, primarily by re-implementing our own copies of classes pulled or moved across Guava versions.

Hadoop trunk has moved up to Guava 21.0 (HADOOP-10101). This has gone and overloaded the Preconditions.checkState() method, such that if you compile against Guava 21, your code doesn't link against older versions of Guava. I am so happy about this I could drink some more coffee.

    Classpaths are the gift that keeps on giving, and any bug report with the word "Guava" in it is inevitably going to be a mess. In contrast, Jackson is far more backwards compatible; the main problem there is getting every JAR in sync.

    What to do?

    Shade Guava Everywhere
This is going to be tricky to pull off. Andrew Wang has taken on this task. This is one of those low-level engineering projects which doesn't have press-release benefits but which has the long-term potential to reduce pain. I'm glad someone else is doing it and will keep an eye on it.

    Rush to use Java 9
    I am so looking forward to this from an engineering perspective:

    Pull Guava out
We could do our own Preconditions, our own VisibleForTesting annotation. More troublesome are the various cache classes, which do some nice things...hence they get used. That's a lot of engineering.

    Fork Guava
We'd have to keep up to date with all new Guava features, while reinstating the bits they took away. The goal: stuff built with old Guava versions still works.

    I'm starting to look at option four. Biggest issue: cost of maintenance.

    There's also the fact that once we use our own naming "org.apache.hadoop/hadoop-guava-fork" then maven and ivy won't detect conflicting versions, and we end up with > 1 version of the guava JARs on the CP, and we've just introduced a new failure mode.

    Java 9 is the one that has the best long-term potential, but at the same time, the time it's taken to move production clusters onto Java 8 makes it 18-24 months out at a minimum. Is that so bad though?

I actually created the "Move to Java 9" JIRA in 2014. It's been lurking there, with Akira Ajisaka doing the equally unappreciated step-by-step movement towards it.

    Maybe I should just focus some spare-review-time onto Java 9; see what's going on, review those patches and get them in. That would set things up for early adopters to move to Java 9, which, for in-cloud deployments, is something where people can be more agile and experimental.

    (photo: someone painting down in Stokes Croft. Lost Crew tag)


I am sitting at Boston Logan Airport, having a Samuel Adams lager and checking up on my twitter timeline, emails and whatever else is happening.

Two days ago I had my talk about developing cloud-ready Camel microservices at Red Hat Summit 2017. The talk was video recorded and it is already online on YouTube.

The source code and slides are posted on my github account at:

    I had a great time at Red Hat Summit and enjoyed meeting up with fellow Red Hat co-workers and others whom I know from twitter or the open source communities.


Today, the ASF received yet another complaint from a distraught individual who had, in their opinion, received spam from the Apache Software Foundation. This time, via our Facebook page. As always, this is because someone sent email, and in that email is a link to a website – in this case, one displaying a default (i.e., incorrectly configured) Apache web server, running on CentOS.

    This distraught individual threatened legal action against the ASF, and against CentOS, under FBI, Swedish, and International law, for sending them spam.

No, Apache didn’t send you spam. Not only that, but Apache software wasn’t used to send you spam. Unfortunately, the spammer happened to be running a misconfigured copy of software we produced. That’s the extent of the connection. Also, they aren’t even competent enough to correctly configure their web server.

It would be like holding a shovel company liable because someone dug a hole in your yard.

    Or, better yet, holding a shovel company liable because someone crashed into your car, and also happened to have a shovel in their trunk at the time.

    We get these complaints daily, to various email addresses at the Foundation, and via various websites and twitter accounts. While I understand that people are irritated at receiving spam, there’s absolutely nothing we can do about it.

    And, what’s more, it’s pretty central to the philosophy of open source that we don’t put restrictions on what people use our software for – even if they *had* used our software to send that email. Which they didn’t.

    So stop it.



    The Last Pickle (TLP) intends to hire a project manager in the US to work directly with customers in the US and around the world. You will be part of the TLP team, coordinating and managing delivery of high-quality consulting services including expert advice, documentation and run books, diagnostics and troubleshooting, and proof-of-concept code.

    This role reports to the COO, and works closely with our CTO and technical team of Apache Cassandra consultants.

    Responsibilities Include

• Managing project budgets, resources, and schedules to ensure on-time and on-budget delivery of services.
    • Enabling the Consulting team to be as effective and efficient as possible, taking a bias towards action to remove barriers to delivery as needed.
    • Providing the highest level of customer service to multiple projects simultaneously.
    • Taking the lead on day-to-day client communication and/or coordination.
    • Assisting with gathering business requirements, estimating, and scoping new projects.
    • Ensuring timely response to customer requests.
    • Managing the expectations of the internal team and clients.
    • Escalating and coordinating resolution of issues internally as needed.

    Skills and Experience

    • Excellent written and oral communication skills, ensuring zero ambiguity and clear direction in a manner suiting the audience.
      • Ability to communicate in an open and ongoing way.
      • Must be comfortable asking questions.
    • Comfortable with spreadsheets and project management tools.
• 3 years of relevant experience with similar responsibilities.
    • Experience working remotely with a high-level of autonomy.
    • Solid understanding of consulting business operations.
    • Strong organizational skills with the ability to “see at least three steps ahead” of everyone else.
    • Great attention to detail. Please address your cover letter to “Cristen.”
    • Ability to anticipate and manage risk.

    Bonus points for

    • Experience with open source and/or big data platforms.
    • Ability to problem solve and think intuitively about managing teams and clients (you are not a robot).
    • Have a technical aptitude and can help translate business needs and technical delivery requirements.
    • Are currently living in Austin, Texas.

    In return we offer

    • Being part of a globally recognised team of experts.
    • Flexible workday and location.
    • Time to work on open source projects and support for public speaking.
    • As much or as little business travel as you want.
    • No on-call roster.
    • A great experience helping companies big and small be successful.


    If this sounds like the right job for you let us know by emailing .


    The Last Pickle was born out of our passion for the open source community and firmly held belief that Cassandra would become the ubiquitous database platform of the next generation. We have maintained our focus on Apache Cassandra since starting in March 2011 as the first pure Apache Cassandra consultancy in the world. In the last six years we have been part of the success of customers big and small around the globe as they harness the power of Cassandra to grow their companies.

Now we are looking to grow our own company by expanding our team in the US. As a profitable, self-funded start-up, TLP is able to place people at the heart of what we do. After years of working in a globally distributed team, with staff in New Zealand, America, Australia, and France, we realise happiness is the most important element in everything we do. We offer flexible work days, with staff working from a mix of home and shared offices, while still finding time in their day to pick up kids from school, go running, or check the surf conditions. With the help of our dedicated Happiness Coordinator, we work together to create a work-life balance that is mutually beneficial.


Last week I had the honour of giving a keynote at Adobe's Open Source Summit EU in Basel. Among many interesting talks they hosted a panel to answer questions around all things open source: strategy, inner source, open development. One of the questions I found interesting is how inner source/open development and agile are related.

To get everyone on the same page, what do I mean when talking about inner source/open development? I first encountered the concept at ApacheCon EU Amsterdam in 2008: Bertrand Delacretaz, then Day Software, now Adobe, was talking about how to apply open source development principles in a corporate environment. A while ago I ran across the term "inner source", which essentially describes the same thing. What I find interesting about it is the idea of applying values like transparency and openness to contributions, as well as concepts like asynchronous communication and decision making, in a wider context.

What does that have to do with Agile? Looking at the mere mechanics of Agile (being co-located, talking face to face frequently, syncing daily in in-person meetings), this seems to be pretty much the opposite of what open source teams do.

Starting with the values and promises of Agile, though, I think the two can complement each other quite nicely:

    "Individuals and interactions over processes and tools" - there's a saying in many successful popular projects that the most important ingredient to a successful open source project is to integrate people into a team working towards a shared vision. Apache calls this Community over code, ZeroMQ's C4 calls it "People before code", I'm sure there are many other incarnations of the same concept.

Much like in agile, this doesn't mean that tools are neglected altogether. Git was created because there was nothing quite like it before. However, at the core of its design was the desire for ideal support for the way the Linux community works together.

"Working software over comprehensive documentation" - While people strive for good documentation, I'd be so bold as to suggest that, for the open source projects I'm familiar with, the most pressing motivation to provide good documentation is to lower the number of user questions as well as the number of questions people involved with the project have to answer.

For projects like Apache Lucene or Apache httpd I find the documentation that comes with the project to be exceptionally good. In addition, both projects are even supported with dead-tree documentation. My guess would be that working in geographically distributed teams whose members work on differing schedules is one trigger for that: teams distributed over multiple time zones mean that most communication will happen in writing anyway. So instead of re-typing the same answers over and over, it's less costly to type them up in a clean way and post them to a place that has a stable URL and is easy to discover. So yes, while working software clearly is still in focus, the lack of frequent face-to-face communication can have a positive influence on the availability of documentation.

"Customer collaboration over contract negotiation" - this is something open source teams take to the extreme: anybody is invited to participate. Answers to feature requests like "patches welcome" typically are honest requests for support and collaboration, usually, if taken up, resulting in productive working relationships.

"Responding to change over following a plan" - in my opinion this is another value that's taken to the extreme in open source projects. Without control over your participants, no ability to order people to do certain tasks, and no influence over when and how much time someone else can dedicate to a certain task, reacting to change is core to each open source project consisting of more than one maintaining entity (be it a single human being or simply one company paying all currently active developers).

One could look further, digging deeper into how the mechanics of specific Agile frameworks like Scrum match against what's being done in at least some open source projects (clearly I cannot speak for each and every project out there, given that I only know a limited set of them), but that's a topic for another post.

The idea of applying open source development practices within corporations has been hanging in the air for over 15 years, driven by different individuals over time. I think now that people are gathering to collect these ideas in one place and roll them out in more corporations, discussions around the topic are going to become interesting: each open source community has its own specifics of working together, even though all seem to follow pretty much the same style when looked at from very high above. I'm curious what common values and patterns can be derived from that which work across organisations.


    Need to integrate Apache Syncope in your microservice architecture? Spring Boot comes to the rescue!


Apache Camel 2.19 was released on May 5th 2017, and it's about time I write a little blog about the noteworthy new features and improvements this release includes.

    Here is a list of the noteworthy new features and improvements.

    1. Spring Boot Improvements

The Camel 2.19 release has been improved for Spring Boot in numerous ways. For example, all the Camel components now include more details in their Spring Boot metadata files for auto configuration. This means tooling can now show default values, documentation etc. for all the options on each component, language, and data format you may use and configure in properties or .yml files.

    The release is also up to date with latest Spring Boot 1.5.3 release.

Some components have improved auto configuration which makes them even easier to use, such as camel-servlet, where you can easily set up the context-path from the configuration file.

We have also made many more options on CamelContext configurable, so you can tweak JMX, stream caching, and many other options.
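As a sketch, such auto configuration might look like the following in a Spring Boot properties file (the exact property keys here are assumptions based on the Camel Spring Boot starter conventions; check the component metadata for the real names):

```properties
# Sketch: configure camel-servlet and CamelContext options via Spring Boot properties.
# Property names are illustrative.
camel.component.servlet.mapping.context-path=/api/*
camel.springboot.jmx-enabled=true
camel.springboot.stream-caching-enabled=true
```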

    2. Camel Catalog Improvements

    The Camel Catalog now includes fine grained details of every artifact shipped in the release, also for the other kinds such as camel-hystrix, camel-cdi etc.

The catalog now also includes all the documentation in AsciiDoc and HTML format.

    The catalog has specialized providers for Spring Boot and Karaf runtimes, which allows tooling to know which of the Camel artifacts you can use on those runtimes.

The Camel project uses the catalog itself, so we now use this to automatically generate and keep a full list of all the artifacts on the website, and when each artifact was added. You can therefore see whether it's a new artifact in this release, or was introduced in Camel 2.17 etc.

There is a specialized runtime version of the CamelCatalog, RuntimeCamelCatalog, provided in camel-core, which allows you to tap into the catalog when running Camel. The offline catalog is camel-catalog, which is totally standalone.

    3. Camel Maven Plugin can now validate

    There is a new validate goal on the camel-maven-plugin which allows you to check your source code and validate all your Camel endpoints and simple expressions whether they have any invalid configuration or options. I have previously blogged about this.

    4. Auto reload XML files

If you develop Camel routes in XML files, then you can now turn on auto reload, so Camel will watch the XML files for changes and then automatically update the routes on the fly. I have previously blogged and recorded a video about this.

    5. Service Call EIP improvements

Luca has been busy improving the Service Call EIP so it works better and easier with Camel on the cloud, such as kubernetes or spring-cloud.

    Luca blogged recently about this.

    6. Calling REST services is easier

If you want to use Camel to call RESTful services, it's now easier as we added a producer side to the Rest DSL. This means you can call a REST service using the rest component, which can then plug in and use any of the HTTP-based components in Camel such as restlet, http4, undertow etc.

    For more information see the rest-producer example.
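As a quick illustration (the host, path and endpoints here are made up), calling a REST service from a route via the rest component could look like:

```xml
<!-- Sketch: Rest DSL producer; host and path are illustrative -->
<route>
  <from uri="timer:poll?period=5000"/>
  <!-- performs GET /customers/123 using an underlying HTTP component such as http4 -->
  <to uri="rest:get:customers/123?host=localhost:8080"/>
  <to uri="log:response"/>
</route>
```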

We also added a new camel-rest-swagger component that makes it even easier to call Swagger REST APIs, where you can refer to their operation id and then let Camel automatically map to the API.

    For more information see the rest-swagger example and the rest-swagger documentation.

    7. CDI with JEE transactions

    The camel-cdi component now supports JEE transactions so you can leverage that out of the box without having to rely on spring transactions anymore.

    8. Example documentation improved

We now generate a table with all the examples, sorted by category. This allows users to find the beginner examples, rest, cloud etc. It also ensures that we keep better documentation for our examples in the future, as the generator tool will WARN if we have examples without documentation.

Also, all examples have a readme file with information about the example and how to run it.

    9. Spring Cloud components

There are new Camel components that integrate with Spring Cloud and Spring Cloud Netflix. This makes it easy to use, for example, the ServiceCall EIP or Hystrix EIP with Spring Cloud Netflix, or just Camel with Spring Cloud in general. You can find more information in the example.

    10. Kafka improvements

The camel-kafka component has been improved to work more intuitively. This unfortunately means the uri syntax has changed in a backwards-incompatible way, so if you are upgrading make sure to change your uris. However, the new syntax resembles how other messaging components do it, by using kafka:topicName?options.
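For example, an endpoint in the new style might look like this (the broker address and options are illustrative):

```xml
<!-- Sketch: new kafka uri style, topic name first, options as query parameters -->
<from uri="kafka:myTopic?brokers=localhost:9092&amp;groupId=myGroup"/>
```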

Also, the component can now automatically convert to the Kafka serializer and deserializer out of the box, so you don't have to hassle with that. We provide converters for the typically used types such as byte[] and String.

The component has also been upgraded to the latest Kafka release, and it's now possible to store the offset state offline so you can resume from this offset in case you stop and later start your application.

It's also much easier to configure and use a custom key and partition key, which can be supplied as header values.

    And there is a new Kafka idempotent repository.

    11. Route Contracts 

We have added initial support for being able to specify an incoming and outgoing type on a Camel route (called transformer and validator inside Camel). This allows both Camel at runtime and Camel developers to know what payload the route expects as input and what it returns. For example, you can specify that a route takes in XML and returns JSON. And with XML you can even specify the namespace. Likewise you can specify Java types for POJO classes. Based on these contracts, Camel is able at runtime to automatically type-convert the message payload (if possible) between these types if needed.

We will continue with more improvements in this area. For example, we hope we can add such capabilities to Camel components so they will be able to provide such information, making your Camel routes more type-safe with the message payloads during routing.

And tooling will also be able to tap into this information and then, for example, "flag" users with hints about routes not being compatible etc.

    You can find more details in this example (we have for CDI and XML as well) and in the documentation.
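As a sketch, declaring such a contract in XML might look like the following (the urn values and the bean name are illustrative, and the exact element syntax is my reading of the documentation rather than something taken from this post):

```xml
<!-- Sketch: route contract declaring XML in, JSON out -->
<route>
  <from uri="direct:order"/>
  <inputType urn="xml:{http://example.com/order}order"/>
  <outputType urn="json"/>
  <to uri="bean:orderService"/>
</route>
```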

    12. Reactive Camel

There is a new camel-reactive-streams component that makes Camel work first-class with the reactive-streams API, so you can easily use Camel components in your reactive flows, or call flows from your Camel routes.

    For the next release there is a camel-rx2 component in the works which has improved support for Camel with the popular RxJava 2 framework.

For users that want to use reactive with vert.x, there are camel-vertx and vertx-camel-bridge components in both projects. We plan to merge them and bring the best features of each together in the future, when we get some time. Claus is in talks with the vert.x team about this.

You can find more information in some of the examples. And the Camel in Action 2nd ed. book contains an entire chapter (chapter 21) covering all of this.

    13. Java 8 DSL improvements

And just off the top of my head, the Java 8 DSL has been slightly improved to allow using more of the Java 8 lambda and functional style in your Camel routes and EIPs. We will continue to improve this from time to time when we find EIPs that can be made more awesome for savvy Java 8 users. We are also looking for feedback in this area, so if you are knee-deep in the Java 8 style then help us identify where we can improve the DSL.

    14. Camel Connectors

We have introduced a new concept called Camel Connector. However, it's still early stages and we will over the next couple of releases further improve and refine what a Camel Connector is.

The short story is that a Camel Connector is a specialized and pre-configured Camel component that can do one thing and one thing only. For example, if you need to know when someone mentions you on twitter, then you can use the camel-twitter component. But it can do 10 things, and it can take time to understand how to use the component and make it work. So instead you can build a connector that does just that: a camel-twitter-mention connector. It's pre-built and configured to just do that. So all you need to do is configure your twitter credentials and off you go. At runtime the connector is a Camel component, so from the Camel point of view they are all components, and therefore it runs first-class in Camel.

    We have provided some connector examples in the source code.

    15. Many more components

    As usual there is a bunch of new components in every Camel release and this time we have about 20 new components. You can find the list of new components in the release notes, or on the Camel components website where you can search by the 2.19 release number.

    For example there is a camel-opentracing component that allows to use Camel with distributed tracing. Gary Brown has blogged about this.

    There is also a few new Camel components for IoT such as camel-milo that Jens Reimann blogged about.

There is a bunch of other smaller improvements which you can find in the release notes. For example, the jsonpath language now allows embedded simple language, and you can define predicates in a much simpler syntax without too many of the confusing jsonpath tokens, in case you just want to say > 1000 etc.
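For example, a predicate in the simpler syntax might be written like this (the payload field and threshold are made up for illustration):

```xml
<!-- Sketch: jsonpath predicate in the simpler syntax, without $..[?(@...)] tokens -->
<when>
  <jsonpath>person.credit > 1000</jsonpath>
</when>
```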


    I recently took a trip to Puerto Vallarta, Mexico with my wife and rented a car for our 8 day stay. I had done the research in the past with regards to insurance and knew that as long as I use my VISA card, I should decline any extra coverage that the rental car agency offers. It turns out that in Mexico, you need at least one basic level of coverage called SLI or SAI insurance. This covers you against damage to someone else. This ran us an extra $115 on our bill. If we had gone for their full inclusive insurance, it would have been an additional $300! Way more than we paid to rent the car.

    I've also read articles like this one that claim "Declining to buy the insurance (some of which is mandatory, anyway) is foolhardy to the extreme, but buying the full package without knowing what you're buying is only slightly less so." The article must be out of date or I got bad information from the person at the rental counter, but the SLI/SAI insurance was mandatory.

    Well, to make a long story shorter, our brand new rental car got a nice big scratch on the side of it...

    In the US, this would cost probably $1000-1500 to fix. This had me worried that we would run into all sorts of trouble at the rental car company, so on our last day, we left a bit early to take care of things. I kept thinking, maybe we should have gotten the all inclusive insurance.

    When we arrived, they noticed the scratch immediately, of course. They were very nice about it and simply asked for a copy of the original rental agreement through the 3rd party company that we got the car from. They asked me to write up an accident report detailing what happened. This was a simple sentence. Then, they told me the price for the damage... only about $89! I didn't argue it. I've opened a ticket with VISA and I expect they will pay it after some period of time.

This got me thinking... the rental car place must have its own insurance which covers mishaps like this. Why in the world would anyone go with the all-inclusive insurance for $300+ when simple damage can be paid for relatively cheaply through the rental car company itself? Sure, there is probably a risk of total loss of the car, but that is super rare. Even still, VISA would cover it under their own insurance.

So, unless you don't have VISA coverage, don't fret about not getting the extra insurance. I'm sure others have worse stories, but this one turned out pretty well for me.


Another regular event at ApacheCon North America will be the PGP Keysigning. Here we talk to Jean-Frederic Clere about the Key Signing and why it is so important for Apache to build and expand our web of trust.

    (NOTE: If you act as a Release Manager for any Apache project then you should really be attending this).

If you want to participate in the Key Signing, please send in your key and see additional information in the ApacheCon Wiki.



    Friday's news was full of breaking panic about an "attack" on the NHS, making it sound like someone had deliberately made an attempt to get in there and cause damage.

    It turns out that it wasn't an attack against the NHS itself, just a wide scale ransomware attack which combined click-through installation and intranet propagation by way of a vulnerability which the NSA had kept for internal use for some time.

    Laptops, Lan ports and SICP

The NHS got decimated by a combination of issues:

    1. A massive intranet for SMB worms to run free.
    2. Clearly, lots of servers/desktops running the SMB protocol.
    3. One or more people reading an email with the original attack, bootstrapping the payload into the network.
    4. A tangible portion of the machines within some parts of the network running unpatched versions of Windows, clearly caused in part by the failure of successive governments to fund a replacement program while not paying MSFT for long-term support.
5. Some of these systems form part of medical machines: MRI scanners, VO2 test systems, CAT scanners, whatever they use in the radiology dept, to name but some of the NHS machines I've been through in the past five years.
The overall combination, then, is: a large network/set of networks with unsecured, unpatched targets was vulnerable to a drive-by attack; the kind of attack which, unlike a nation state itself, you may stand a chance of actually defending against.

    What went wrong?

    Issue 1: The intranet. Topic for another post.

    Issue 2: SMB.

In servers this can be justified, though it's a shame that SMB sucks as a protocol. Desktops? It's that eternal problem: these things get stuck in as "features", but sometimes come back to burn you. Every process listening on a TCP or UDP port is a potential attack point. A 'netstat -a' will list the running vulnerabilities on your system; enumerate the running services (COM+, sane.d?, mDNS, ...), which you should review and decide whether they could be halted. Not that you can turn mDNS off on a macbook...

    Issue 3: Email

    With many staff, email clickthrough is a function of scale and probability: someone will, eventually. Probability always wins.
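    That scale argument can be made concrete with a little arithmetic: if each of n staff independently clicks a phishing mail with probability p, the chance that at least one does is 1 - (1 - p)^n. A quick sketch, with illustrative numbers rather than real NHS figures:

```python
def p_at_least_one_click(n_staff: int, p_click: float) -> float:
    """Probability that at least one of n_staff clicks, assuming each
    clicks independently with probability p_click."""
    return 1.0 - (1.0 - p_click) ** n_staff

# Even a careful workforce where only 1 in 10,000 clicks becomes
# a near-certain compromise at organisational scale.
for n in (100, 10_000, 1_000_000):
    print(n, round(p_at_least_one_click(n, 0.0001), 4))
```

    At a hundred staff the odds are about 1%; at a million they are effectively 100%. Probability always wins.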

    Issue 4: The unpatched XP boxes.

    This is why Jeremy Hunt is in hiding, but it's also why our last Home Secretary, tasked with defending the nation's critical infrastructure, might want to avoid answering questions. Not that she is answering questions right now.

    Finally, 5: The medical systems.

    This is a complication on the "patch everything" story because every update to a server needs to be requalified. Why? Therac-25.

    What's critical here is that the NHS was 0wned not by some malicious nation state or dedicated hacker group: it fell victim to drive-by ransomware targeted at home users, small businesses, and anyone else with a weak INFOSEC policy. This is the kind of thing that you do actually stand a chance of defending against, at least at the laptop, desktop and server level.


    Defending against a malicious nation state is probably near-impossible given that physical access to the NHS network is trivial: phone up at 4am complaining of chest pains and you get a bed with a LAN port alongside it, and are told to stay there until there's a free slot in the radiology clinic.

    What about the fact that the NSA had an exploit for the SMB vulnerability and were keeping quiet on it until the Shadow Brokers put it up online? This is a complex issue and I don't know what the right answer is.

    Whenever critical security patches go out, people try to reverse engineer them to get an attack which will work against unpatched versions of IE, Flash, Java, etc. The problems here were:
    • the Shadow Broker upload included a functional exploit, 
    • it was over the network to enable worms, 
    • and it worked against widely deployed yet unsupported Windows versions.
    The cost of developing the exploit was reduced, and the target space was vast, especially in a large organisation, which, for a worm scanning and attacking vulnerable hosts, is a perfect breeding ground.

    If someone else had found the flaw and it had been patched, there'd still have been exploits developed against unpatched versions; the published code just made that easier and reduced the interval between patch and live exploit.

    The fact that it ran against an old Windows version is also something which would have happened regardless, unless MSFT had been notified of the issue while they were still supporting WinXP. The disincentive for the NSA to disclose it is that a widely exploitable network attack is probably the equivalent of a strategic armament, one step below anything that can cut through a VPN and the routers, getting you inside a network in the first place.

    The issues we need to look at are:
    1. How long is it defensible to hold on to an exploit like this?
    2. How to keep the exploit code secure during that period, while still using it when considered appropriate?
    Here the MSFT "tomahawk" metaphor could be pushed a bit further. The US govt may have tomahawk missiles with nuclear payloads, but the ones they use are the low-damage conventional ones. That's what got out this time.

    WMD in the Smithsonian

    One thing that MSFT have to consider is: can they really continue with the "No more WinXP support" policy? I know they don't want to do it, and the policy of making customers who care pay for the ongoing support is a fine way to do it; it's just that it leaves multiple vulnerabilities. People at home, organisations without the money who think "they won't be a target", and embedded systems everywhere, like a pub I visited last year whose cash registers were running Windows XP Embedded; all those ATMs out there, etc, etc.

    Windows XP systems are a de-facto part of the nation's critical infrastructure.

    Having the UK and US governments pay for patches for the NHS and everyone else could be a cost effective way of securing a portion of the national infrastructure, for the NHS and beyond.

    (Photos: me working on SICP during an unplanned five-day stay at the Bristol Royal Infirmary; there's a LAN port above the bed I kept staring at. Windows XP retail packaging, Smithsonian aerospace museum, the Mall, Washington DC)

    0 0

    ApacheCon NA 2017 attendee interview with Paul Angus

    0 0

    It's nice to see Sir Tim Berners-Lee as a recipient of the A.M. Turing Award. More details are available on

    0 0
  • 05/17/17--08:53: Nick Kew: The Great B Minor
  • This Sunday, May 21st, we’re performing Bach’s B Minor Mass at the Guildhall, Plymouth.  This work needs no introduction, and I have no hesitation recommending it for readers who enjoy music and are within evening-out distance of Plymouth.

    Tickets are cheaper in advance than on the door, so you might want to visit your favourite regular ticket vendor or google for online sales.

    Minor curiosity: the edition we’re using was edited by Arthur Sullivan.  Yes, he of G&S, and an entirely different era and genre of music!   It’s also the Novello edition used in most performances in Britain.

    0 0

    As you may already know Apache CXF has been offering a simple but effective support for tracing CXF client and server calls with HTrace since 2015.

    What is interesting about this feature is that it was done after the DevMind attended ApacheCon NA 2015 and got inspired to integrate CXF with HTrace.

    You'll be glad to know this feature has now been enhanced to propagate the trace details to the logs, which is the least intrusive way of working with HTrace. Should you need more advanced control, CXF will help; see this section for example.

    CXF has also been integrated with Brave, which should work better for CXF OSGi users. The integration work with Brave 4 is under way now.

    0 0

    Microsoft logo

    Last week I joined Microsoft Build 2017 in Seattle together with some members of the Technical Advisory Group (TAG).

    I would like to share some of the good points for developers and the Enterprise field in terms of how Microsoft is continuing to adopt Open Source and Open Standards in different ways.

    0 0

    A recent series of blog posts showed how to install and configure Apache Hadoop as a single node cluster, and how to authenticate users via Kerberos and authorize them via Apache Ranger. Interacting with HDFS via the command line tools as shown in the article is convenient but limited. Talend offers a freely-available product called Talend Open Studio for Big Data which you can use to interact with HDFS instead (and many other components as well). In this article we will show how to access data stored in HDFS that is secured with Kerberos as per the previous tutorials.

    1) HDFS setup

    To begin with, please follow the first tutorial to install Hadoop and to store the LICENSE.txt in a '/data' folder. Then follow the fifth tutorial to set up an Apache Kerby based KDC testcase and configure HDFS to authenticate users via Kerberos. To test that everything is working correctly, run the following on the command line:

    • export KRB5_CONFIG=/pathtokerby/target/krb5.conf
    • kinit -k -t /pathtokerby/target/alice.keytab alice
    • bin/hadoop fs -cat /data/LICENSE.txt
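    The three manual steps above could also be wrapped in a small script. Here is a hedged Python sketch, reusing the tutorial's placeholder '/pathtokerby/target' path (adjust for your environment):

```python
import os

KERBY_TARGET = "/pathtokerby/target"  # placeholder path from the tutorial

def hdfs_smoke_test_cmds(kerby_target: str, principal: str = "alice"):
    """Build the environment and commands for the smoke test above:
    point KRB5_CONFIG at the Kerby krb5.conf, obtain a ticket from the
    keytab, then cat the test file from HDFS."""
    env = dict(os.environ, KRB5_CONFIG=f"{kerby_target}/krb5.conf")
    cmds = [
        ["kinit", "-k", "-t", f"{kerby_target}/{principal}.keytab", principal],
        ["bin/hadoop", "fs", "-cat", "/data/LICENSE.txt"],
    ]
    return env, cmds

# To actually run against a live, Kerberized HDFS (from the Hadoop dir):
#   env, cmds = hdfs_smoke_test_cmds(KERBY_TARGET)
#   for cmd in cmds:
#       subprocess.run(cmd, env=env, check=True)
```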
    2) Download Talend Open Studio for Big Data and create a job

    Now we will download Talend Open Studio for Big Data (6.4.0 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" and call it "HDFSKerberosRead". In the search bar under "Palette" on the right hand side enter "tHDFS" and hit enter. Drag "tHDFSConnection" and "tHDFSInput" to the middle of the screen. Do the same for "tLogRow":
    We now have all the components we need to read data from HDFS. "tHDFSConnection" will be used to configure the connection to Hadoop. "tHDFSInput" will be used to read the data from "/data" and finally "tLogRow" will just log the data so that we can be sure that it was read correctly. The next step is to join the components up. Right click on "tHDFSConnection" and select "Trigger/On Subjob Ok" and drag the resulting line to "tHDFSInput". Right click on "tHDFSInput" and select "Row/Main" and drag the resulting line to "tLogRow":
    3) Configure the components

    Now let's configure the individual components. Double click on "tHDFSConnection". For the "version", select the "Hortonworks" Distribution with version HDP V2.5.0 (we are using the original Apache distribution as part of this tutorial, but it suffices to select Hortonworks here). Under "Authentication" tick the checkbox called "Use kerberos authentication". For the Namenode principal specify "hdfs/". Select the checkbox marked "Use a keytab to authenticate". Select "alice" as the principal and "<>/target/alice.keytab" as the "Keytab":
    Now click on "tHDFSInput". Select the checkbox for "Use an existing connection" + select the "tHDFSConnection" component in the resulting component list. For "File Name" specify the file we want to read: "/data/LICENSE.txt":
    Now click on "Edit schema" and hit the "+" button. This will create a "newColumn" column of type "String". We can leave this as it is, because we are not doing anything with the data other than logging it. Save the job. Now the only thing that remains is to point to the krb5.conf file that is generated by the Kerby project. Click on "Window/Preferences" at the top of the screen. Select "Talend" and "Run/Debug". Add a new JVM argument: "":

    Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. If everything is working correctly, you should see the contents of "/data/LICENSE.txt" displayed in the Run window.

    0 0

    ApacheCon 2017 attendee interview with Kevin McGrail. We talk to Kevin about his new role as VP Fundraising and the goals he’d like to achieve over the coming year.

    0 0

    • Spotting a million dollars in your AWS account · Segment Blog

      You can easily split your spend by AWS service per month and call it a day. Ten thousand dollars of EC2, one thousand to S3, five hundred dollars to network traffic, etc. But what’s still missing is a synthesis of which products and engineering teams are dominating your costs.  Then, add in the fact that you may have hundreds of instances and millions of containers that come and go. Soon, what started as simple analysis problem has quickly become unimaginably complex.  In this follow-up post, we’d like to share details on the toolkit we used. Our hope is to offer up a few ideas to help you analyze your AWS spend, no matter whether you’re running only a handful of instances, or tens of thousands.

      (tags: segment money costs billing aws ec2 ecs ops)

    0 0

    Here’s another blog post of mine that was initially published by Computerworld UK.

    My current Fiat Punto Sport is the second Diesel car that I own, and I love those engines. Very smooth yet quite powerful acceleration, good fuel savings, a discount on state taxes thanks to low pollution, and it’s very reliable and durable. And fun to drive. How often does Grandma go “wow” when you put the throttle down in your car? That happens here, and that Grandma is not usually a car freak.

    Diesel engines used to be boring, but they have made incredible progress in the last few years – while staying true to their basic principles of simplicity, robustness and reliability.

    The recent noise about the Apache Software Foundation (ASF) moving to Git, or not, made me think that the ASF might well be the (turbocharged, like my car) Diesel engine of open source. And that might be a good thing.

    The ASF’s best practices are geared towards project sustainability, and building communities around our projects. That might not be as flashy as creating a cool new project in three days, but sometimes you need to build something durable, and you need to be able to provide your users with some reassurances that that will be the case – or that they can take over cleanly if not.

    In a similar way to a high tech Diesel engine that’s built to last and operate smoothly, I think the ASF is well suited for projects that have a long term vision. We often encourage projects that want to join the ASF via its Incubator to first create a small community and release some initial code, at some other place, before joining the Foundation. That’s one way to help those projects prove that they are doing something viable, and it’s also clearly faster to get some people together and just commit some code to one of the many available code sharing services, than following the ASF’s rules for releases, voting etc.

    A Japanese 4-cylinder 600cc gasoline-powered sports bike might be more exciting than my Punto on a closed track, but I don’t like driving those in day-to-day traffic or on long trips. Too brutal, requires way too much attention. There’s space for both that and my car’s high tech Diesel engine, and I like both styles actually, depending on the context.

    Open Source communities are not one-size-fits-all: there’s space for different types of communities, and by exposing each community’s positive aspects, instead of trying to get them to fight each other, we might just grow the collective pie and live happily ever after (there’s a not-so-hidden message to sensationalistic bloggers in that last paragraph).

    I’m very happy with the ASF being the turbocharged Diesel engine of Open Source – it does have to stay on its toes to make sure it doesn’t turn into a boring old-style Diesel, but there’s no need to rush evolution. There’s space for different styles.

    0 0

    Had a great time this week at ApacheCon.  This talk was presented on Thursday…

    0 0
  • 05/21/17--06:01: Bryan Pendleton: Back online
  • I took a break from computers.

    I had a planned vacation, and so I did something that's a bit rare for me: I took an 11 day break from computers.

    I didn't use any desktops or laptops. I didn't have my smartphone with me.

    I went 11 days without checking my email, or signing on to various sites where I'm a regular, or opening my Feedly RSS reader, or anything like that.

    Now, I wasn't TOTALLY offline: there were newspapers and television broadcasts around, and I was traveling with other people who had computers.

    But, overall, it was a wonderful experience to just "unplug" for a while.

    I recommend it highly.

    0 0

    0 0

    We took an altogether-too-short but thoroughly wonderful trip to the Upper Rhine Valley region of Europe. I'm not sure that "Upper Rhine Valley" is a recognized term for this region, so please forgive me if I've abused it; more technically, we visited:

    1. The Alsace region of France
    2. The Schwarzwald region of Germany
    3. The neighboring areas of Frankfurt, Germany, and Basel, Switzerland.
    But since we were at no point more than about 40 miles from the Rhine river, and since we were several hundred miles from the Rhine's mouth in the North Sea, it seems like a pretty good description to me.

    Plus, it matches up quite nicely with this map.

    So there you go.

    Anyway, we spent 10 wonderful days there, which was hardly even close to enough, but it was what we had.

    And I, in my inimitable fashion, packed about 30 days of sightseeing into those 10 days, completely exhausting my travel companions.

    Once again, no surprise.

    I'll have more to write about various aspects of the trip subsequently, but here let me try to crudely summarize the things that struck me about the trip.

    • Rivers are incredibly important in Europe, much more so than here in America. Rivers provide transportation, drinking water, sewage disposal, electric power, food (fish), and form the boundaries between regions and nations. They do some of these things in America, too, but we aren't nearly as attached to our rivers as they are in Central Europe, where some of the great rivers of the world arise.
    • For centuries, castles helped people keep an eye on their rivers, and make sure that their neighbors were behaving as they should in the river valleys.
    • Trains are how you go places in Europe. Yes, you can fly, or you can drive, but if you CAN take a train, you should. And, if you can take a first class ticket on TGV, you absolutely, absolutely should. I have never had a more civilized travel experience than taking the TGV from Frankfurt to Strasbourg. (Though full credit to Lufthansa for being a much-better-than-ordinary airline. If you get a chance to travel Lufthansa, do it.)
    • To a life-long inhabitant of the American West, Central Europe is odd for having almost no animals. People live in Central Europe, nowadays; animals do not. BUT: storks!
    • France, of course, is the country that perfected that most beautiful of beverages: wine. While most of the attention to wine in France goes to Southern France, don't under-rate Alsace, for they have absolutely delicious wines of many types, and have been making wine for (at least) 2,000 years. We Californians may think we know something about wine; we don't.
    • The visible history of the Upper Rhine Valley is deeply formed by the Franks. Don't try to understand the cathedrals, villages, cities, etc. without spending some time thinking about Charlemagne, etc. And, if you were like me and rather snored through this part of your schooling, prepare to have your eyes opened.
    • The other major history of the Upper Rhine Valley involves wars. My, but this part of the world has been fought over for a long time. Most recently, of course, we can distinguish these major events:
      1. The Franco-Prussian war, which unified Germany and resulted in Alsace being a German territory
      2. World War One
      3. World War Two
      Although the most recent of these events is now 75 years in the past, the centuries and centuries of conflict over who should rule these wonderful lands has left its mark, deeply.

      So often through my visit I thought to myself: "Am I in French Germany? Or perhaps is this German France?" Just trying to form and phrase these questions in my head, I realized how little I knew, and how much there is to learn, about how people form their bonds with their land, and their neighbors, and their thoughts. Language, food, customs, politics, literature: it's all complex and it's all one beautiful whole.

      This, after all, is the land where Johannes Gutenberg invented the printing press, where people like Johann Wolfgang von Goethe, Louis Pasteur, John Calvin, and Albert Schweitzer lived and did their greatest work.

    I could, of course, have been much terser:

    1. The Upper Rhine Valley is one of the most beautiful places on the planet. The people who live there are very warm and welcoming, and it is a delightful place to take a vacation.
    2. Early May is an absolutely superb time to go there.

    I'll write more later, as I find time.

    0 0

    Two security advisories were recently issued for Apache CXF Fediz. In addition to fixing these issues, the recent releases of Fediz impose tighter security constraints in some areas by default compared to older releases. In this post I will document the advisories and the other security-related changes in the recent Fediz releases.

    1) Security Advisories

    The first security advisory is CVE-2017-7661: "The Apache CXF Fediz Jetty and Spring plugins are vulnerable to CSRF attacks.". Essentially, both the Jetty 8/9 and Spring Security 2/3 plugins are subject to a CSRF-style vulnerability when the user doesn't complete the authentication process. In addition, the Jetty plugins are vulnerable even if the user does first complete the authentication process, but only the root context is available as part of this attack.

    The second advisory is CVE-2017-7662: "The Apache CXF Fediz OIDC Client Registration Service is vulnerable to CSRF attacks". The OIDC client registration service is a simple web application that allows the creation of clients for OpenId Connect, as well as a number of other administrative tasks. It is vulnerable to CSRF attacks, where a malicious application could take advantage of an existing session to make changes to the OpenId Connect clients that are stored in the IdP.

    2) Fediz IdP security constraints

    This section only concerns the WS-Federation (and SAML-SSO) IdP in Fediz. The WS-Federation RP application sends its address via the 'wreply' parameter to the IdP. For SAML SSO, the address to reply to is taken from the consumer service URL of the SAML SSO Request. Previously, the Apache CXF Fediz IdP contained an optional 'passiveRequestorEndpointConstraint' configuration value in the 'ApplicationEntity', which allows the admin to specify a regular expression constraint on the 'wreply' URL.

    From Fediz 1.4.0, 1.3.2 and 1.2.4, a new configuration option is available in the 'ApplicationEntity' called 'passiveRequestorEndpoint'. If specified, this is directly matched against the 'wreply' parameter. In a change that breaks backwards compatibility, but that is necessary for security reasons, one of 'passiveRequestorEndpointConstraint' or 'passiveRequestorEndpoint' must be specified in the 'ApplicationEntity' configuration. This ensures that the user cannot be redirected to a malicious client. Similarly, new configuration options are available called 'logoutEndpoint' and 'logoutEndpointConstraint' which validate the 'wreply' parameter in the case of redirecting the user after logging out, one of which must be specified.

    3) Fediz RP security constraints

    This section only concerns the WS-Federation RP plugins available in Fediz. When the user tries to log out of the Fediz RP application, a 'wreply' parameter can be specified to give the address that the Fediz IdP can redirect to after logout is complete. The old functionality was that if 'wreply' was not specified, then the RP plugin instead used the value from the 'logoutRedirectTo' configuration parameter.

    From Fediz 1.4.0, 1.3.2 and 1.2.4, a new configuration option is available called 'logoutRedirectToConstraint'. If a 'wreply' parameter is presented, then it must match the regular expression that is specified for 'logoutRedirectToConstraint', otherwise the 'wreply' value is ignored and it falls back to 'logoutRedirectTo'. 
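    The fallback rule described above can be sketched in a few lines. Fediz itself is Java; this Python is just an illustration of the rule as documented, not Fediz source:

```python
import re
from typing import Optional

def resolve_logout_redirect(wreply: Optional[str],
                            logout_redirect_to: str,
                            constraint: Optional[str]) -> str:
    """Honour a client-supplied 'wreply' only if it matches the
    'logoutRedirectToConstraint' regex; otherwise fall back to the
    statically configured 'logoutRedirectTo'."""
    if wreply and constraint and re.fullmatch(constraint, wreply):
        return wreply
    return logout_redirect_to

# A wreply matching the constraint is used; anything else is ignored.
print(resolve_logout_redirect("https://app.example.com/bye",
                              "https://app.example.com/logout",
                              r"https://app\.example\.com/.*"))
# https://app.example.com/bye
```

    The point of the constraint is that an attacker-controlled 'wreply' (an open-redirect attempt) simply falls through to the safe configured value.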

    0 0

    It looks like the Russians interfered with the US elections, not just from the alleged publishing of the stolen emails, or through the alleged close links with the Trump campaign, but in the social networks, creating astroturfed campaigns and repeating the messages the country deemed important.

    Now the UK is having an election. And no doubt the bots will be out. But if the Russians can do bots: so can I.

    This then, is @dissidentbot.

    Dissident bot is a Raspberry Pi running a 350-line Ruby script tasked with heckling politicians.
    unrelated comments seem to work, if timely
    It offers:

    • The ability to listen to tweets from a number of sources: currently a few UK politicians
    • To respond, picking a random response from a set of replies written explicitly for each one
    • To tweet the reply after a 20-60s sleep.
    • Admin CLI over Twitter Direct Messaging
    • Live update of response sets via github.
    • Live add/remove of new targets (just follow/unfollow from the twitter UI)
    • Ability to assign a probability of replying, 0-100
    • Random response to anyone tweeting about it when that is not a reply (disabled due to issues)
    • Good PUE numbers, being powered off the USB port of the wifi base station, SSD storage and fanless naturally cooled DC. Oh, and we're generating a lot of solar right now, so zero-CO2 for half the day.
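    The reply behaviour in the list above (canned response per target, 20-60s sleep, per-bot reply probability) can be sketched like this; the function and variable names here are mine, not from the actual dissidentbot script, which is Ruby:

```python
import random
import time
from typing import Dict, List, Optional

def maybe_reply(tweet_user: str, responses: Dict[str, List[str]],
                reply_probability: int = 100, rng=random,
                sleep=time.sleep) -> Optional[str]:
    """Illustrative heckle loop: only reply to followed accounts that
    have a prepared response set, roll against the reply probability
    (0-100), then wait 20-60s before posting."""
    if tweet_user not in responses:
        return None                       # only heckle configured targets
    if rng.randint(1, 100) > reply_probability:
        return None                       # probabilistic skip
    sleep(rng.uniform(20, 60))            # jitter; replying within 5s is what got the bot blocked
    return rng.choice(responses[tweet_user])

replies = {"@some_politician": ["Strong and stable?", "Citation needed."]}
print(maybe_reply("@some_politician", replies, sleep=lambda s: None))
```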
    It's the first Ruby script of more than ten lines I've ever written; an interesting experience, and I've now got three chapters into a copy of the Pickaxe Book I've had sitting unloved alongside "ML for the working programmer".  It's nice to be able to develop just by saving the file & reloading it in the interpreter... not done that since I was Prolog programming. Refreshing.
    Strong and Stable my arse
    Without type checking it's easy to ship code that's broken. I know, that's what tests are meant to find, but as this all depends on the live Twitter APIs, it'd take effort, including maybe some split between Model and Control. Instead I've broken the code into little methods I can run in the CLI.

    As usual, the real problems surface once you go live:
    1. The bot kept failing overnight; nothing in the logs. Cause: it's powered by the router, and DD-WRT was set to reboot every night. Fix: disable.
    2. It's "reply to any reference which isn't a reply itself" doesn't work right. I think it's partly RT related, but not fully tracked it down.
    3. Although it can do a live update of the dissident.rb script, it's not yet restarting: I need to ssh in for that.
    4. I've been testing it by tweeting things myself, so I've had to tweet random things during testing.
    5. Had to add handling of twitter blocking from too many API calls. Again: sleep a bit before retries.
    6. It's been blocked by the Conservative party. That was because they've been tweeting 2-4 times/hour, and dissidentbot originally didn't have any jitter/sleep. After 24h of replying within 5s of their tweets, it was blocked.
    The loopback code is the most annoying bug; nothing too serious though.

    The DM CLI is nice; the fact that I haven't got live restart is something which interferes with the workflow.
      Dissidentbot CLI via Twitter DM
    Because the Pi is behind the firewall, I've no off-prem SSH access.

    The fact the conservatives have blocked me, that's just amusing. I'll need another account.

    One of the most amusing things is people argue with the bot. Even with "bot" in the name and a profile saying "a raspberry pi", people argue.
    Arguing with Bots and losing

    Overall the big barrier is content.  It turns out that you don't need to do anything clever about string matching to select the right tweet: random heckles seems to blend in. That's probably a metric of political debate in social media: a 350 line ruby script tweeting random phrases from a limited set is indistinguishable from humans.

    I will accept Pull Requests of new content. Also, people are free to deploy their own copies; without the self.txt file it won't reply to any random mentions, just listen to its followed accounts and reply to those with a matching file in the data dir.

    If the Russians can do it, so can we.

    0 0

    The board design went off to PCBWay via web browser, and 5 days later 5 boards arrived by DHL from China. The whole process was unbelievably smooth. This was the first time I had ordered boards using the output of KiCad, so I was impressed with both KiCad and PCBWay. The boards were simple, being 2-layer, but also complex, being large with some areas needing to carry high amps. So how did I do?

    I made one mistake on the footprints: the 2-terminal connectors for the 600V ultrasound output didn't have pads on both sides. This didn't matter, as being through-hole the connectors soldered OK. Other than that, PCBWay did exactly what I had instructed them to. Even the Arduino Mega footprint fitted perfectly.

    How did it perform ?

    Once populated, the board initially appeared to perform well. Random frequency from 20KHz to 150KHz worked. The drive waveform from the MOSFET drivers into the MOSFETs was near perfect, with no high-frequency ringing on the edges and levels going from 0-12V and back in much less than 1us. However, I noticed some problems with the PWM control. There was none. With PWM pulses at 10%, the MOSFETs would turn on for 90% of the time and drive a wildly resonant waveform through the coil, rather like a little hammer hitting a big pendulum and having it feed back into resonance. On further investigation, the scope showed that when the MOSFET tried to switch off, the inductor carried on producing a flyback voltage, causing the MOSFET to continue conducting till the opposing MOSFET turned on. Initially I thought this was ringing, but it turned out a simple pair of 1A high-frequency Schottky diodes across each winding of the primary coil returned the energy to the 12V line, eliminating the flyback. Now I had ringing, at 10MHz, but control over the power output via a digital pot. I could leave it at that, but this 10MHz would probably transmit and cause problems with other equipment on the boat.

    I think the difference between the red and blue signals is due to slightly different track lengths on each MOSFET: the shorter track, not ringing nearly as much, is shown in the blue signal; the longer track, with more capacitance, rings more and induces a parasitic ring in the blue track. To eliminate this, two things were done. Traditional snubber RC networks had little or no impact, so a 100nF cap as close as possible to the drain and source on each MOSFET (RPF50N6) eliminated some of the high frequency, and a 100uF cap on the centre tap stored the energy returned to the 12V line by flyback. This reduced the peak current.

    There is still some ringing, but now the frequency is lower and it is less violent. The ripple on the 12V line is now less than 0.2V and is filtered out by decoupling caps on the supply pins to the Arduino Mega. All of these modifications have been accommodated on the underside of the board.

    The board now produces 60W per transducer between 20 and 150 KHz at 50% PWM drawing 5A from the supply. This is very loud on my desk and far louder than the Ultrasound Antifouling installed in Isador, which seems to work. I will need to implement a control program that balances power consumption against noise levels against effectiveness, but that is all software. There are sensors on board for temperature, current and voltage so it should be possible to have the code adapt to its environment.

    Board Layout mistakes

    Apart from the circuit errors, I made some mistakes in the MOSFET power connections. Rev2 of the board will have the MOSFETs placed as close as possible to the primary of the transformer, with identical track lengths. Hopefully this will eliminate the ringing seen on the red trace and make both look like the blue trace.

    I have 4 spare unpopulated PCBs. If I do a rev2 board, I will use PCBWay again. Their boards were perfect, all the mistakes were mine.



    0 0

    0 0

    Auto bootstrapping is a handy feature when it comes to growing an Apache Cassandra cluster. There are some unknowns about how this feature works which can lead to data inconsistencies in the cluster. In this post I will go through a bit about the history of the feature, the different knobs and levers available to operate it, and resolving some of the common issues that may arise.


    Here are links to the various sections of the post to give you an idea of what I will cover.


    The bootstrap feature in Apache Cassandra controls the ability for the data in the cluster to be automatically redistributed when a new node is inserted. The new node joining the cluster is defined as an empty node without system tables or data.

    When a new node joins the cluster using the auto bootstrap feature, it will perform the following operations:

    • Contact the seed nodes to learn about gossip state.
    • Transition to Up and Joining state (to indicate it is joining the cluster; represented by UJ in the nodetool status).
    • Contact the seed nodes to ensure schema agreement.
    • Calculate the tokens that it will become responsible for.
    • Stream replica data associated with the tokens it is responsible for from the former owners.
    • Transition to Up and Normal state once streaming is complete (to indicate it is now part of the cluster; represented by UN in the nodetool status).

    The above operations can be seen in the logs.

    Contact the seed nodes to learn about gossip state

    INFO  [HANDSHAKE-/] 2017-05-12 16:14:45,290 - Handshaking version with /
    INFO  [GossipStage:1] 2017-05-12 16:14:45,318 - Node / is now part of the cluster
    INFO  [GossipStage:1] 2017-05-12 16:14:45,325 - Node / is now part of the cluster
    INFO  [GossipStage:1] 2017-05-12 16:14:45,326 - Node / is now part of the cluster
    INFO  [GossipStage:1] 2017-05-12 16:14:45,328 - Node / is now part of the cluster
    INFO  [SharedPool-Worker-1] 2017-05-12 16:14:45,331 - InetAddress / is now UP
    INFO  [HANDSHAKE-/] 2017-05-12 16:14:45,331 - Handshaking version with /
    INFO  [HANDSHAKE-/] 2017-05-12 16:14:45,383 - Handshaking version with /
    INFO  [HANDSHAKE-/] 2017-05-12 16:14:45,387 - Handshaking version with /
    INFO  [SharedPool-Worker-1] 2017-05-12 16:14:45,438 - InetAddress / is now UP
    INFO  [SharedPool-Worker-2] 2017-05-12 16:14:45,438 - InetAddress / is now UP
    INFO  [SharedPool-Worker-3] 2017-05-12 16:14:45,438 - InetAddress / is now UP
    INFO  [main] 2017-05-12 16:14:46,289 - Starting up server gossip

    Transition to Up and Joining state

    INFO  [main] 2017-05-12 16:14:46,396 - JOINING: waiting for ring information

    Contact the seed nodes to ensure schema agreement

    Take note of the last entry in this log snippet.

    INFO  [GossipStage:1] 2017-05-12 16:14:49,081 - Node / is now part of the cluster
    INFO  [SharedPool-Worker-1] 2017-05-12 16:14:49,082 - InetAddress / is now UP
    INFO  [GossipStage:1] 2017-05-12 16:14:49,095 - Updating topology for /
    INFO  [GossipStage:1] 2017-05-12 16:14:49,096 - Updating topology for /
    INFO  [HANDSHAKE-/] 2017-05-12 16:14:49,096 - Handshaking version with /
    INFO  [GossipStage:1] 2017-05-12 16:14:49,098 - Node / is now part of the cluster
    INFO  [SharedPool-Worker-1] 2017-05-12 16:14:49,102 - InetAddress / is now UP
    INFO  [GossipStage:1] 2017-05-12 16:14:49,103 - Updating topology for /
    INFO  [HANDSHAKE-/] 2017-05-12 16:14:49,104 - Handshaking version with /
    INFO  [GossipStage:1] 2017-05-12 16:14:49,104 - Updating topology for /
    INFO  [GossipStage:1] 2017-05-12 16:14:49,106 - Node / is now part of the cluster
    INFO  [SharedPool-Worker-1] 2017-05-12 16:14:49,111 - InetAddress / is now UP
    INFO  [GossipStage:1] 2017-05-12 16:14:49,112 - Updating topology for /
    INFO  [HANDSHAKE-/] 2017-05-12 16:14:49,195 - Handshaking version with /
    INFO  [GossipStage:1] 2017-05-12 16:14:49,236 - Updating topology for /
    INFO  [GossipStage:1] 2017-05-12 16:14:49,247 - Node / is now part of the cluster
    INFO  [SharedPool-Worker-1] 2017-05-12 16:14:49,248 - InetAddress / is now UP
    INFO  [InternalResponseStage:1] 2017-05-12 16:14:49,252 - Enqueuing flush of schema_keyspaces: 1444 (0%) on-heap, 0 (0%) off-heap
    INFO  [MemtableFlushWriter:2] 2017-05-12 16:14:49,254 - Writing Memtable-schema_keyspaces@1493033009(0.403KiB serialized bytes, 10 ops, 0%/0% of on/off-heap limit)
    INFO  [MemtableFlushWriter:2] 2017-05-12 16:14:49,256 - Completed flushing .../node5/data0/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-tmp-ka-1-Data.db (0.000KiB)for commitlog position ReplayPosition(segmentId=1494569684606, position=119856)
    INFO  [InternalResponseStage:1] 2017-05-12 16:14:49,367 - Enqueuing flush of schema_columnfamilies: 120419 (0%) on-heap, 0 (0%) off-heap
    INFO  [MemtableFlushWriter:1] 2017-05-12 16:14:49,368 - Writing Memtable-schema_columnfamilies@1679976057(31.173KiB serialized bytes, 541 ops, 0%/0% of on/off-heap limit)
    INFO  [MemtableFlushWriter:1] 2017-05-12 16:14:49,396 - Completed flushing .../node5/data0/system/schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697/system-schema_columnfamilies-tmp-ka-1-Data.db (0.000KiB)for commitlog position ReplayPosition(segmentId=1494569684606, position=119856)
    INFO  [InternalResponseStage:5] 2017-05-12 16:14:50,824 - Enqueuing flush of schema_usertypes: 160 (0%) on-heap, 0 (0%) off-heap
    INFO  [MemtableFlushWriter:2] 2017-05-12 16:14:50,824 - Writing Memtable-schema_usertypes@1946148009(0.008KiB serialized bytes, 1 ops, 0%/0% of on/off-heap limit)
    INFO  [MemtableFlushWriter:2] 2017-05-12 16:14:50,826 - Completed flushing .../node5/data0/system/schema_usertypes-3aa752254f82350b8d5c430fa221fa0a/system-schema_usertypes-tmp-ka-10-Data.db (0.000KiB)for commitlog position ReplayPosition(segmentId=1494569684606, position=252372)
    INFO  [main] 2017-05-12 16:14:50,404 - JOINING: schema complete, ready to bootstrap

    Calculate the tokens that it will become responsible for

    INFO  [main] 2017-05-12 16:14:50,404 - JOINING: waiting for pending range calculation
    INFO  [main] 2017-05-12 16:14:50,404 - JOINING: calculation complete, ready to bootstrap
    INFO  [main] 2017-05-12 16:14:50,405 - JOINING: getting bootstrap token

    Stream replica data associated with the tokens it is responsible for from the former owners

    Take note of the first and last entries in this log snippet.

    INFO  [main] 2017-05-12 16:15:20,440 - JOINING: Starting to bootstrap...
    INFO  [main] 2017-05-12 16:15:20,461 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3] Executing streaming plan for Bootstrap
    INFO  [StreamConnectionEstablisher:1] 2017-05-12 16:15:20,462 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3] Starting streaming to /
    INFO  [StreamConnectionEstablisher:2] 2017-05-12 16:15:20,462 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3] Starting streaming to /
    INFO  [StreamConnectionEstablisher:3] 2017-05-12 16:15:20,462 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3] Starting streaming to /
    INFO  [StreamConnectionEstablisher:1] 2017-05-12 16:15:20,478 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3, ID#0] Beginning stream session with /
    INFO  [StreamConnectionEstablisher:2] 2017-05-12 16:15:20,478 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3, ID#0] Beginning stream session with /
    INFO  [StreamConnectionEstablisher:3] 2017-05-12 16:15:20,478 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3, ID#0] Beginning stream session with /
    INFO  [STREAM-IN-/] 2017-05-12 16:15:24,339 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3 ID#0] Prepare completed. Receiving 11 files(10176549820 bytes), sending 0 files(0 bytes)
    INFO  [STREAM-IN-/] 2017-05-12 16:15:27,201 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3] Session with / is complete
    INFO  [STREAM-IN-/] 2017-05-12 16:15:33,256 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3] Session with / is complete
    INFO  [StreamReceiveTask:1] 2017-05-12 16:36:31,249 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3] Session with / is complete
    INFO  [StreamReceiveTask:1] 2017-05-12 16:36:31,256 - [Stream #604b5690-36da-11e7-aeb6-9d89ad20c2d3] All sessions completed
    INFO  [main] 2017-05-12 16:36:31,257 - Bootstrap completed! for the tokens [1577102245397509090, -713021257351906154, 5943548853755748481, -186427637333122985, 89474807595263595, -3872409873927530770, 269282297308186556, -2090619435347582830, -7442271648674805532, 1993467991047389706, 3250292341615557960, 3680244045045170206, -6121195565829299067, 2336819841643904893, 8366041580813128754, -1539294702421999531, 5559860204752248078, 4990559483982320587, -5978802488822380342, 7738662906313460122, -8543589077123834538, 8470022885937685086, 7921538168239180973, 5167628632246463806, -8217637230111416952, 7867074371397881074, -6728907721317936873, -5403440910106158938, 417632467923200524, -5024952230859509916, -2145251677903377866, 62038536271402824]

    Transition to Up and Normal state once streaming is complete

    INFO  [main] 2017-05-12 16:36:31,348 - Node / state jump to NORMAL

    During the bootstrapping process, the new node joining the cluster has no effect on the existing data in terms of Replication Factor (RF). However, the new node will accept new writes for the token ranges acquired while existing data from the other nodes is being streamed to it. This ensures that no new writes are missed while data changes hands. In addition, it ensures that Consistency Level (CL) is respected all the time during the streaming process and even in the case of bootstrap failure. Once the bootstrapping process for the new node completes, it will begin to serve read requests (and continue to receive writes). Like the pre-existing nodes in the cluster, it too will then have an effect on the data in terms of RF and CL.
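
    As a quick refresher on the arithmetic behind these guarantees: a QUORUM read or write must be acknowledged by a majority of the RF replicas, so any QUORUM write and any subsequent QUORUM read overlap on at least one replica. A minimal sketch in plain Python (illustrative only, not Cassandra code):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM read or write."""
    return replication_factor // 2 + 1

# With RF = 3, any two replicas satisfy QUORUM. A write to two replicas
# and a read from two replicas out of three must share at least one node,
# which is what keeps QUORUM reads consistent with QUORUM writes.
print(quorum(3))  # 2
print(quorum(5))  # 3
```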

    While the bootstrapping feature can be a time saver when expanding a cluster, there are some “gotchas” that are worth noting. But before we do, we first need to revisit some basics.

    Back to basics

    Cassandra uses a token system to work out which nodes will hold which partition keys for the primary replica of data. To work out where data is stored in the cluster, Cassandra first applies a hashing function to the partition key. The generated hash is then used to calculate a token value; the hashing algorithm used is determined by the cluster's partitioner, most commonly Murmur3Partitioner or RandomPartitioner.
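
    The mapping from partition key to owning node can be sketched as a lookup on a sorted ring of tokens. The snippet below is an illustrative model only: it derives a token from an MD5 digest because Murmur3 is not in the Python standard library, and the four-node ring with one token per node is hypothetical (a real cluster using vnodes has many tokens per node):

```python
import bisect
import hashlib

def token_for(partition_key: str) -> int:
    # Stand-in for the partitioner's hash (Cassandra would use Murmur3):
    # derive a stable 64-bit signed token from an MD5 digest.
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Hypothetical four-node ring; each node owns the range of tokens from the
# previous node's token (exclusive) up to its own token (inclusive).
ring = [(-6_000_000_000_000_000_000, "A"),
        (-1_000_000_000_000_000_000, "B"),
        (3_000_000_000_000_000_000, "C"),
        (8_000_000_000_000_000_000, "D")]
tokens = [t for t, _ in ring]

def owner_of_token(token: int) -> str:
    # First node whose token is >= the key's token; wrap around the ring.
    idx = bisect.bisect_left(tokens, token) % len(ring)
    return ring[idx][1]

def owner(partition_key: str) -> str:
    return owner_of_token(token_for(partition_key))

print(owner("foo"))
```

    Real Cassandra replicates each token range to RF nodes rather than one, but the ownership lookup follows the same principle.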

    As seen in the log snippets, when a new node is added to the cluster it will calculate the tokens of the different data replicas that it is to become responsible for. This process, where tokens are calculated and acquired by the new node, is often referred to as a range movement, i.e. token ranges are being moved between nodes. Once the range movement has completed, the node will by default begin the bootstrapping process, where it streams data for the acquired tokens from other nodes.


    Range movements

    Whilst range movements may sound simple, the process has implications for maintaining data consistency. A number of patches have been added over time to help maintain data consistency during range movements. A fairly well known issue was CASSANDRA-2434, where it was highlighted that range movements violated consistency for Apache Cassandra versions below 2.1.x using vnodes.

    A fix was added for the issue CASSANDRA-2434 to ensure range movements between nodes were consistent when using vnodes. Prior to this patch inconsistencies could be caused during bootstrapping as per the example Jeff Jirsa gave on the dev mailing list.

    Consider the case of a cluster containing three nodes A, B and D with a RF of 3. If node B was offline and a key ‘foo’ was written with CL of QUORUM, the value for key ‘foo’ would go to nodes A and D.

    At a later point in time node B is resurrected and added back into the cluster. Around the same time a node C is added to the cluster and begins bootstrapping.

    One of the tokens node C calculates and acquires during the bootstrap process is for key ‘foo’. Node B is the closest node with data for the newly acquired token and thus node C begins streaming from the neighbouring node B. This process violates the consistency guarantees of Cassandra. This is because the data on node C will be the same as node B, and both are missing the value for key ‘foo’.

    Thus, a query with a CL of QUORUM may query nodes B and C and return no data which is incorrect, despite there being data for ‘foo’ on node A. Node D previously had the correct data, but it stopped being a replica after C was inserted into the cluster.

    The above issue was solved in CASSANDRA-2434 by changing the default behaviour to always try to perform a consistent range movement. That is, when node C is added (in the previous example), data is streamed from the correct replica it is replacing, node D. In this case all queries with CL of QUORUM for the key ‘foo’ would always return the correct value.
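
    The scenario above can be made concrete with a toy model. The snippet below is illustrative only (the single-key ‘cluster’ and node names follow the example): when node C copies its data from the stale node B, a QUORUM read hitting B and C misses the value, whereas streaming from the correct replica D preserves it:

```python
# Toy replica map for the single key 'foo' with RF = 3 and nodes A, B, D.
# A and D accepted the QUORUM write while B was down, so B missed it.
data = {"A": {"foo": "bar"}, "B": {}, "D": {"foo": "bar"}}

def quorum_read(replicas, key, rf=3):
    """Coordinator-side view of a QUORUM read: a majority of the RF
    replicas must respond, and the responses are merged (in Cassandra the
    newest timestamp wins; simplified here to 'any value present wins')."""
    needed = rf // 2 + 1
    assert len(replicas) >= needed, "not enough replicas for QUORUM"
    for r in replicas:
        if key in data[r]:
            return data[r][key]
    return None

# Inconsistent range movement: C bootstraps by streaming from stale node B.
data["C"] = dict(data["B"])
print(quorum_read(["B", "C"], "foo"))   # None: the write is lost at QUORUM

# Consistent range movement (CASSANDRA-2434): C streams from node D, the
# replica whose range it is taking over.
data["C"] = dict(data["D"])
print(quorum_read(["B", "C"], "foo"))   # bar: the write survives
```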

    The JVM option cassandra.consistent.rangemovement was added as part of this patch. The option allows consistent range movements during bootstrapping to be disabled should the user desire this behaviour. This fix is no silver bullet though, because it requires that the correct node be available for a consistent range movement during a bootstrap. This may not always be possible, and in such cases there are two options:

    1. Get the required node back online (preferred option).
    2. If the required node is unrecoverable, set JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false" in the cassandra-env.sh file to perform inconsistent range movements when auto bootstrapping. Once bootstrapping is complete, a repair will need to be run on the node using the following command. This is to ensure the data it streamed is consistent with the rest of the replicas.
    nodetool repair -full

    Adding multiple nodes

    Another common cause of grief for users was bootstrapping multiple nodes simultaneously; captured in CASSANDRA-7069. Adding two new nodes simultaneously to a cluster could potentially be harmful, given the operations performed by a new node when joining. Waiting two minutes for the gossip state to propagate before adding a new node is possible, however as noted in CASSANDRA-9667, there is no coordination between nodes during token selection. For example, consider the case where node A is bootstrapped and then, two minutes later, node B is bootstrapped. Node B could potentially pick token ranges already selected by node A.

    The above issue was solved in CASSANDRA-7069 by changing the default behaviour such that adding a node would fail if another node was already bootstrapping in the cluster. Similar to CASSANDRA-2434, this behaviour can be disabled by setting the JVM option JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false" in the cassandra-env.sh file on the bootstrapping node. This means that if cassandra.consistent.rangemovement=false is set to allow multiple nodes to bootstrap, the cluster runs the risk of violating consistency guarantees because of CASSANDRA-2434.

    Changes made by CASSANDRA-7069 mean that the default behaviour forces a user to add a single node at a time to expand the cluster. This is the safest way of adding nodes to expand a cluster and ensure that the correct amount of data is streamed between nodes.

    Data streaming

    To further add to the confusion, there is a misconception about what the auto_bootstrap property does in relation to a node being added to the cluster. Despite its name, this property controls only the data streaming step of the bootstrap process. The boolean property is set to true by default; when true, the data streaming step is performed during bootstrap.

    Setting auto_bootstrap to false when bootstrapping a new node exposes the cluster to huge inconsistencies. This is because all the other steps in the process are carried out but no data is streamed to the node. Hence, the node would be in the UN state without having any data for the token ranges it has been allocated! Furthermore, the new node without data will be serving reads and nodes that previously owned the tokens will no longer be serving reads. Effectively, the token ranges for that replica would be replaced with no data.

    It is worth noting that the other danger of setting auto_bootstrap to false is that no IP address collision check occurs. As per CASSANDRA-10134, if a new node has auto_bootstrap set to false and has the same address as an existing down node, the new node will take over the token range of the old node. No error is thrown; only a warning message such as the one below is written to the logs of the other nodes in the cluster. At the time of writing this post, the fix for this issue only appears in Apache Cassandra version 3.6 and above.

    WARN  [GossipStage:1] 2017-05-19 17:35:10,994 - Changing /'s host ID from 1938db5d-5f23-46e8-921c-edde18e9c829 to c30fbbb8-07ae-412c-baea-90865856104e

    The behaviour of auto_bootstrap: false can lead to data inconsistencies in the following way. Consider the case of a cluster containing three nodes A, B and D with a RF of 3. If node B was offline and a key ‘foo’ was written with CL of QUORUM, the value for key ‘foo’ would go to nodes A and D. In this scenario Node D is the owner of the token relating to the key ‘foo’.

    At a later point in time node B is resurrected and added back into the cluster. Around the same time a node C is added to the cluster with auto_bootstrap set to false and begins the joining process.

    One of the tokens node C calculates and acquires during the bootstrap process is for key ‘foo’. Now node D is no longer the owner and hence its data for the key ‘foo’ will no longer be used during reads/writes. This process causes inconsistencies in Cassandra because both nodes B and C contain no data for key ‘foo’.

    Thus, a query with a CL of QUORUM may query nodes B and C and return no data which is incorrect, despite there being data for ‘foo’ on node A. Node D previously had data, but it stopped being a replica after C was inserted.

    This confusing behaviour is one of the reasons why if you look into the cassandra.yaml file you will notice that the auto_bootstrap configuration property is missing. Exposure of the property in the cassandra.yaml was short lived, as it was removed via CASSANDRA-2447 in version 1.0.0. As a result, the property is hidden and its default value of true means that new nodes will stream data when they join the cluster.

    Adding a replacement node

    So far we have examined various options that control the default bootstrapping behaviour when a new node is added to a cluster. Adding a new node is just one case where bootstrapping is performed; what about the case of replacing a node in the cluster when one goes down?

    Should an existing node go down and need to be replaced, the JVM option cassandra.replace_address can be used. Note that this option is only available for Apache Cassandra versions 2.x.x and higher. The feature has been around for a while and has been blogged about by other users in the past.

    As the name suggests, it effectively replaces a down or dead node in the cluster with a new node. It is because of this that the replace address option should only be used if the node is in a Down and Normal state (represented by DN in the nodetool status). Furthermore, there are no range movements when using this feature; the new replacement node simply inherits the old dead node’s token ranges. This is simpler than decommissioning the dead node and bootstrapping a fresh one, which would involve two range movements and two streaming phases. Yuck! To use the option, simply add JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<IP_ADDRESS>" to the cassandra-env.sh file of the new node that will be replacing the old node, where <IP_ADDRESS> is the IP address of the node to be replaced.

    Once the node completes bootstrapping and joins the cluster, the JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<IP_ADDRESS>" option must be removed from the cassandra-env.sh file or the node will fail to start on a restart. This is a shortcoming of the cassandra.replace_address feature. Many operators will typically be worried about the dead node being replaced and as a result forget to update the file after the job is complete. It was for this reason that CASSANDRA-7356 was raised, resulting in a new option: cassandra.replace_address_first_boot. This option works once, when Cassandra is first started and the replacement node is inserted into the cluster; after that, the option is ignored for all subsequent restarts. It works in the same way as its predecessor; simply add JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<IP_ADDRESS>" to the cassandra-env.sh file and the new node is ready to be inserted.

    Hang on! What about adding a replacement seed node?

    Ok, so you need to replace a seed node. Seed nodes are just like every other node in the cluster; as per the Apache Cassandra documentation, the only difference is that seed nodes are the go-to nodes when a new node joins the cluster.

    There are a few extra steps to replace a seed node and bootstrap a new one in its place. Before adding the replacement seed node, the IP address of the seed node will need to be removed from the seed_provider list in the cassandra.yaml file and replaced with another node in the cluster. This needs to be done for all the nodes in the cluster. Naturally, a rolling restart will need to be performed for the changes to take effect. Once the change is complete the replacement node can be inserted as described in the previous section of this post.

    What to do after it completes successfully

    Once your node has successfully completed the bootstrapping process, it will transition to the Up and Normal state (represented by UN in the nodetool status) to indicate it is now part of the cluster. At this point it is time to clean up the nodes in your cluster. Yes, your nodes are dirty and need to be cleaned. “Why?” you ask. Well, the data acquired by the newly added node still remains on the nodes that previously owned it. Whilst those nodes have streamed the data to the new node and relinquished the associated tokens, the streamed data still remains on the original nodes. This “orphaned” data is consuming valuable disk space, and in the case of large data sets, probably a significant amount of it.

    However, before running off to the console to remove the orphaned data from the nodes, make sure it is done as the last step in a cluster expansion. If the expansion requires only one node to be added, perform the cleanup after that node has successfully completed bootstrapping and joined the cluster. If the expansion requires three nodes to be added, perform the cleanup after all three nodes have successfully completed bootstrapping and joined the cluster. This is because the cleanup will need to be executed on all nodes in the cluster, except for the last node that was added. The last node added will contain only the data it needed for the tokens it acquired, whereas the other nodes may contain data for tokens they no longer own. It is still ok to run cleanup on the last node; it will likely return immediately after it is called.

    The cleanup can be executed on each node using the following command.

    nodetool cleanup -j <COMPACTION_SLOTS>

    Where <COMPACTION_SLOTS> is the number of compaction slots to use for cleanup. By default this is 2. If set to 0, it will use all available compaction threads.

    It is probably worth limiting the number of compaction slots used by cleanup otherwise it could potentially block compactions.

    Help! It failed

    The bootstrap process for a joining node can fail. Bootstrapping puts extra load on the network, so should bootstrap fail, you could try tweaking streaming_socket_timeout_in_ms. Set streaming_socket_timeout_in_ms in the cassandra.yaml file to 24 hours (60 * 60 * 24 * 1000 = 86,400,000 ms). Having a socket timeout set is crucial for catching streams that hang and reporting them via an exception in the logs, as per CASSANDRA-11286.

    If the bootstrap process fails in Cassandra version 2.1.x, the process will need to be restarted all over again. This can be done using the following steps.

    1. Stop Cassandra on the node.
    2. Delete all files and directories from the data, commitlog and saved_caches directories, but leave the directories themselves in place.
    3. Wait about two minutes.
    4. Start Cassandra on the node.

    If the bootstrap process fails in Cassandra 2.2.x, it can easily be resumed using the following command, thanks to CASSANDRA-8942.

    nodetool bootstrap resume

    Testing the theory

    We have gone through a lot of theory in this post, so I thought it would be good to test some of it out to demonstrate what can happen when bootstrapping multiple nodes at the same time.


    In my test I used a three node local cluster running Apache Cassandra 2.1.14, which was created with the ccm tool. Each node was configured to use vnodes; specifically, num_tokens was set to 32 in the cassandra.yaml file. The cluster was loaded with around 20 GB of data generated from the killrweather dataset. Data loading was performed in batches using cdm. Prior to starting the test, the cluster looked like this.

    $ ccm node1 nodetool status
    Datacenter: datacenter1
    |/ State=Normal/Leaving/Joining/Moving
    --  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
    UN  19.19 GB   32      29.1%             cfb50e13-52a4-4821-bca2-4dba6061d38a  rack1
    UN  9.55 GB    32      37.4%             5176598f-bbab-4165-8130-e33e39017f7e  rack1
    UN  19.22 GB   32      33.5%             d261faaf-628f-4b86-b60b-3825ed552aba  rack1

    It was not the most well balanced cluster, however it was good enough for testing. It should be noted that only a single node in the cluster was configured to be a seed node. Taking a quick peek at the keyspace configuration using cqlsh, we can see that it was using replication_factor: 1, i.e. RF = 1.

    cqlsh> describe killrweather
    CREATE KEYSPACE killrweather WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

    Adding a new node

    A new node (node4) was added to the cluster.

    $ ccm node4 start

    After a minute or so node4 was in the UJ state and began the bootstrap process.

    $ ccm node1 nodetool status
    Datacenter: datacenter1
    |/ State=Normal/Leaving/Joining/Moving
    --  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
    UN  19.19 GB   32      29.1%             cfb50e13-52a4-4821-bca2-4dba6061d38a  rack1
    UN  9.55 GB    32      37.4%             5176598f-bbab-4165-8130-e33e39017f7e  rack1
    UN  19.22 GB   32      33.5%             d261faaf-628f-4b86-b60b-3825ed552aba  rack1
    UJ  14.44 KB   32      ?                 ae0a26a6-fab5-4cab-a189-697818be3c95  rack1

    It was observed that node4 had started streaming data from node1 and node2.

    $ ccm node4 nodetool netstats
    Mode: JOINING
    Bootstrap f4e54a00-36d9-11e7-b18e-9d89ad20c2d3
            Receiving 9 files, 10258729018 bytes total. Already received 2 files, 459059994 bytes total
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-3-Data.db 452316846/452316846 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-2-Data.db 6743148/6743148 bytes(100%) received from idx:0/
            Receiving 11 files, 10176549820 bytes total. Already received 1 files, 55948069 bytes total
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-1-Data.db 55948069/55948069 bytes(100%) received from idx:0/
    Read Repair Statistics:
    Attempted: 0
    Mismatch (Blocking): 0
    Mismatch (Background): 0
    Pool Name                    Active   Pending      Completed
    Commands                        n/a         0              6
    Responses                       n/a         0            471

    Adding another new node

    A few minutes later another new node (node5) was added to the cluster. To add this node while node4 was still bootstrapping, the JVM option JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false" was added to the node's cassandra-env.sh file. The node was then started.

    $ ccm node5 start

    After about a minute node5 was in the UJ state and it too began the bootstrap process.

    $ ccm node1 nodetool status
    Datacenter: datacenter1
    |/ State=Normal/Leaving/Joining/Moving
    --  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
    UN  19.19 GB   32      29.1%             cfb50e13-52a4-4821-bca2-4dba6061d38a  rack1
    UN  9.55 GB    32      37.4%             5176598f-bbab-4165-8130-e33e39017f7e  rack1
    UN  19.22 GB   32      33.5%             d261faaf-628f-4b86-b60b-3825ed552aba  rack1
    UJ  106.52 KB  32      ?                 ae0a26a6-fab5-4cab-a189-697818be3c95  rack1
    UJ  14.43 KB   32      ?                 a71ed178-f353-42ec-82c8-d2b03967753a  rack1

    It was observed that node5 had started streaming data from node2 as well; the same node that node4 was streaming data from.

    $ ccm node5 nodetool netstats
    Mode: JOINING
    Bootstrap 604b5690-36da-11e7-aeb6-9d89ad20c2d3
            Receiving 11 files, 10176549820 bytes total. Already received 1 files, 55948069 bytes total
                .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-1-Data.db 55948069/55948069 bytes(100%) received from idx:0/
    Read Repair Statistics:
    Attempted: 0
    Mismatch (Blocking): 0
    Mismatch (Background): 0
    Pool Name                    Active   Pending      Completed
    Commands                        n/a         0              8
    Responses                       n/a         0            255

    The interesting point to note when looking at the netstats was that node4 and node5 were each streaming a Data.db file of exactly 55948069 bytes from node2.

    Data streaming much

    It appeared that both node4 and node5 were streaming the same data from node2. This continued as the bootstrapping process progressed; the sizes of the files being streamed from node2 were the same for both node4 and node5. Checking the netstats on node4 produced the following.

    $ ccm node4 nodetool netstats
    Bootstrap f4e54a00-36d9-11e7-b18e-9d89ad20c2d3
            Receiving 9 files, 10258729018 bytes total. Already received 6 files, 10112487796 bytes total
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-13-Data.db 1788940555/1788940555 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-5-Data.db 7384377358/7384377358 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-12-Data.db 27960312/27960312 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-3-Data.db 452316846/452316846 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-11-Data.db 452149577/452149577 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-2-Data.db 6743148/6743148 bytes(100%) received from idx:0/
            Receiving 11 files, 10176549820 bytes total. Already received 10 files, 10162463079 bytes total
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-1-Data.db 55948069/55948069 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-9-Data.db 55590043/55590043 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-6-Data.db 901588743/901588743 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-15-Data.db 14081154/14081154 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-16-Data.db 1450179/1450179 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-8-Data.db 901334951/901334951 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-10-Data.db 3622476547/3622476547 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-17-Data.db 56277615/56277615 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-4-Data.db 3651310715/3651310715 bytes(100%) received from idx:0/
                .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-7-Data.db 902405063/902405063 bytes(100%) received from idx:0/
    Read Repair Statistics:
    Attempted: 0
    Mismatch (Blocking): 0
    Mismatch (Background): 0
    Pool Name                    Active   Pending      Completed
    Commands                        n/a         0              6
    Responses                       n/a         0           4536

    Then checking netstats on node5 produced the following.

    $ ccm node5 nodetool netstats
    Mode: JOINING
    Bootstrap 604b5690-36da-11e7-aeb6-9d89ad20c2d3
            Receiving 11 files, 10176549820 bytes total. Already received 9 files, 10106185464 bytes total
                .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-2-Data.db 3651310715/3651310715 bytes(100%) received from idx:0/
                .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-1-Data.db 55948069/55948069 bytes(100%) received from idx:0/
                .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-9-Data.db 1450179/1450179 bytes(100%) received from idx:0/
                .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-3-Data.db 901588743/901588743 bytes(100%) received from idx:0/
                .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-6-Data.db 55590043/55590043 bytes(100%) received from idx:0/
                .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-4-Data.db 902405063/902405063 bytes(100%) received from idx:0/
                .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-8-Data.db 14081154/14081154 bytes(100%) received from idx:0/
                .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-5-Data.db 901334951/901334951 bytes(100%) received from idx:0/
                .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-7-Data.db 3622476547/3622476547 bytes(100%) received from idx:0/
    Read Repair Statistics:
    Attempted: 0
    Mismatch (Blocking): 0
    Mismatch (Background): 0
    Pool Name                    Active   Pending      Completed
    Commands                        n/a         0              8
    Responses                       n/a         0           4383

    To be absolutely sure about what was being observed, I ran a command to order the netstats output by file size for both node4 and node5.

    $ for file_size in $(ccm node4 nodetool netstats  | grep '(100%)\ received' | grep '' | tr -s ' ' | cut -d' ' -f3 | cut -d'/' -f1 | sort -g); do ccm node4 nodetool netstats | grep ${file_size} | tr -s ' '; done
     .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-16-Data.db 1450179/1450179 bytes(100%) received from idx:0/
     .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-15-Data.db 14081154/14081154 bytes(100%) received from idx:0/
     .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-9-Data.db 55590043/55590043 bytes(100%) received from idx:0/
     .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-1-Data.db 55948069/55948069 bytes(100%) received from idx:0/
     .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-17-Data.db 56277615/56277615 bytes(100%) received from idx:0/
     .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-8-Data.db 901334951/901334951 bytes(100%) received from idx:0/
     .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-6-Data.db 901588743/901588743 bytes(100%) received from idx:0/
     .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-7-Data.db 902405063/902405063 bytes(100%) received from idx:0/
     .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-10-Data.db 3622476547/3622476547 bytes(100%) received from idx:0/
     .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-4-Data.db 3651310715/3651310715 bytes(100%) received from idx:0/
    $ for file_size in $(ccm node5 nodetool netstats  | grep '(100%)\ received' | grep '' | tr -s ' ' | cut -d' ' -f3 | cut -d'/' -f1 | sort -g); do ccm node5 nodetool netstats | grep ${file_size} | tr -s ' '; done
     .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-9-Data.db 1450179/1450179 bytes(100%) received from idx:0/
     .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-8-Data.db 14081154/14081154 bytes(100%) received from idx:0/
     .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-6-Data.db 55590043/55590043 bytes(100%) received from idx:0/
     .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-1-Data.db 55948069/55948069 bytes(100%) received from idx:0/
     .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-5-Data.db 901334951/901334951 bytes(100%) received from idx:0/
     .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-3-Data.db 901588743/901588743 bytes(100%) received from idx:0/
     .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-4-Data.db 902405063/902405063 bytes(100%) received from idx:0/
     .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-7-Data.db 3622476547/3622476547 bytes(100%) received from idx:0/
     .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-tmp-ka-2-Data.db 3651310715/3651310715 bytes(100%) received from idx:0/

    With the exception of one file being streamed by node4, killrweather-raw_weather_data-tmp-ka-17-Data.db (56277615 bytes), node4 and node5 appeared to be streaming the same data from node2. This was the first confirmation that node5 had stolen the tokens that were originally calculated by node4. Furthermore, it looked like node4 was performing unnecessary streaming from node2. I noted down the file sizes displayed in node5's netstats output to help track down the data files on each node.

    $ ccm node5 nodetool netstats | grep '(100%)\ received' | grep '' | tr -s ' ' | cut -d' ' -f3 | cut -d'/' -f1 | sort -g > file_sizes.txt; cat file_sizes.txt

    Token and the thief

    Once both nodes had finished bootstrapping and had successfully joined the cluster it looked like this.

    $ ccm node1 nodetool status
    Datacenter: datacenter1
    |/ State=Normal/Leaving/Joining/Moving
    --  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
    UN  19.19 GB   32      14.8%             cfb50e13-52a4-4821-bca2-4dba6061d38a  rack1
    UN  9.55 GB    32      22.0%             5176598f-bbab-4165-8130-e33e39017f7e  rack1
    UN  19.22 GB   32      23.6%             d261faaf-628f-4b86-b60b-3825ed552aba  rack1
    UN  19.17 GB   32      17.5%             ae0a26a6-fab5-4cab-a189-697818be3c95  rack1
    UN  9.55 GB    32      22.1%             a71ed178-f353-42ec-82c8-d2b03967753a  rack1

    Using the file sizes captured earlier from node5's netstats output, I checked the data directories of node4 and node5 to confirm that both nodes contained files of those sizes.

    $ for file_size in $(cat file_sizes.txt); do ls -al .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/ | grep ${file_size}; done
    -rw-r--r--    1 anthony  staff     1450179 12 May 16:33 killrweather-raw_weather_data-ka-16-Data.db
    -rw-r--r--    1 anthony  staff    14081154 12 May 16:33 killrweather-raw_weather_data-ka-15-Data.db
    -rw-r--r--    1 anthony  staff    55590043 12 May 16:33 killrweather-raw_weather_data-ka-9-Data.db
    -rw-r--r--    1 anthony  staff    55948069 12 May 16:33 killrweather-raw_weather_data-ka-1-Data.db
    -rw-r--r--    1 anthony  staff   901334951 12 May 16:33 killrweather-raw_weather_data-ka-8-Data.db
    -rw-r--r--    1 anthony  staff   901588743 12 May 16:33 killrweather-raw_weather_data-ka-6-Data.db
    -rw-r--r--    1 anthony  staff   902405063 12 May 16:33 killrweather-raw_weather_data-ka-7-Data.db
    -rw-r--r--    1 anthony  staff  3622476547 12 May 16:33 killrweather-raw_weather_data-ka-10-Data.db
    -rw-r--r--    1 anthony  staff  3651310715 12 May 16:33 killrweather-raw_weather_data-ka-4-Data.db
    $ for file_size in $(cat file_sizes.txt); do ls -al  .../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/ | grep ${file_size}; done
    -rw-r--r--    1 anthony  staff     1450179 12 May 16:36 killrweather-raw_weather_data-ka-9-Data.db
    -rw-r--r--    1 anthony  staff    14081154 12 May 16:36 killrweather-raw_weather_data-ka-8-Data.db
    -rw-r--r--    1 anthony  staff    55590043 12 May 16:36 killrweather-raw_weather_data-ka-6-Data.db
    -rw-r--r--    1 anthony  staff    55948069 12 May 16:36 killrweather-raw_weather_data-ka-1-Data.db
    -rw-r--r--    1 anthony  staff   901334951 12 May 16:36 killrweather-raw_weather_data-ka-5-Data.db
    -rw-r--r--    1 anthony  staff   901588743 12 May 16:36 killrweather-raw_weather_data-ka-3-Data.db
    -rw-r--r--    1 anthony  staff   902405063 12 May 16:36 killrweather-raw_weather_data-ka-4-Data.db
    -rw-r--r--    1 anthony  staff  3622476547 12 May 16:36 killrweather-raw_weather_data-ka-7-Data.db
    -rw-r--r--    1 anthony  staff  3651310715 12 May 16:36 killrweather-raw_weather_data-ka-2-Data.db

    So both nodes contained files of the same size. I then checked whether the same-sized files on each node also had the same content, by computing an MD5 hash of each file pair.

    $ BASE_DIR=...; DATA_DIR=data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d; for file_size in $(cat file_sizes.txt); do node_4_file=$(ls -al ${BASE_DIR}/node4/${DATA_DIR}/ | grep ${file_size} | tr -s ' ' | cut -d' ' -f9); node_5_file=$(ls -al ${BASE_DIR}/node5/${DATA_DIR}/ | grep ${file_size} | tr -s ' ' | cut -d' ' -f9); md5 ${BASE_DIR}/node4/${DATA_DIR}/${node_4_file} ${BASE_DIR}/node5/${DATA_DIR}/${node_5_file}; echo; done
    MD5 (.../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-16-Data.db)= a9edb85f70197c7f37aa021c817de2a2
    MD5 (.../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-9-Data.db)= a9edb85f70197c7f37aa021c817de2a2
    MD5 (.../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-15-Data.db)= 975f184ae36cbab07a9c28b032532f88
    MD5 (.../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-8-Data.db)= 975f184ae36cbab07a9c28b032532f88
    MD5 (.../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-9-Data.db)= f0160cf8e7555031b6e0835951e1896a
    MD5 (.../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-6-Data.db)= f0160cf8e7555031b6e0835951e1896a
    MD5 (.../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-1-Data.db)= 7789b794bb3ef24338282d4a1a960903
    MD5 (.../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-1-Data.db)= 7789b794bb3ef24338282d4a1a960903
    MD5 (.../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-8-Data.db)= 1738695bb6b4bd237b3592e80eb785f2
    MD5 (.../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-5-Data.db)= 1738695bb6b4bd237b3592e80eb785f2
    MD5 (.../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-6-Data.db)= f7d1faa5c59a26a260038d61e4983022
    MD5 (.../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-3-Data.db)= f7d1faa5c59a26a260038d61e4983022
    MD5 (.../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-7-Data.db)= d791179432dcdbaf9a9b315178fb04c7
    MD5 (.../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-4-Data.db)= d791179432dcdbaf9a9b315178fb04c7
    MD5 (.../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-10-Data.db)= 3e6623c2f06bcd3f5caeacee1917898b
    MD5 (.../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-7-Data.db)= 3e6623c2f06bcd3f5caeacee1917898b
    MD5 (.../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-4-Data.db)= 8775f5df08882df353427753f946bf10
    MD5 (.../node5/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-2-Data.db)= 8775f5df08882df353427753f946bf10
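The size-pairing-and-checksum logic above can be captured as a small reusable function. A minimal sketch, with the function name `compare_by_size` and `md5sum` (Linux) in place of the macOS `md5` tool being my own choices:

```shell
#!/bin/sh
# compare_by_size DIR1 DIR2: for each file in DIR1, find any file of the
# same size in DIR2 and report whether the two files have identical checksums.
compare_by_size() {
    dir1=$1
    dir2=$2
    for f1 in "$dir1"/*; do
        size1=$(wc -c < "$f1" | tr -d ' ')
        for f2 in "$dir2"/*; do
            size2=$(wc -c < "$f2" | tr -d ' ')
            # Only compare files whose sizes match
            [ "$size1" = "$size2" ] || continue
            sum1=$(md5sum "$f1" | cut -d' ' -f1)
            sum2=$(md5sum "$f2" | cut -d' ' -f1)
            if [ "$sum1" = "$sum2" ]; then
                echo "MATCH  $(basename "$f1") $(basename "$f2")"
            else
                echo "DIFFER $(basename "$f1") $(basename "$f2")"
            fi
        done
    done
}
```

Pointing it at the two data directories gives the same pairing as the one-liner, without re-listing the directory for every file size.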

    Now I had absolute proof that both nodes had in fact streamed the same data from node2. It looked as though, when node5 joined the cluster, it had taken tokens calculated by node4. If that were the case, the data files on node4 that were duplicated on node5 would no longer be needed. One way to prove that there was “orphaned” data on node4, i.e. data not associated with any of node4’s tokens, would be to run cleanup on the cluster. If there was orphaned data on node4, the cleanup would delete all or some of those files. Before running cleanup on the cluster, I took note of the files on node4 which were the same as the ones on node5.

    $ for file_size in $(cat file_sizes.txt); do ls -al .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/ | grep ${file_size} | tr -s ' ' | cut -d' ' -f9; done > node4_orphaned_files.txt; cat node4_orphaned_files.txt

    I then ran a cleanup on all the nodes in the cluster.

    $ ccm node1 nodetool cleanup
    $ ccm node2 nodetool cleanup
    $ ccm node3 nodetool cleanup
    $ ccm node4 nodetool cleanup
    $ ccm node5 nodetool cleanup
    $ ccm node1 nodetool status
    Datacenter: datacenter1
    |/ State=Normal/Leaving/Joining/Moving
    --  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
    UN  9.57 GB    32      14.8%             cfb50e13-52a4-4821-bca2-4dba6061d38a  rack1
    UN  138.92 KB  32      22.0%             5176598f-bbab-4165-8130-e33e39017f7e  rack1
    UN  19.22 GB   32      23.6%             d261faaf-628f-4b86-b60b-3825ed552aba  rack1
    UN  9.62 GB    32      17.5%             ae0a26a6-fab5-4cab-a189-697818be3c95  rack1
    UN  9.55 GB    32      22.1%             a71ed178-f353-42ec-82c8-d2b03967753a  rack1

    From this output it was obvious that node4 had contained orphaned data. The nodetool status I ran earlier, just after both nodes completed bootstrapping and moved to the UN state and prior to running cleanup, showed node4 with a Load of 19.17 GB. After cleanup it showed a Load of 9.62 GB. As a final verification, I iterated through the list of node4 files that matched the ones on node5 (node4_orphaned_files.txt) and checked whether they were still present on node4.

    $ for file_name in $(cat node4_orphaned_files.txt); do ls .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/${file_name}; done
    ls: .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-16-Data.db: No such file or directory
    ls: .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-15-Data.db: No such file or directory
    ls: .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-9-Data.db: No such file or directory
    ls: .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-1-Data.db: No such file or directory
    ls: .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-8-Data.db: No such file or directory
    ls: .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-6-Data.db: No such file or directory
    ls: .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-7-Data.db: No such file or directory
    ls: .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-10-Data.db: No such file or directory
    ls: .../node4/data0/killrweather/raw_weather_data-32f23d1015cb11e79d0fa90042a0802d/killrweather-raw_weather_data-ka-4-Data.db: No such file or directory

    As can be seen, the files were deleted as part of the cleanup on node4. This means that during bootstrap node4 originally calculated tokens for that data, asked node2 for the list of files covering those tokens, and began streaming them. A little while later, node5 was added to the cluster while node4 was still bootstrapping, and calculated tokens that overlapped with node4's. Node5 then asked node2 for the files covering its tokens and started streaming them as well. The issue here is that node4 was never notified that it no longer needed to stream those files from node2. Hence, unnecessary resources were consumed as a result of bootstrapping two nodes at the same time.


    Auto bootstrapping combined with vnodes is probably one of the handiest features in Cassandra. It takes the pain out of manually moving data around, ensuring continuous availability while expanding the cluster in a reliable and efficient way. There are a number of knobs and levers for controlling the default bootstrapping behaviour.

    Configuration properties

    • auto_bootstrap - controls whether a new node streams data from existing nodes when it joins the cluster.
    • streaming_socket_timeout_in_ms - sets the socket timeout for streaming operations.

    JVM options

    • cassandra.consistent.rangemovement - controls consistent range movements and whether multiple nodes can bootstrap simultaneously.
    • cassandra.replace_address_first_boot=<IP_ADDRESS> - allows a down node to be replaced with a new node.

    As demonstrated, by setting the JVM option cassandra.consistent.rangemovement=false the cluster runs the risk of over-streaming data and, worse still, can violate consistency. For users new to Cassandra, the safest way to add multiple nodes to a cluster is to add them one at a time. Stay tuned, as I will be following up with another post on bootstrapping.
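To illustrate the safe approach, here is a sketch of adding nodes one at a time with ccm, leaving consistent range movement at its default of true. The node names, IPs, and exact ccm flags are illustrative and may vary by ccm version:

```shell
# cassandra.consistent.rangemovement defaults to true; leave it alone
# unless you knowingly accept the risks demonstrated above.

# Add one node at a time, waiting for each to reach UN (Up/Normal)
# before starting the next.
ccm add node6 -i 127.0.0.6 -b && ccm node6 start
ccm node1 nodetool status    # wait until node6 shows UN before continuing
ccm add node7 -i 127.0.0.7 -b && ccm node7 start
```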

    0 0

    JOSE, the primary mechanism for securing various OAuth2/OIDC tokens, is slowly but surely becoming the main technology for securing data in wider contexts. JOSE, alongside COSE, will become more and more visible going forward.

    I talked about the Apache CXF JOSE implementation in this post. One practical aspect of this implementation is that JOSE can be applied to securing regular HTTP payloads, with the sender-side filters making a best attempt at keeping the streaming going, and with the JOSE protection of these payloads (JWS signature or JWE encryption) able to 'stay' with the data even beyond the HTTP request-response time if needed.

    In CXF 3.1.12 I have enhanced this feature to support the signing of HTTP attachments. It depends on the JWS Detached Content and Unencoded Content features, which allow a payload to be integrity-protected while it continues flowing to its destination in clear form.
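For context, the detached unencoded mode these features build on (RFC 7797) comes down to a JWS protected header along these lines, with the payload left out of the serialization and carried separately; the algorithm shown is just an example:

```json
{
  "alg": "RS256",
  "b64": false,
  "crit": ["b64"]
}
```

"b64": false tells verifiers the payload is not base64url-encoded, and listing it in "crit" ensures implementations unaware of the option reject the token rather than misverify it.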

    Combining it with the super-flexible mechanism for processing attachments in Apache CXF, and particularly with the newly introduced Multipart filters, which let one pre-process individual multipart attachment streams, helped produce the final solution.

    Besides, as part of this effort, the optional binding of the outer HTTP headers to the secure JWS or JWE payloads has also been realized.

    Be among the first to experiment with this (IMHO) very cool feature: try it and provide feedback. Enjoy!

    0 0

    ApacheCon North America 2017 attendee interview with Mel Llaguno. Mel talks to us about Coverity Scan and how it could help Apache projects.

    0 0

    I'm going to try to use the Blogger "pages" facility to keep track of this, because it works better than having a new summary post every year: My Backpacking Trips with Mike.

    0 0

    A one-liner to run an SSL Docker registry generating a Let’s Encrypt certificate.

    This command will create a registry proxying the Docker hub, caching the images in a registry volume.

    The Let’s Encrypt certificate will be auto-generated and stored in the host directory as letsencrypt.json. You could also use a Docker volume to store it.

    In order for the certificate generation to work, the registry needs to be accessible from the internet on port 443. After the certificate is generated, that is no longer needed.

    docker run -d -p 443:5000 --name registry \
      -v `pwd`:/etc/docker/registry/ \
      -v registry:/var/lib/registry \
      -e REGISTRY_HTTP_TLS_LETSENCRYPT_CACHEFILE=/etc/docker/registry/letsencrypt.json \
      -e \

    You can also create a config.yml in this directory and run the registry using the file instead of environment variables.

    version: 0.1
          cachefile: /etc/docker/registry/letsencrypt.json

    Then run

    docker run -d -p 443:5000 --name registry \
      -v `pwd`:/etc/docker/registry/ \
      -v registry:/var/lib/registry \

    If you want to use this as a remote repository and not just for proxying, remove the proxy entry from the configuration.
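Putting the pieces together, a complete config.yml for this setup might look roughly like the following. The storage and proxy sections, the addr value, and the placeholder email are assumptions based on the stock registry configuration format, not values from this post:

```yaml
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
  tls:
    letsencrypt:
      # Where the generated certificate material is cached
      cachefile: /etc/docker/registry/letsencrypt.json
      email: you@example.com
# Pull-through cache of the Docker Hub; drop this section for a
# standalone registry
proxy:
  remoteurl: https://registry-1.docker.io
```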

    0 0

    A recent blog post showed how to use Talend Open Studio for Big Data to access data stored in HDFS, where HDFS had been configured to authenticate users using Kerberos. In this post we will follow a similar setup, to see how to create a job in Talend Open Studio for Big Data that reads data from an Apache Kafka topic using Kerberos.

    1) Kafka setup

    Follow a recent tutorial to set up an Apache Kerby based KDC testcase and to configure Apache Kafka to require Kerberos for authentication. Create a "test" topic and write some data to it, and verify with the command-line consumer that the data can be read correctly.
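As a reminder, the command-line round trip looks roughly like this; the client.properties settings and the JAAS wiring are assumptions based on a standard Kafka Kerberos setup, not values taken from the tutorial:

```shell
# client.properties (referenced below):
#   security.protocol=SASL_PLAINTEXT
#   sasl.kerberos.service.name=kafka

# The JAAS file is passed to the console tools via KAFKA_OPTS
export KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/client.jaas"

# Write some data to the "test" topic...
bin/kafka-console-producer.sh --broker-list localhost:9092 \
  --topic test --producer.config client.properties

# ...then read it back to verify
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic test --from-beginning --consumer.config client.properties
```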

    2) Download Talend Open Studio for Big Data and create a job

    Now we will download Talend Open Studio for Big Data (6.4.0 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" and name it "KafkaKerberosRead". 
    In the search bar under "Palette" on the right hand side enter "kafka" and hit enter. Drag "tKafkaConnection" and "tKafkaInput" to the middle of the screen. Do the same for "tLogRow":

    We now have all the components we need to read data from the Kafka topic. "tKafkaConnection" will be used to configure the connection to Kafka. "tKafkaInput" will be used to read the data from the "test" topic, and finally "tLogRow" will just log the data so that we can be sure that it was read correctly. The next step is to join the components up. Right click on "tKafkaConnection" and select "Trigger/On Subjob Ok" and drag the resulting line to "tKafkaInput". Right click on "tKafkaInput" and select "Row/Main" and drag the resulting line to "tLogRow":

    3) Configure the components

    Now let's configure the individual components. Double click on "tKafkaConnection". If a message appears that informs you that you need to install additional jars, then click on "Install". Select the version of Kafka that corresponds to the version you are using (if it doesn't match then select the most recent version). For the "Zookeeper quorum list" property enter "localhost:2181". For the "broker list" property enter "localhost:9092".

    Now we will configure the Kerberos-related properties of "tKafkaConnection". Select the "Use kerberos authentication" checkbox and some additional configuration properties will appear. For "JAAS configuration path" you need to enter the path of the "client.jaas" file as described in the tutorial to set up the Kafka test-case. You can leave the "Kafka brokers principal name" property as the default value ("kafka"). Finally, select the "Set kerberos configuration path" property and enter the path of the "krb5.conf" file supplied in the target directory of the Apache Kerby test-case.
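For reference, a client.jaas file of the kind referenced here generally looks like the following; the keytab path and principal are placeholders rather than values from the Kerby testcase:

```
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/path/to/client.keytab"
    principal="client";
};
```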

    Now click on "tKafkaInput". Select the checkbox for "Use an existing connection" and select the "tKafkaConnection" component in the resulting component list. For "topic name" specify "test". The "Consumer group id" can stay as the default "mygroup".

    Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. Send some data via the producer to the "test" topic and you should see the data appear in the Run Window in the Studio.

    0 0

    The 2016-17 Ski Season was a fun one for the Raible Family. Abbie and Jack are good enough that they can zoom down the mountain without looking back. Their preferred runs are now blacks and they're no longer intimidated by moguls. We spent most of the season skiing at Mary Jane and Winter Park, but also had some trips to Crested Butte, Steamboat, and Montana.

    Mary Jane for Trish's Birthday / Family Ski Day at Mary Jane!

    On top of the world!

    On our way to Crested Butte (near the end of January), Stout the Syncro's front drive train started making crunching noises and we had to pull over. It was almost midnight, and we knew we couldn't fix it on the side of the road, so we called AAA and spent the night in a nearby hotel. I rode back to Denver the next day with the tow truck driver on the bumpiest ride of my life. The tow truck's tires were out of balance and it felt like things were going to fall apart at any moment. We later discovered that several things failed that night: one axle broke, a CV went out, and a wheel bearing was gone too. Trish and the kids spent the day at a hot springs in Buena Vista while I drove Trish's Tahoe back to complete the trip. While we missed a day skiing, the day we made it for was great and CB continues to be one of our favorite resorts.

    Trish and Laura on Headwall / Montanya Rum!

    James, Jenny, and Josie

    I had some awesome solo skiing adventures too: skiing in Sweden after Jfokus, skiing in Steamboat during Winter Wonder Grass, and Miller Time 2017 in Northstar, California. Miller Time is an annual event to remember a great friend, Jason Miller, who passed on way too early in October 2015.

    It was a beautiful day for skiing at Klövsjö!

    Shadows / Bumps at Steamboat / A beautiful day with brothers

    Miller Time 2017!

    The highlight of the season was when we traveled to Big Sky Resort in Montana. I'd never skied at Big Sky before and it's been on my bucket list for years. We stayed a week during the kids' Spring Break and experienced some of the best skiing of our lives. When I asked Jack if he had any goals for Spring Break, he said he wanted to "ski 5 double blacks". There were so many double blacks on the mountain, it was unbelievable. We had his goal accomplished by noon on Tuesday. Unfortunately, I didn't remember to use our GoPro until Wednesday. However, that was the day we skipped skiing and did zip lining, so that was a ton of fun. I left my phone and laptop at home that week and thoroughly enjoyed being disconnected.

    The video below shows our zip lining adventures, as well as some footage of our best day skiing. It was a powder day, we were on one of the first chairs, and we had fresh tracks all morning. There was an immense amount of giggling, smiling, and hooting and hollering that morning.

    More photos of our ski season adventures can be found on Flickr.

    Stout 5.0 and Hefe 3.0

    Now that ski season is over, it's time to focus on our favorite summertime activities: VWs, horseback riding, mountain biking, and rafting. Stout 5.0 and Hefe 3.0 were released at the beginning of May and they've both been running great!

    Driving Hefe around with his rockin' stereo has been a lot of fun. You can imagine the kids giggling in the back with no seatbelts and ragtop down.

    Happy Matt / Trish is driving Hefe - watch out!

    #porschebus / So many windows

    Hefe won best in class (Custom Bus 49-67) at VWs on the Green last weekend. Two years in a row baby!

    1st Place! Good job Hefe!!

    More photos on Flickr → Stout 5.0 and Hefe 3.0

    Speaking of winning, Abbie and Tucker have been tearing it up at the Colorado Horse Park. They've won 24 ribbons over the last five months!

    24 ribbons over the course of five months!

    Ski season is over, summer is almost upon us, and both our VWs are in tip-top shape. I really can't ask for anything more!

    0 0

    • U.S. top court tightens patent suit rules in blow to ‘patent trolls’

      This is excellent news, and a death knell for the East Texas patent troll court:

      The U.S. Supreme Court on Monday tightened rules for where patent lawsuits can be filed in a decision that may make it harder for so-called patent “trolls” to launch sometimes dodgy patent cases in friendly courts, a major irritant for high-tech giants like Apple and Alphabet Inc’s Google. In a decision that upends 27 years of law governing patent infringement cases, the justices sided with beverage flavoring company TC Heartland LLC in its legal battle with food and beverage company Kraft Heinz Co (KHC.O). The justices ruled 8-0 that patent suits can be filed only in courts located in the jurisdiction where the targeted company is incorporated.
      via Brad Fitzgerald

      (tags: via:bradfitz patents swpats east-texas law trolls supreme-court infringement)

    0 0

    As a Developer Advocate at Okta, I'm expected to travel up to 25% per month to speak at conferences and meetups. This May was more like 50%! I had opportunities to contribute to a number of cool conferences in exotic cities that I was eager to accept.

    My adventure began on Monday, May 8 when I flew to Amsterdam to speak at the J-Spring conference. It was the first time the NLJUG hosted this conference in several years. I marveled at the venue and especially liked the outdoor area it offered during breaks. The walk from/to the train station was pretty nice too.

    J-Spring Outdoor Area

    Amsterdam Bike Paths

    I spoke about Microservices for the Masses with Spring Boot, JHipster, and JWT. The feedback I received said the talk was a bit fast and that I crammed too much into the 50-minute time slot. I do tend to mention everything I know about a topic when I speak, so I apologize for trying to pack too much in.

    After J-Spring, I flew to London to speak at Devoxx UK. I arrived just in time to catch the speaker's dinner and had fun seeing and catching up with old friends from the conference circuit.

    View from Room 404 in London

    Devoxx UK Venue

    Thursday morning, I had an Angular workshop and did my microservices presentation in the afternoon. Friday, I had an early morning talk on Front End Development for Back End Developers. You can find all my presentations below.

    I rushed straight from my last talk on Friday to the airport to catch a flight to Boston for the weekend. In Boston, we celebrated Trish's brother's 50th birthday, Mother's Day, and had a blast with friends and family.

    Happy Mother's Day!

    The following Monday, I hopped on a plane to return to Europe with Krakow (for GeeCON) as my destination. Three flights later, I arrived in time to take a nice stroll around the city, enjoying the greenery.


    At GeeCON, I spoke about how to build a progressive web app with Ionic, Angular, and Spring Boot. Half of my talk was live coding and I almost got all my demos working. Deploying to Cloud Foundry and my phone was the final step, and due to Xcode updating, that demo failed. I wrote a tutorial about Ionic for the Okta developer blog that has everything (and more!) that I showed in my demo.

    I had to head straight to the airport after finishing my talk, this time heading for Spring I/O in Barcelona. Barcelona has always been on Trish's bucket list, so I easily talked her into joining me. At Spring I/O, I did a workshop on developing with Spring Boot and Angular, followed by my Front End Development for Back End Developers talk. There weren't that many talks on front-end development, so I felt privileged to be one of the few talking about UI development.

    I also enjoyed Deepu's talk on JHipster and Sebastien's talk on Keycloak. It was the first time I'd met these great guys in person, so that was a lot of fun.

    On Friday, Trish and I hit some of the sites in Barcelona and had a wonderful time. The weather was beautiful, the architecture was amazing, and the experience was awesome.

    Amazing Architecture in Barcelona

    Barcelona

    Barcelona Fountains

    Happiness

    Sagrada Familia

    Sagrada Familia

    More photos on Flickr → European Speaking Tour - May 2017

    Thanks to the organizers of each conference for allowing me to speak and for covering my travel expenses. My company doesn't pay for overseas conferences (yet!), but they do pay me while I'm there, so that's nice. To everyone that attended my sessions - thank you! I really appreciate the feedback and will do my best to improve future talks. If you have additional feedback, feel free to contact me.

    In the meantime, keep an eye on the Okta developer blog. I've been writing a lot of articles lately and there are more in the pipeline! Here are a few that have been published in the last month.