
    Note: This post is about my second Dirty Kanza 200 experience on June 3, 2017.

    It’s broken into seven parts:

    Part I – Prep / Training

    Part II – Preamble

    Part III – Starting Line

    Part IV – Checkpoint One

    Part V – Checkpoint Two

    Part VI – Checkpoint Three – coming soon

    Part VII – Finish Line – coming soon


    I mentioned earlier that this course was the same as last year’s, so after finishing it then, I knew what to expect.  This leg is about 55 miles long and has the most climbing of any leg.


    Leg Two begins @ mile 49

    You can get an idea of what I’m talking about from its elevation profile.  From mile 49, there are 30 miles, mostly uphill.  The roads cross ranch land and are lightly maintained, if at all.  This means fun; this means pain.


    Mile 62 was still fun

    The winds were tame and the sun was hidden behind a thick cloud cover, keeping the temps down, but it was muggy.


    Mile 76 getting tougher

    It was at this point last year that I hit the Wall.  But I had adjusted and was confident that trouble could be avoided.


    As the going slows, the tedium grows, the mind struggles to find something to latch onto, and begins to play tricks.  Seemingly big things are downplayed.  For example, my left leg began cramping at mile 85, but I largely ignored it.  The terrain becomes treacherous, but I was unconcerned — just white noise.

    But small things become a big deal.  For example, with an airplane buzzing overhead, I became obsessed with getting its picture.  It seems stupid now to pull out a phone, aim it upwards, and shoot, all while navigating the most challenging terrain on the course.


    Took this one while riding… airplane.

    Anything to take the mind off of the pain.  And on the back of that nameplate, facing me, were images of sad riders, my friends, who couldn’t manage to mount it correctly.


    Image on the back of the nameplate of the happy and sad riders

    And the one happy rider, the one who got it right: I wanted to rip his face off.  Was I angry at him for being so damn happy whilst we were suffering?  Was he mocking us?  Had I made a mistake and hung my nameplate wrong, dooming me to be unhappy like the others?

    Yeah, I know it doesn’t make any sense.

    For the most part I was doing OK, just slowed down quite a bit.  The changes to the rear cassette, and the hill training, helped tremendously.  I remained in the saddle the entire time and only walked one hill — the Bitch.  I could have ridden it, but my cramping left quad begged me otherwise and so I relented, just this once.


    I went into great detail last year about running out of water during the middle of the second leg.  This year I made changes: a 2.5L Camelbak added to a 1.8L bladder, plus two more bottles in the cages — another 1.5L.  That’s about 1.5 gallons for those keeping score back home.

    In addition to water I also used more electrolytes, although not enough, judging by the cramping I experienced.

    I still ran out of water on this leg, around mile 95, about nine miles from checkpoint two!  Fortunately, there was a husband-and-wife duo parked at the end of their driveway, outside their country home, with a truckload of iced water bottles beside them.  I stopped and asked if they would be so kind as to share some water with me.

    “Are you dry?” the man asked me in his Kansas twang, to which I replied that I most certainly was.

    “Take all you want,” he told me.  I downed a pint as we exchanged pleasantries, grabbed another for the road, and just like that I was good.

    I grew up in Kansas and already knew that good Samaritans run aplenty here, but I am still inspired by their warmth and hospitality.  One of the reasons I keep coming back is to be reminded that these people are still here.


    From here on out I had my pit crew: from left to right, Cheri, Gregg, Janice, Kelly and Kyle.


    Kelly had just completed the DKlite and was working his magic down in Eureka, keeping the crew operating like a well-oiled machine.  Kyle was in from Seattle and Janice (Mom) from Salina.  You may recall that Gregg was my riding partner last year.  He couldn’t commit to the training this year but made sure he was there to lend a hand and offer encouragement, along with Cheri, his partner, who was also in our pit last year.


    Kelly wearing the colors

    That way when I rolled into town, weary and tired from the road…


    Rolling into CP2

    All I had to do was hand over my bike, eat, hydrate, and try to relax a bit.  I can’t tell you how much it helped me to have them there.


    That time spent in checkpoint two renewed my spirit and provided resolve.


    Gregg made sure I tried his preferred energy drink.

    I had a rough go in that second leg (again) but was feeling better than last year.  I could eat and had plenty of gas left to finish.  The muscles in my neck were beginning to ache, so I took a few aspirin, changed into dry socks, ate and drank a bit, and hit the road once again.


    At 58 miles, the third leg is the longest.  I was feeling fine but storm clouds were brewing and I began to wonder what it would be like to ride through one…

    Next Post: Part V – Checkpoint Three – coming soon




    This blog entry continues the series started with the introduction to the Apache CXF JOSE implementation, followed recently by the post about the signing of HTTP attachments.

    So CXF ships JOSE filters which can protect application data by wrapping it into JOSE JWS or JWE envelopes, or which can verify that incoming data has been properly encrypted and/or signed. In these cases the application code is not even aware that the JOSE processors are involved.

    How would one approach the task of signing/verifying and/or encrypting/decrypting the data directly in the application code? For example, what if an individual property of a bigger payload needs to be JOSE-protected?

    The most obvious approach is to use either CXF JOSE or a preferred 3rd-party library to deal with the JOSE primitives in the application code. This is Option 1. It is the option to choose if one needs closer control over the JOSE envelope creation process.
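To give a feel for what Option 1 involves at the wire level, here is a minimal sketch of JWS compact serialization and verification — the envelope format the JOSE libraries produce — written against the Python standard library rather than any real JOSE API. The HS256 choice and the `jws_compact_*` names are illustrative only; in practice you would use CXF's JWS helpers or a third-party JOSE library:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JOSE uses unpadded base64url encoding (RFC 7515, Appendix C)
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def jws_compact_sign(payload: bytes, key: bytes) -> str:
    # A JWS compact envelope is header.payload.signature
    header = b64url(json.dumps({"alg": "HS256"}).encode("utf-8"))
    body = b64url(payload)
    signing_input = f"{header}.{body}".encode("ascii")
    sig = hmac.new(key, signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def jws_compact_verify(token: str, key: bytes) -> bytes:
    # Recompute the MAC over header.payload and compare
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):
        raise ValueError("JWS signature verification failed")
    return base64.urlsafe_b64decode(body + "=" * (-len(body) % 4))
```

A library takes care of exactly this bookkeeping (plus key management and algorithm agility), which is why Option 2 below can hide it entirely.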

    Or you can do nearly nothing at all and let CXF handle it for you; this is Option 2. This is the CXF way: make it as easy as possible for users to embrace advanced technologies fast. It is not only about making it easy, though; it is also about having more flexible and even portable JOSE-aware code.

    In this case such requirements as "sign only", "encrypt only" or "sign and encrypt" (and similarly for "verify"/"decrypt") are not encoded in the application code; they are managed when configuring the JOSE helpers from the application contexts (by default the helpers only sign/verify).

    Likewise, the signature and encryption algorithm and key properties are controlled externally.
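As a sketch of what that external configuration can look like, CXF's JOSE settings follow the `rs.security.*` property naming convention; a signature properties file might look roughly like the following. The file name, keystore details and the exact property set here are illustrative assumptions — check the CXF JOSE documentation for the authoritative list:

```properties
# Illustrative signature properties resolved from the application context
rs.security.keystore.type=jks
rs.security.keystore.file=client.jks
rs.security.keystore.password=cspass
rs.security.keystore.alias=alice
rs.security.signature.algorithm=RS256
```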

    I know, it is hard to believe that it can be so easy. Try it to believe it. Enjoy!


    Note: This post is about my second Dirty Kanza 200 experience on June 3, 2017.

    It’s broken into seven parts:

    Part I – Prep / Training

    Part II – Preamble

    Part III – Starting Line

    Part IV – Checkpoint One

    Part V – Checkpoint Two

    Part VI – Checkpoint Three

    Part VII – Finish Line – coming soon

    Don’t Worry Be Happy

    My thoughts as I roll out of Eureka @ 3:30pm…

    • Thirty minutes at a checkpoint is too long, double the plan, but I was overheated and feel much better now.
    • I’m enjoying myself.
    • It’s only a hundred miles back to Emporia; I could do that in my sleep.
    • What’s that, a storm cloud headed our way?  It’s gonna feel good when it gets here.

    Mud & Camaraderie Mix

    That first century was ridden at a frantic pace, with not much time or energy for team building.  We help each other out, but it’s all business.

    The second part is when stragglers clump into semi-cohesive units.  It’s only natural and, in any case, foolish to ride alone.  A group of riders will always be safer than one, assuming everyone does their job properly.  Each new set of eyes brings another brain to identify and solve problems.

    There’s Jim, who took a few years off from his securities job down in Atlanta, Georgia to help his wife with their Montessori school, and to train for this race.  He and I teamed up during the first half of the third leg, as the worst of the thunderstorms rolled over.

    Before we crossed US Highway 54, a rider was waiting to be picked up by her support team.  Another victim of the muddy roads: a twisted derailleur bringing an early end to a long day.  We stopped, checked on her and offered encouragement as a car whizzed by us.

    “That’s a storm chaser!” someone called out, leaving me to wonder just how bad these storms were gonna get.

    Derrick is an IT guy from St. Joseph, Missouri, riding a single-speed bike on his way to a fifth finish, and with it a goblet commemorating 1,000 miles of toil.

    We rode for a bit at the end of the third leg, right at dusk.  My GPS, which up to now had worked flawlessly, had changed into its nighttime display mode, and I could no longer make out which lines to follow.  I missed a turn and heard the buzzer telling me I’d veered off course.

    I stopped and pulled out my cue sheets.  Those were supposed to be tucked away safely, sealed to stay nice and dry.  What, I forgot to seal them?  The pages were wet, stuck together and useless.

    I was tired and let my mind drift.  Why didn’t I bring a headlamp on this leg?  I’d be able to read the nav screen better.  And where is everybody?  How long have I been on the wrong path?  Am I lost?

    Be calm.  Get your focus and, above all, think.  What about my phone?  I can get to the maps from it.  It’s almost dead, but there’s plenty of reserve power available.

    Just then Derrick’s dim headlight appeared in the distance.  He stopped and we quietly discussed my predicament.  For some reason his GPS device couldn’t figure that turn out either.  It was then we noticed tire tracks off to our right, turned, and got back on track; both nav devices mysteriously resumed working once again.

    Jeremy is the service manager at one of the better bike shops in Topeka, Kansas.  He’s making a third attempt.  Two years ago, he broke down in what turned into a mudfest.  Last year, he completed the course, but twenty minutes past the 3:00 am cutoff.

    His bike was a grinder of sorts with some fats.  It sounded like a Mack truck on the downhills, but geared like a mountain goat on the uphills.  I want one of them bikes.  Going to have to look him up at that bike shop one day.

    I remembered him from last year, lying at the roadside, probably ten, maybe fifteen, miles outside of Emporia.

    “You alright?” we stopped and asked.  It was an hour or more past midnight and the blackest of night.

    “Yeah man, just tired, and need to rest a bit.  You guys go on, I’m fine,” he calmly told us.

    There’s the guy from Iowa, who normally wouldn’t be at the back of the pack (with us), but his derailleur snapped and he’d just converted his bike to a single-speed as I caught up with him and his buddy.  This was a first attempt for both.  They’d been making good time until the rains hit.

    Or the four chicks, from where I do not know, who were much faster than I, but somehow kept passing me.  How I would get past them again remains a mystery.

    Also, all of the others, whose names can’t be placed, but the stories can…


    Seven miles into that third leg came the rain.  It felt good, but it introduced challenges.  The roads become slippery, and a rider can easily go down.  They become muddy, and the bike very much wants to break down.

    Both are critical risk factors in terms of finishing.  One’s outcome is much worse than the other’s.

    Fortunately, both problems have good solutions.  The first: slow down on the descents, picking through the rocks and the pools of mud and water — carefully.  If in doubt, stop and walk a section, although I never had to on this day, except for that one crossing with peanut butter on the other side.

    By the way, the pictures I’m posting are from the calmer sections.  It’s never a good idea to stop along a dangerous roadside just to take one.  That creates a hazard for the other riders, who then have to deal with you in their pathway, which limits their choices for a good line.  When the going is tricky, keep it moving, if possible to do so safely.

    The second problem means frequent stops to flush the grit from the drivetrain.  When it starts grinding, it’s time to stop and flush.  Mind the grind.  Once I pulled out two-centimeter chunks of rock lodged in the derailleurs and chain guards.

    Use whatever is on hand: rivers, water bottles, puddles.  There was mud — everywhere.  In the chain, gears and brakes.  It’d get lodged in the pedals and in the cleats of our shoes, making it impossible to click in or (worse) to click out.  I’d use rocks to remove other rocks, or whatever was handy and/or expedient.  It helps to be resourceful at times like this.  That’s not a fork; it’s an extended, multi-pronged, mud-and-grit extraction tool.

    The good folks alongside the road were keeping us supplied with plenty of water.  It wasn’t needed for hydration, but for maintenance.  I’d ask before using it like this, so as not to offend them by pouring their bottles of water over my bike, but they understood and didn’t seem to mind.

    We got rerouted once because a water crossing decided it wanted to be a lake.  The detour added a couple of miles to a ride that was already seven over two hundred.

    The rain made for slow going, but I was having a good time and didn’t want the fun to end.

    Enjoy this moment.  Look over there, at all the flowers growing alongside the road.  The roads were still muddy, but the fields were clean and fresh, and the temperatures were cool.


    wild flowers along the third leg

    Madison (once again)

    Rolled in about 9:30pm under the cover of night.


    9:30pm @ Madison CP3

    After all that fussing over nameplates in the previous leg, I found out mine was mounted incorrectly.  It partially blocked the headlight beam and had to be fixed.


    Cheri lends a hand remounting the nameplate so I can be a happy rider

    It was Cheri’s second year doing support.  Last year it was her and Kelly crewing for Gregg and me.  This year, she and Gregg came as well.  As I said earlier, the best part of this race is experiencing it with friends and family.

    I was in good spirits, but hungry, my neck ached, and my bike was in some serious need of attention.  All of this was handled with calm efficiency by Kelly & Co.

    Kyle, who’s an RN, provided medical support with pain relievers and ice packs.  They knew I liked pizza late in the race, and Gregg handed some over that had just been pulled fresh from the oven across the street at the EZ-mart.  It may not sound like much now, but it gave me a needed energy boost from something that doesn’t get squeezed out of a tube.

    As soon as Cheri finished the nameplate, Gregg got the drivetrain running smoothly once again.

    All the while, Kelly and Mom were assisting and directing.  There was the headlamp needing to be mounted, fresh battery packs, a change to the clear lenses on the glasses, socks, gloves, cokes, energy drinks, refilling water tanks, electrolytes, gels and more.  There were forty-some miles to go: total darkness, unmarked roads, possibly more mud on the remaining B roads.  Weather forecast: clear and mild.

    Let’s Finish This

    “Who are you riding with?” Gregg called out as I was leaving.  He ran alongside for a bit, urging me on.


    Gregg runs alongside as I leave CP3

    “Derrick and I are gonna team up,” I called back, which was true; that was the plan as we rolled into town.  Now I just had to find him.  Madison was practically deserted at this hour, its checkpoint regions (red, green, blue, orange) were spread out, and what color did he say he was again?

    Twenty-two minutes spent refueling at checkpoint three, and then into the darkness again.  The last leg started @ 10pm with 45 miles to go.  I could do that in my sleep; I might need to.


    Next Post: Part VII – Finish Line – coming soon



    We've had an extensive demonstration of how to enable Swagger UI for CXF endpoints returning Swagger documents for a while, but the only 'problem' was that our demos only showed how to unpack a Swagger UI module into a local folder with the help of a Maven plugin and make the unpacked resources available to browsers.
    It was not immediately obvious to users how to activate Swagger UI, and with news coming from SpringBoot land that it is apparently really easy to do over there, it was time to look at making it easier for CXF users.
    So Aki, Andriy and myself talked, and this is all that CXF 3.1.7 users have to do:

    1. Have Swagger2Feature activated to get Swagger JSON returned.
    2. Add a swagger-ui dependency to the runtime classpath.
    3. Access Swagger UI.
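    Step 2 typically means pulling Swagger UI in as a WebJar; a pom.xml fragment along these lines should do (the version shown here is only illustrative — pick whichever swagger-ui release matches your setup):

```xml
<dependency>
    <groupId>org.webjars</groupId>
    <artifactId>swagger-ui</artifactId>
    <version>2.1.4</version>
</dependency>
```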

    For example, run the description_swagger2 demo. After starting the server, go to the CXF Services page and you will see:

    Click on the link and see a familiar Swagger UI page showing your endpoint's API.

    Have you ever wondered what some developers mean when they say it is child's play to try whatever they have done? You'll be hard pressed to find a better example of it after trying Swagger UI with CXF 3.1.7 :-)

    Note that in CXF 3.1.8-SNAPSHOT we have already fixed it to work for Blueprint endpoints in OSGi (with help from Łukasz Dywicki). The Swagger UI auto-linking code has also been improved to better support some older browsers.

    Besides, CXF 3.1.8 will also offer proper support for Swagger correctly representing multiple JAX-RS endpoints, based on the fix contributed by Andriy and available in Swagger 1.5.10, as well as for the case where the API interface and implementations are available in separate (OSGi) bundles (Łukasz figured out how to make that work).

    Before I finish, let me return to the description_swagger2 demo. Add a cxf-rt-rs-service-description dependency to pom.xml, start the server, and check the services page:

    Of course, some users do and will continue working with XML-based services, and WADL is the best language around for describing such services. If you click on a WADL link you will see an XML document returned. WADLGenerator can be configured with an XSLT template reference, and if you have a good template you can get a UI as good as this Apache Syncope document.

    Whatever your data representation preferences are, CXF has you covered.



    Since we created our hard fork of Spotify’s great repair tool, Reaper, we’ve been committed to making it the “de facto” community tool for managing repairs of Apache Cassandra clusters.
    This required Reaper to support all versions of Apache Cassandra (starting from 1.2) and some features it lacked, like incremental repair.
    Another thing we really wanted was to remove the dependency on a Postgres database to store Reaper data. As Apache Cassandra users, it felt natural to store that data in our favorite database.

    Reaper 0.6.1

    We are happy to announce the release of Reaper 0.6.1.

    Apache Cassandra as a backend storage for Reaper was introduced in 0.4.0, but it turned out to create a high load on the cluster hosting its data.
    While the Postgres backend could rely on indexes to search efficiently for segments to process, the C* backend had to scan all segments and filter afterwards. The initial data model didn’t account for the frequency of those scans, which generated a lot of requests per second once you had repairs with hundreds (if not thousands) of segments.
    It also seems Reaper was designed to work on clusters that do not use vnodes: computing the number of possible parallel segment repairs for a job used the number of tokens divided by the replication factor, instead of the number of nodes divided by the replication factor.
    This led to a lot of overhead, with threads trying and failing to repair segments because the nodes were already involved in a repair operation, each attempt generating a full scan of all segments.
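To make that arithmetic concrete, here is a small Python sketch (the function name and the cluster numbers are hypothetical) contrasting the token-based computation with the node-based one on a vnode cluster:

```python
def max_parallel_repairs(num_tokens: int, num_nodes: int, rf: int) -> tuple:
    # Token-based formula: tokens / RF hugely over-counts on vnode
    # clusters, since every node owns many token ranges.
    token_based = num_tokens // rf
    # Node-based formula: each segment repair ties up RF replica nodes,
    # so at most nodes / RF repairs can truly run in parallel.
    node_based = num_nodes // rf
    return token_based, node_based

# A 3-node cluster with 32 vnodes per node and RF=3:
print(max_parallel_repairs(num_tokens=3 * 32, num_nodes=3, rf=3))  # (32, 1)
```

With vnodes, the first formula schedules dozens of concurrent repair attempts against a cluster that can only sustain one, which is exactly the thrashing described above.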

    Both issues are fixed in Reaper 0.6.1, with a brand new data model that requires a single query to get all segments for a run, the use of timeuuids instead of long ids (to avoid lightweight transactions when generating repair/segment ids), and a corrected computation of the number of possible parallel repairs.

    The following graph shows the difference before and after the fix, observed on a 3-node cluster using 32 vnodes:

    The load on the nodes is now comparable to running Reaper with the memory backend:

    This release makes Apache Cassandra a first class citizen as a Reaper backend!

    Upcoming features with the Apache Cassandra backend

    On top of not having to administer yet another kind of database alongside Apache Cassandra to run Reaper, we can now better integrate with multi-region clusters and handle security concerns related to JMX access.

    First, the Apache Cassandra backend allows us to start several instances of Reaper instead of one, bringing fault tolerance. Instances will share the work on segments using lightweight transactions, and metrics will be stored in the database. On multi-region clusters, where the JMX port is closed for cross-DC communications, it will give the opportunity to start one or more instances of Reaper in each region. They will coordinate together through the backend, and Reaper will still be able to apply backpressure mechanisms by monitoring the whole cluster for running repairs and pending compactions.

    Next comes the “local mode”, for companies that apply strict security policies to the JMX port and forbid all remote access. In this specific case, a new parameter was added in the configuration yaml file to activate local mode, and you will need to start one instance of Reaper on each C* node. Each instance will then communicate only with the local node and ignore all tokens for which this node isn’t a replica.

    Those features are both available in a feature branch that will be merged before the next release.

    While the fault tolerance features have been tested in different scenarios and are considered ready for use, the local mode still needs a little more work before being used on real clusters.

    Improving the frontend too

    So far we had focused on the backend and hadn’t touched the frontend.
    Now we are giving some love to the UI as well. On top of making it more usable and better looking, we are pushing some new features that will make Reaper “not just a tool for managing repairs”.

    The first significant addition is the new cluster health view on the home screen:

    One quick look at this screen will give you the nodes’ individual status (up/down) and the size on disk for each node, rack and datacenter of the clusters Reaper is connected to.

    Then we reorganized the other screens, making forms and lists collapsible and adding a bit of color:

    All those UI changes were just merged into master for your testing pleasure, so feel free to build and deploy, and be sure to give us feedback on the reaper mailing list!


    Note: This post is about my second Dirty Kanza 200 experience on June 3, 2017.

    It’s broken into seven parts:

    Part I – Prep / Training

    Part II – Preamble

    Part III – Starting Line

    Part IV – Checkpoint One

    Part V – Checkpoint Two

    Part VI – Checkpoint Three

    Part VII – Finish Line


    I went looking for Derrick but couldn’t find him.  A woman, his wife I found out later, approached…

    “Are you John?”, she asked.

    I replied with my name but didn’t make the connection.  I’d forgotten the color of his support team and he’d gotten my name wrong, so that made us even.

    He caught up ten miles later, by then chasing the fast chicks.  I called out as they zoomed past, wished them well.  This is how it works.  Alliances change according to the conditions and needs from one moment to the next.

    A lone rider stopped at the edge of downtown — Rick from Dewitt, Arkansas.  He was ready for takeoff.

    “You headed out?  How ’bout we team up?” I asked matter-of-factly.  The deal was struck, and then there were two.

    Eventually, maybe twenty miles later, we picked up Jeremy, which made three.  It worked pretty well.  Not much small talk, but lots of operational chatter.  You’d have thought we were out on military maneuvers.

    • “Rocks on left.”
    • “Mud — go right!”
    • “Off course, turning around.”
    • “Rough! Slowing!”

    There were specializations.  For example, Jeremy was the scout.  His bike had fat tires and so he’d bomb the downhills, call back to us what he saw, letting us know of the dangers.  Rick did most of the navigating.  I kept watch on time, distance and set the pace.

    By this time we were all suffering and made brief stops every ten miles or so.  We’d agreed that it was OK, had plenty of time, and weren’t worried.

    Caught up with Derrick six miles from home.  Apparently he couldn’t keep up with the fast chicks either, but gave it the college try, and we had a merry reunion.

    We rolled over the finish line somewhat past 2:00 am.  Here’s the video feed:


    Rick and I @ the FL

    My support team was there along with a smattering of hearty locals to cheer us and offer congratulations.

    Jeremy, Rick and I had a brief moment where we congratulated each other before Lelan handed over our Breakfast Club finishers patches and I overheard Rick in his southern drawl…

    “I don’t care if it does say breakfast club on there.”

    Next were the hugs and pictures with my pit crew and I was nearly overcome with emotion.  Felt pretty good about the finish and I don’t care if it says breakfast club on there either.


    The Pit Crew, l to r, Me, Gregg, Kelly, Janice, Cheri, Kyle


    In addition to my pit crew…

    My wife Cindy deserves most of the credit.  She bought the bike four years ago that got me all fired up again about cycling.  Lots of times when I’m out there riding I should be home working.  Throughout this she continues to support without complaint.  Thanks baby, you’re the best, I love you.

    Next are the guys at the bike shop — Arkansas Cycle and Fitness, back home in Little Rock.  They tolerate my abysmal mechanical abilities, patiently listen to my requirements, and teach when necessary (often).  Time and again they made the adjustments needed to correct the issues I was having with the bike.  They’ve encouraged and cheered, offering suggestions and help along the way.

    Finally, my cycling buddies — the Crackheads.  Truth be known they’re probably more trail runners than cyclists, but they’re incredible athletes from whom I’ve learned much.  In the summertime, when the skeeters and chiggers get too bad for Arkansas trail running, they come out and ride which makes me happy.


    The End


    In George R. R. Martin’s books “A Song of Ice and Fire” (which you may know by the name “A Game of Thrones”), the people of Braavos have a saying – “Valar Morghulis” – which means “All men must die.” As you follow the story, you quickly realize that this statement is not made in a morbid or defeatist sense, but reflects on what we must do while alive so that death, while inevitable, isn’t meaningless. Thus, the traditional response is “Valar Dohaeris” – all men must serve – to give meaning to their life.

    So it is with software. All software must die. And this should be viewed as a natural part of the life cycle of software development, not as a blight, or something to be embarrassed about.

    Software is about solving problems – whether that problem is calculating launch trajectories, optimizing your financial investments, or entertaining your kids. And problems evolve over time. In the short term, this leads to the evolution of the software solving them. Eventually, however, it may lead to the death of the software. It’s important what you choose to do next.

    You win, or you die

    One of the often-cited advantages of open source is that anybody can pick up a project and carry it forward, even if the original developers have given up on it. While this is, of course, true, the reality is more complicated.

    As we say at the Apache Software Foundation, “Community > Code”. Which is to say, software is more than just lines of source code in a text file. It’s a community of users, and a community of developers. It’s documentation, tutorial videos, and local meetups. It’s conferences, business deals and interpersonal relationships. And it’s real people solving real-world problems, while trying to beat deadlines and get home to their families.

    So, yes, you can pick up the source code, and you can make your changes and solve your own problems – scratch your itch, as the saying goes. But a software project, as a whole, cannot necessarily be kept on life support just because someone publishes the code publicly. One must also plan for the support of the ecosystem that grows up around any successful software project.

    Eric Raymond just recently released the source code for the 1970s
    computer game Colossal Cave Adventure on Github. This is cool, for us greybeard geeks, and also for computer historians. It remains to be seen whether the software actually becomes an active open source project, or if it has merely moved to its final resting place.

    The problem that the software solved – people want to be entertained – still exists, but that problem has greatly evolved over the years, as new and different games have emerged, and our expectations of computer games have radically changed. The software itself is still an enjoyable game, and has a huge nostalgia factor for those of us who played it on greenscreens all those years ago. But it doesn’t measure up to the alternatives that are now available.

    Software Morghulis. Not because it’s awful, but because its time has come.

    Winter is coming

    The words of House Stark in “A Song of Ice and Fire” are “Winter is coming.” As with “Valar Morghulis,” this is about planning ahead for the inevitable, and not being caught surprised and unprepared.

    How we plan for our own death, with insurance, wills, and data backups, isn’t morbid or defeatist. Rather, it is looking out for those that will survive us. We try to ensure continuity of those things which are possible, and closure for those things which are not.

    Similarly, planning ahead for the inevitable death of a project isn’t defeatist. Rather, it shows concern for the community. When a software project winds down, there will often be a number of people who continue to use it. This may be because they have built a business around it. It may be because it perfectly solves their particular problem. And it may be that they simply can’t afford the time, or cost, of migrating to something else.

    How we plan for the death of the project prioritizes the needs of this community, rather than focusing merely on the fact that we, the developers, are no longer interested in working on it, and have moved on to something else.

    At Apache, we have established the Attic as a place for software projects to come to rest once their developer communities have dwindled. While a project community may reach a point where it can no longer adequately shepherd the project, the Foundation as a whole still has a responsibility to the users, companies, and customers who rely on the software itself.

    The Apache Attic provides a place for the code, downloadable releases, documentation, and archived mailing lists, for projects that are no longer actively developed.

    In some cases, these projects are picked up and rejuvenated by a new community of developers and users. However, this is uncommon, since there’s usually a very good reason that a project has ceased operation. In many cases, it’s because a newer, better solution has been developed for the problem that the project solved. And in many cases, it’s because, with the evolution of technology, the problem is no longer important to a large enough audience.

    However, if you do rely on a particular piece of software, you can rely on it always being available there.

    The Attic does not provide ongoing bug fixes or make additional releases. Nor does it make any attempt to restart communities. It is merely there, like your grandmother’s attic, to provide long-term storage. And, occasionally, you’ll find something useful and reusable as you’re looking through what’s in there.

    Software Dohaeris

    The Apache Software Foundation exists to provide software for the public good. That’s our stated mission. And so we must always be looking out for that public good. One critical aspect of that is ensuring that software projects are able to provide adequate oversight, and continuing support.

    One measure of this is that there are always (at least) three members of the Project Management Committee (PMC) who can review commits, approve releases, and ensure timely security fixes. And when that’s no longer the case, we must take action, so that the community depending on the code has clear and correct expectations of what they’re downloading.

    In the end, software is a tool to accomplish a task. All software must serve. When it no longer serves, it must die.

    0 0

    Most datacenter automation tools operate on the basis of desired state. Desired state describes what should be the end state but not how to get there. To simplify a great deal, if the thing being automated is the speed of a car, the desired state may be “60mph”. How to get there (braking, accelerator, gear changes, turbo) isn’t specified. Something (an “agent”) promises to maintain that desired speed.


    The desired state and changes to the desired state are sent from the orchestrator to various agents in a datacenter. For example, the desired state may be “two apache containers running on host X”. An agent on host X will ensure that the two containers are running. If one or more containers die, then the agent on host X will start enough containers to bring the count up to two. When the orchestrator changes the desired state to “3 apache containers running on host X”, then the agent on host X will create another container to match the desired state.
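    The behavior described above can be sketched as a tiny reconciliation loop. This is a hypothetical illustration (the names and the simulated container "runtime" are invented, not taken from any real orchestrator):

```scala
object ReconcileDemo {
  // The agent is told *what* (a desired container count), never *how*.
  // It reconciles the observed state toward that target, which also repairs
  // drift (e.g. a container dying) on the next reconcile pass.
  final case class DesiredState(containers: Int)

  // Simulated actual state: ids of "running containers".
  val running = scala.collection.mutable.Set[String]()

  def reconcile(desired: DesiredState): Unit = {
    val diff = desired.containers - running.size
    if (diff > 0)      // start the missing containers
      (1 to diff).foreach(_ => running += java.util.UUID.randomUUID().toString)
    else if (diff < 0) // stop the surplus ones
      running.take(-diff).foreach(running -= _)
  }
}
```

    Calling reconcile again after a simulated failure restores the declared count; the same loop handles both a new desired state from the orchestrator and drift in the actual state.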

    Transfer of desired state is another way to achieve idempotence (a problem described here)

    We can see that there are two sources of changes that the agent has to react to:

    1. changes to desired state sent from the orchestrator and
    2. drift in the actual state due to independent / random events.

    Let’s examine #1 in greater detail. There are a few ways to communicate the change in desired state:

    1. Send the new desired state to the agent (a “command” pattern). This approach works most of the time, except when the size of the state is very large. For instance, consider an agent responsible for storing a million objects. Deleting a single object would involve sending the whole desired state (999999 items). Another problem is that the command may not reach the agent (“the network is not reliable”). Finally, the agent may not be able to keep up with the rate of change of the desired state and start to drop some commands. To fix this issue, the system designer might be tempted to run more instances of the agent; however, this usually leads to race conditions and out-of-order execution problems.
    2. Send just the delta from the previous desired state. This is fraught with problems. This assumes that the controller knows for sure that the previous desired state was successfully communicated to the agent, and that the agent has successfully implemented the previous desired state. For example, if the first desired state was “2 running apache containers” and the delta that was sent was “+1 apache container”, then the final actual state may or may not be “3 running apache containers”. Again, network reliability is a problem here. The rate of change is an even bigger potential problem here: if the agent is unable to keep up with the rate of change, it may drop intermediate delta requests. The final actual state of the system may be quite different from the desired state, but the agent may not realize it! Idempotence in the delta commands helps in this case.
    3. Send just an indication of change (“interrupt”). The agent has to perform the additional step of fetching the desired state from the controller. The agent can compute the delta and change the actual state to match the delta. This has the advantage that the agent is able to combine the effects of multiple changes (“interrupt debounce”). By coalescing the interrupts, the agent is able to limit the rate of change. Of course the network could cause some of these interrupts to get “lost” as well. Lost interrupts can cause the actual state to diverge from the desired state for long periods of time. Finally, if the desired state is very large, the agent and the orchestrator have to coordinate to efficiently determine the change to the desired state.
    4. The agent could poll the controller for the desired state. There is no problem of lost interrupts; the next polling cycle will always fetch the latest desired state. The polling rate is critical here: if it is too fast, it risks overwhelming the orchestrator even when there are no changes to the desired state; if too slow, it will not converge the actual state to the desired state quickly enough.
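    Several of these schemes hinge on the agent being able to recognize stale state. A minimal sketch of the polling variant, assuming the orchestrator attaches a monotonically increasing version to each desired state (all names invented for illustration):

```scala
object VersionedStateDemo {
  // Each desired state carries a version, so a polling agent (or one
  // receiving reordered interrupts) can discard anything stale.
  final case class Desired(version: Long, containers: Int)

  var lastApplied = Desired(0L, 0)

  // Returns true if the state was applied, false if it was stale.
  def offer(incoming: Desired): Boolean =
    if (incoming.version <= lastApplied.version) false // stale or duplicate: discard
    else { lastApplied = incoming; true }
}
```

    With this in place, reordered or duplicated deliveries are harmless: anything at or below the last applied version is simply discarded, and only the latest desired state wins.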

    To summarize the potential issues:

    1. The network is not reliable. Commands or interrupts can be lost, or agents can restart / disconnect: there has to be some way for the agent to recover the desired state.
    2. The desired state can be prohibitively large. There needs to be some way to efficiently but accurately communicate the delta to the agent.
    3. The rate of change of the desired state can strain the orchestrator, the network and the agent. To preserve the stability of the system, the agent and orchestrator need to coordinate to limit the rate of change and the polling rate, and to execute the changes in the proper linear order.
    4. Only the latest desired state matters. There has to be some way for the agent to discard all the intermediate (“stale”) commands and interrupts that it has not been able to process.
    5. Delta computation (the difference between two consecutive sets of desired state) can sometimes be more efficiently performed at the orchestrator, in which case the agent is sent the delta. Loss of the delta message or reordering of execution can lead to irrecoverable problems.

    A persistent message queue can solve some of these problems. The orchestrator sends its commands or interrupts to the queue and the agent reads from the queue. The message queue buffers commands or interrupts while the agent is busy processing a desired state request.  The agent and the orchestrator are nicely decoupled: they don’t need to discover each other’s location (IP/FQDN). Message framing and transport are taken care of (no more choosing between Thrift or text or HTTP or gRPC etc).


    There are tradeoffs however:

    1. With the command pattern, if the desired state is large, then the message queue could reach its storage limits quickly. If the agent ends up discarding most commands, this can be quite inefficient.
    2. With the interrupt pattern, a message queue is not adding much value since the agent will talk directly to the orchestrator anyway.
    3. It is not trivial to operate / manage / monitor a persistent queue. Messages may need to be aggressively expired / purged, and the promise of persistence may not actually be realized. Depending on the scale of the automation, this overhead may not be worth the effort.
    4. With “at most once” semantics, the message queue could still lose messages. With “at least once” semantics, the message queue could deliver multiple copies of the same message: the agent has to be able to determine if it is a duplicate. The orchestrator and agent still have to solve some of the end-to-end reliability problems.
    5. Delta computation is not solved by the message queue.

    OpenStack (using RabbitMQ) and CloudFoundry (using NATS) have adopted message queues to communicate desired state from the orchestrator to the agent.  Apache CloudStack doesn’t have any explicit message queues, although if one digs deeply, there are command-based message queues simulated in the database and in memory.

    Others solve the problem with a combination of interrupts and polling – interrupt to execute the change quickly, poll to recover from lost interrupts.

    Kubernetes is one such framework. There are no message queues, and it uses an explicit interrupt-driven mechanism to communicate desired state from the orchestrator (the “API Server”) to its agents (called “controllers”).

    (Image courtesy of Heptio)

    Developers can use (but are not forced to use) a controller framework to write new controllers. An instance of a controller embeds an “Informer” whose responsibility is to watch for changes in the desired state and execute a controller function when there is a change. The Informer takes care of caching the desired state locally and computing the delta state when there are changes. The Informer leverages the “watch” mechanism in the Kubernetes API Server (an interrupt-like system that delivers a network notification when there is a change to a stored key or value). The deltas to the desired state are queued internally in the Informer’s memory. The Informer ensures the changes are executed in the correct order.
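    The caching-and-delta part of that role can be sketched as a simple map diff. This is an illustration of the idea only, not the actual Informer code; types are simplified to strings (object name to spec):

```scala
object DeltaDemo {
  // Compare the locally cached desired state against a freshly fetched copy
  // and classify every key as added, removed, or modified.
  final case class Delta(added: Set[String], removed: Set[String], modified: Set[String])

  def diff(cached: Map[String, String], fresh: Map[String, String]): Delta =
    Delta(
      added    = fresh.keySet -- cached.keySet,
      removed  = cached.keySet -- fresh.keySet,
      modified = (cached.keySet & fresh.keySet).filter(k => cached(k) != fresh(k))
    )
}
```

    An Informer-like component would then invoke the controller function once per added, removed, or modified object, coalescing bursts of changes into one pass over the delta.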

    • Desired states are versioned, so it is easier to decide to compute a delta, or to discard an interrupt.
    • The Informer can be configured to do a periodic full resync from the orchestrator (“API Server”) – this should take care of the problem of lost interrupts.
    • Apparently, there is no problem of the desired state being too large, so Kubernetes does not explicitly handle this issue.
    • It is not clear if the Informer attempts to rate-limit itself when there are excessive watches being triggered.
    • It is also not clear if at some point the Informer “fast-forwards” through its queue of changes.
    • The watches in the API Server use Etcd watches in turn. The watch server in the API server only maintains a limited set of watches received from Etcd and discards the oldest ones.
    • Etcd itself is a distributed data store that is more complex to operate than say, an SQL database. It appears that the API server hides the Etcd server from the rest of the system, and therefore Etcd could be replaced with some other store.

    I wrote a Network Policy Controller for Kubernetes using this framework and it was the easiest integration I’ve written.

    It is clear that the Kubernetes creators put some thought into the architecture, based on their experiences at Google. The Kubernetes design should inspire other orchestrator-writers, or perhaps, should be re-used for other datacenter automation purposes. A few issues to consider:

    • The agents (“controllers”) need direct network reachability to the API Server. This may not be possible in all scenarios, needing another level of indirection.
    • The API server is not strictly an orchestrator; it is better described as a choreographer. I hope to describe this difference in a later blog post, but note that the API server never explicitly carries out a step-by-step flow of operations.

    0 0

    The Stack Clash class of bugs can be easily prevented on Gentoo.

    1. Add -fstack-check to your CFLAGS. It instructs the compiler to touch every page when extending the stack by more than one page, so the kernel will trap in the guard page. This even makes the larger stack gap in recent kernels unnecessary (if you don't run other binaries).


    CFLAGS="-march=native -O2 -pipe -fstack-check"

    2. Recompile important libraries (like openssl) and programs (setuid root binaries in shadow and util-linux) or simply everything: emerge -ae world

    As always, keep your system up to date regularly: emerge -uavD world

    0 0

    It's been a little over six years since I first ventured to Kraków, Poland. I have fond memories of that trip, mostly because Trish was with me and we explored lots of sites. Last month, I visited Kraków for GeeCON, but only stayed for one night.

    Last week, I had the pleasure of visiting a third time for my first Devoxx Poland. I was excited to travel internationally again with my favorite travel shirt on. This caused a funny conversation with TSA just before my departure.

    Heading to the airport in my favorite travel shirt

    I arrived in Krakow on a beautiful day and took an Ubër to my hotel next to the venue. I took a stroll along the Vistula River to enjoy the sunshine.

    Along the Vistula River. Gorgeous day for a stroll in Krakow

    A beautiful day in Krakow

    I attended the conference happy hour that evening, then journeyed to a local restaurant for some delicious food and fun conversations. There's a chance one of those conversations will inspire a speaking tour in South Africa next year.

    On Thursday, I attended a couple sessions on application and microservices security, and delivered my talk on PWAs with Ionic, Angular, and Spring Boot. You can check out my presentation below or on Speaker Deck.

    I was amazed at the sheer size of Devoxx Poland! Not only was the venue massive, but its 2500 attendees filled it up quickly. Thursday evening was the speakers' dinner, on a boat no less! It was a great location, with lots of familiar faces.

    Friday, I spoke about What's New in JHipsterLand. My presentation can be viewed below or on Speaker Deck. You can download the PDF from Speaker Deck for clickable links.

    In other news, I've been busy writing blog posts for the @OktaDev blog.

    I also wrote an article for titled "The Ultimate Guide to Progressive Web Applications."

    For the next couple of weeks, I'll be on vacation in Montana. Then it's time for ÜberConf, vJUG (with Josh Long!), and Oktane17.

    If I don't see you on the rivers in Montana or at an upcoming conference, I hope you have a great summer!

    0 0

    Due to recent updates your folder preview of images in KDE Dolphin is probably broken. The culprit is the exiv2 library (which in itself is a major problem).

    To fix that, rebuild exiv2 first: emerge -1av exiv2

    Then check what uses it: revdep-rebuild -pL libexiv2

    And recompile that: emerge -1av kde-apps/libkexiv2 kde-apps/kio-extras kfilemetadata gwenview

    0 0

    This is the third post in a series of articles on securing Apache Solr. The first post looked at setting up a sample SolrCloud instance and securing access to it via Basic Authentication. The second post looked at how the Apache Ranger admin service can be configured to store audit information in Apache Solr. In this post we will extend the example in the first article to include authorization, by showing how to create and enforce authorization policies using Apache Ranger.

    1) Install the Apache Ranger Solr plugin

    The first step is to install the Apache Ranger Solr plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:

    • mvn clean package assembly:assembly -DskipTests
    • tar zxvf target/ranger-${version}-solr-plugin.tar.gz
    • mv ranger-${version}-solr-plugin ${ranger.solr.home}
    Now go to ${ranger.solr.home} and edit "". You need to specify the following properties:
    • POLICY_MGR_URL: Set this to "http://localhost:6080"
    • REPOSITORY_NAME: Set this to "solr_service".
    • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Solr server directory
    Save "" and install the plugin as root via "sudo -E ./". Make sure that the user who is running Solr can read the "/etc/ranger/solr_service/policycache". Now follow the first tutorial to get an example SolrCloud instance up and running with a "gettingstarted" collection. We will not enable the authorization plugin just yet.

    2) Create authorization policies for Solr using the Apache Ranger Admin service

    Now follow the second tutorial to download and install the Apache Ranger admin service. To avoid conflicting with the Solr example we are securing, we will skip the section about auditing to Apache Solr (sections 3 and 4). In addition, in section 5 the "audit_store" property can be left empty, and the Solr audit properties can be omitted. Start the Apache Ranger admin service via: "sudo ranger-admin start", and open a browser at "http://localhost:6080", logging on with "admin/admin" credentials. Click on the "+" button for the Solr service and create a new service with the following properties:
    • Service Name: solr_service
    • Username: alice
    • Password: SolrRocks
    • Solr URL: http://localhost:8983/solr
    Hit the "Test Connection" button and it should show that it has successfully connected to Solr. Click "Add" and then click on the "solr_service" link that is subsequently created. We will grant a policy that allows "alice" the ability to read the "gettingstarted" collection. If "alice" is not already created, go to "Settings/User+Groups" and create a new user there. Delete the default policy that is created in the "solr_service" and then click on "Add new policy" and create a new policy called "gettingstarted_policy". For "Solr Collection" enter "g" here and the "gettingstarted" collection should pop up. Add a new "allow condition" granting the user "alice" the "others" and "query" permissions.

    3) Test authorization using the Apache Ranger plugin for Solr

    Now we are ready to enable the Apache Ranger authorization plugin for Solr. Download the following security configuration which enables Basic Authentication in Solr as well as the Apache Ranger authorization plugin:
    Now upload this configuration to the Apache Zookeeper instance that is running with Solr:
    • server/scripts/cloud-scripts/ -zkhost localhost:9983 -cmd putfile /security.json security.json
     Now let's try to query the "gettingstarted" collection as 'alice':
    • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
    This should be successful. However, authorization will fail for the case of "bob":
    • curl -u bob:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
    In addition, although "alice" can query the collection, she can't write to it, and the following update request will return 403:
    • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book4", "title_t" : "Hamlet", "author_s" : "William Shakespeare"}]'

    0 0

    I just wanted to share a little status update on where we are with the Camel in Action 2nd edition book.

    We recently completed the last round of reviews from a selected group of readers who provided anonymous feedback on the material.

    Based on their feedback, we were able to make some changes to the material before we handed it over to pre-production.

    One point we knew about, and have also gathered feedback on, is the length of the book. For example, the last MEAP release has a staggering 996 pages in the PDF file.

    We identified roughly 100 pages of the weakest content in the book, which we then cut from the final book. Out of those 100 pages, we will make the last two chapters available as online bonus chapters, freely available for download. The IoT and Reactive chapters felt a bit out of place according to reviewers. Don't despair: these bonus chapters will go through the same scrutiny as all the other chapters to ensure the same high quality level you would expect from a Manning book.

    At this point all the chapters are in pre-production phase, where they are undergoing technical review, proofing, etc.

    As part of the pre-production phase, we will do the cut and any final cleanup changes before the material is final and the book is handed over to typesetting.

    The deadline for pre-production is the end of July, which means the book will go into typesetting afterwards, getting much closer to done. While the chapters are in typesetting, we will work on what is called the front matter, which is essentially all the other stuff. And when that work is complete, we can sit back and just wait until the book is printed and we have it in our hands.

    0 0

    Long ago, Lucene could only use a single thread to write new segments to disk. The actual indexing of documents, which is the costly process of inverting incoming documents into in-memory segment data structures, could run with multiple threads, but back then, the process of writing those in-memory indices to Lucene segments was single threaded.

    We fixed that, more than 6 years ago now, yielding big indexing throughput gains on concurrent hardware.

    Today, hardware has only become even more concurrent, and we've finally done the same thing for processing deleted documents and updating doc values!

    This change, in time for Lucene's next major release (7.0), shows a 53% indexing throughput speedup when updating whole documents, and a 7.4X - 8.6X speedup when updating doc values, on a private test corpus using highly concurrent hardware (an i3.16xlarge EC2 instance).

    Buffering versus applying

    When you ask Lucene's IndexWriter to delete a document, or update a document (which is an atomic delete and then add), or to update a doc-values field for a document, you pass it a Term, typically against a primary key field like id, that identifies which document to update. But IndexWriter does not perform the deletion right away. Instead, it buffers up all such deletions and updates, and only finally applies them in bulk once they are using too much RAM, or you refresh your near-real-time reader, or call commit, or a merge needs to kick off.

    The process of resolving those terms to actual Lucene document ids is quite costly as Lucene must visit all segments and perform a primary key lookup for each term. Performing lookups in batches gains some efficiency because we sort the terms in unicode order so we can do a single sequential scan through each segment's terms dictionary and postings.
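    The batched, sorted lookup can be sketched in a few lines. This is an illustration only (a toy in-memory dictionary, not Lucene's BlockTree implementation):

```scala
object BatchResolveDemo {
  // Resolve a batch of delete terms against a sorted terms dictionary with
  // one sequential scan, rather than one random lookup per term.
  // The dictionary maps term -> posting list of doc ids.
  def resolve(dict: IndexedSeq[(String, List[Int])], deletes: Seq[String]): Set[Int] = {
    val sorted = deletes.sorted // sort the buffered terms in unicode order
    var i = 0                   // single cursor over the sorted dictionary
    var hits = Set.empty[Int]
    for (term <- sorted) {
      while (i < dict.length && dict(i)._1 < term) i += 1 // advance the scan
      if (i < dict.length && dict(i)._1 == term) hits ++= dict(i)._2
    }
    hits
  }
}
```

    Because both sides are sorted, the cursor only ever moves forward, so the whole batch costs one pass over the dictionary instead of a lookup per term.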

    We have also optimized primary key lookups and the buffering of deletes and updates quite a bit over time, with issues like LUCENE-6161, LUCENE-2897, LUCENE-2680, LUCENE-3342. Our fast BlockTree terms dictionary can sometimes save a disk seek for each segment if it can tell from the finite state transducer terms index that the requested term cannot possibly exist in this segment.

    Still, as fast as we have made this code, only one thread is allowed to run it at a time, and for update-heavy workloads, that one thread can become a major bottleneck. We've seen users asking about this in the past, because while the deletes are being resolved it looks as if IndexWriter is hung since nothing else is happening. The larger your indexing buffer the longer the hang.

    Of course, if you are simply appending new documents to your Lucene index, never updating previously indexed documents, a common use-case these days with the broad adoption of Lucene for log analytics, then none of this matters to you!

    Concurrency is hard

    With this change, IndexWriter still buffers deletes and updates into packets, but whereas each packet was previously buffered for later single-threaded application, IndexWriter now immediately resolves the deletes and updates in that packet to the affected documents using the current indexing thread. So you gain as much concurrency as the number of indexing threads you send through IndexWriter.

    The change was overly difficult because of IndexWriter's terribly complex concurrency, a technical debt I am now convinced we need to address head-on by somehow refactoring IndexWriter. This class is challenging to implement since it must handle so many complex and costly concurrent operations: ongoing indexing, deletes and updates; refreshing new readers; writing new segment files; committing changes to disk; merging segments and adding indexes. There are numerous locks, not just IndexWriter's monitor lock, but also many other internal classes, that make it easy to accidentally trigger a deadlock today. Patches welcome!

    The original change also led to some cryptic test failures thanks to our extensive randomized tests, which we are working through for 7.0.

    That complex concurrency unfortunately prevented me from making the final step of deletes and updates fully concurrent: writing the new segment files.

    This is typically a fast operation, except for large indices where a whole column of doc-values updates could be sizable. But since we must do this for every segment that has affected documents, doing this single threaded is definitely still a small bottleneck, so it would be nice, once we succeed in simplifying IndexWriter's concurrency, to also make our file writes concurrent.

    0 0

    The summer has been great so far, and as usual, instead of watching yet another sports event final, you've decided to catch up with your colleagues after work and do a new round of Apache CXF JOSE coding. Nice idea, they said.

    The idea of creating an application that processes content encrypted for multiple recipients has captured your imagination.

    After reviewing the CXF JWE JSON documentation you've decided to start with the following client code. This code creates a client proxy which posts some text.

    The JWE JSON filter registered with the proxy will encrypt whatever content the proxy is sending (it does not have to be text) only once, and the content encrypting key (CEK) will be encrypted with the recipient-specific encrypting keys. Thus if you have 2 recipients then the CEK will be encrypted twice.

    Registering the  and with the proxy instructs the JWE JSON filter that a JWE JSON container for 2 recipients needs to be created, that the content encryption algorithm is A128GCM and key encryption algorithm is A128KW, and each recipient is using its own symmetric key encryption key. Each recipient specific entry will also include a 'kid' key identifier of the key encryption key for the service to figure out which JWE JSON entry is targeted at which recipient.
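    The underlying idea (one content encryption, many wrapped keys) can be sketched with plain JDK crypto, independent of the CXF JOSE API. This is a conceptual illustration only: AES-GCM stands in for A128GCM, JDK "AESWrap" stands in for A128KW, and all names are invented:

```scala
import javax.crypto.{Cipher, KeyGenerator, SecretKey}
import javax.crypto.spec.GCMParameterSpec
import java.security.SecureRandom

object JweSketch {
  private val gen = { val g = KeyGenerator.getInstance("AES"); g.init(128); g }

  // Encrypt the content once with a fresh CEK, then wrap only the CEK for
  // each recipient under that recipient's key encryption key (KEK).
  def encrypt(plain: Array[Byte], keks: Map[String, SecretKey]) = {
    val cek = gen.generateKey()
    val iv  = new Array[Byte](12); new SecureRandom().nextBytes(iv)
    val c = Cipher.getInstance("AES/GCM/NoPadding")
    c.init(Cipher.ENCRYPT_MODE, cek, new GCMParameterSpec(128, iv))
    val ciphertext = c.doFinal(plain)                  // content encrypted once
    val wrappedCeks = { case (kid, kek) =>    // one wrapped CEK per 'kid'
      val w = Cipher.getInstance("AESWrap"); w.init(Cipher.WRAP_MODE, kek)
      kid -> w.wrap(cek)
    }
    (iv, ciphertext, wrappedCeks)
  }

  // A recipient unwraps its own CEK copy and decrypts the shared ciphertext.
  def decrypt(iv: Array[Byte], ciphertext: Array[Byte],
              wrapped: Array[Byte], kek: SecretKey): Array[Byte] = {
    val u = Cipher.getInstance("AESWrap"); u.init(Cipher.UNWRAP_MODE, kek)
    val cek = u.unwrap(wrapped, "AES", Cipher.SECRET_KEY).asInstanceOf[SecretKey]
    val c = Cipher.getInstance("AES/GCM/NoPadding")
    c.init(Cipher.DECRYPT_MODE, cek, new GCMParameterSpec(128, iv))
    c.doFinal(ciphertext)
  }

  def newKek(): SecretKey = gen.generateKey()
}
```

    Each recipient only needs its own KEK and the 'kid'-indexed entry to recover the CEK, which is exactly why the content itself never has to be encrypted more than once.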

    Setting up the client took you all one hour.

    The next task was to prototype the service code. That was even easier. Loading the recipient-specific properties, locating a recipient-specific entry, and getting the decrypted content was all that was needed.

    Two hours in total. Note I did not promise it would take you 30 minutes to do the whole POC; that really would've been child's play, which is not realistic. Even as a more complex, two-hour-long project, it still felt like a walk in the park :-)

    0 0

    In the past, we covered using Lagom to implement Java microservices using the event sourcing and CQRS patterns that framework relies on. Today, we’ll be revisiting our blog microservice example using the Scala programming language and Akka, one of the main components underlying Lagom.

    Event Sourcing

    First, let’s take a quick review of what event sourcing and CQRS are, along with the general design patterns used to implement these ideas. Event sourcing is an architectural pattern where all activity in a system is captured by immutable event objects, and these events are stored and published in a sequential log. The state of a system at a given time can therefore be derived from the sequence of events prior to that time. For example, most SQL databases are implemented using this sort of architecture, where all data mutation operations are stored as events in a write-ahead log. This pattern works very well with the underlying physical storage, which tends to be based on magnetic disks that are very fast in sequential access but a lot slower in random access.

    Using the concept of event sourcing, Command Query Responsibility Segregation is an architectural pattern that is rather self-describing: the responsibilities of a system for handling commands and queries are separated rather than intertwined as in typical applications. Commands are used to update state while queries are used to inspect state. The separation of these two concerns allows for much greater scalability in distributed systems than traditional approaches such as a single backing database for both reading and writing data. While CQRS is not required in order to implement a write-side and read-side pair of databases, it does offer a strong pattern to follow to do so. For example, an application like Twitter may wish to store all tweets in a distributed database such as Cassandra, but in order to effectively search those tweets, it would be useful to copy written data into Elasticsearch for later querying. Provided that we follow an event sourcing pattern, the source streams of data are well defined, which makes it much simpler to implement this responsibility segregation.

    In CQRS, we can issue a command to a system to update its state. This command, after validation, will create an immutable event which is appended to the event log. After persisting this event, the state of the system can be updated in response. By doing this, the system can always be brought back to any given state by replaying the events in the event log. For performance reasons, the state of a system is periodically snapshotted and persisted so that fewer events from the main event log need to be replayed when restarting the system. The state obtained from the event log can be as simple as the denormalized representation of entities, or it could be more complex views of the event stream which can be further combined down the line. It’s important to note that following this pattern means we need to manually reimplement some things we take for granted in a normal RDBMS such as transactions, indexing, joins, constraints, and triggers. Ideally, these sorts of things can be abstracted well enough to be provided mostly through libraries or frameworks, but some things, such as writing compensating transactions to roll back erroneous data, remain application-specific.
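    The core of the pattern fits in a few lines of Scala. A minimal sketch using the blog domain from this post (the event names are invented for illustration, not taken from the tutorial code):

```scala
object EventSourcingDemo {
  // State is never stored directly; it is a left fold over the immutable,
  // append-only event log.
  final case class PostContent(title: String, author: String, body: String)

  sealed trait BlogEvent
  final case class PostAdded(id: String, content: PostContent) extends BlogEvent
  final case class BodyChanged(id: String, body: String) extends BlogEvent

  type State = Map[String, PostContent]

  def applyEvent(state: State, event: BlogEvent): State = event match {
    case PostAdded(id, content) => state + (id -> content)
    case BodyChanged(id, body)  =>
      state.get(id).fold(state)(p => state + (id -> p.copy(body = body)))
  }

  // Replaying the log (from the start or from a snapshot) reproduces the state.
  def replay(events: Seq[BlogEvent]): State =
    events.foldLeft(Map.empty[String, PostContent])(applyEvent)
}
```

    Commands would validate input and emit these events; a read side can fold the very same log into whatever query-optimized view it needs.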

    Actor Model and Akka

    The actor model is essentially a model of distributed systems where processes are modeled as “actors”, things that have an inbox for receiving messages that are processed asynchronously. Actors can only communicate with other actors via messages, and this allows actors to be scaled outward into fully distributed systems. A simple way to think of an actor is as a thread that cannot directly access the state of any other thread and must pass immutable message objects to other threads to coordinate anything. This avoids the use of synchronization, locks, and all the extremely difficult to debug concurrency issues that plague typical concurrent code.

    Actors are lightweight and can be spawned and restarted many times with very little overhead (far less than an actual thread). Thus, following the ideals of programming languages like Erlang, actors are small enough to allow for a more robust form of error handling known as “let it crash”. Forming a suitable hierarchy of actors, parent actors can automatically restart child actors when errors occur or perform other fallback strategies. This is particularly useful for long-running systems where either a full restart is infeasible or the cost of resources to double up the system for blue green deployments is prohibitive. It may also be the case that recovering the full state of the system would take too long, so being able to selectively restart only single components without affecting the rest of the system is very useful.
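    The "thread with an inbox" intuition can be made concrete without Akka. A deliberately tiny sketch using a plain JDK thread and blocking queue (this is the mental model only; real Akka actors are far lighter than threads):

```scala
import java.util.concurrent.LinkedBlockingQueue

object MailboxDemo {
  // An "actor" as a thread draining a mailbox of immutable messages.
  // Its state (count) is touched only by its own thread; the only way to
  // interact with it is to enqueue a message.
  final class CounterActor {
    private val mailbox = new LinkedBlockingQueue[Any]()
    @volatile var count = 0
    private val worker = new Thread(() => {
      var running = true
      while (running) mailbox.take() match {
        case "increment" => count += 1
        case "stop"      => running = false
        case _           => () // ignore unknown messages
      }
    })
    worker.start()

    def !(msg: Any): Unit = mailbox.put(msg) // asynchronous, fire-and-forget send
    def join(): Unit = worker.join()
  }
}
```

    No locks appear in the message handler because only the actor's own thread ever touches its state; all coordination happens through the mailbox.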

    Akka is an implementation of this actor model inspired by Erlang. Akka is written in Scala and provides both a Scala and Java API for actors, clustering, persistence, distributed data, reactive streams, HTTP, and integration with various external systems.

    All code samples in this post are from my GitHub repository. We’ll be using Scala 2.12, which requires Java 8. Note that if you’re still stuck on Java 6 or 7, Scala 2.10 and 2.11 both require only Java 6. Code samples may be adaptable to previous releases of Scala with little or no modification. Anyway, let’s jump into using Scala!

    Blog API

    Our API to implement today will be the same as in my Lagom tutorial. As such, we’ll have a blog data type consisting of three fields: title, author, and body. Blog posts are identified by UUIDs. The REST API will consist of a GET, POST, and PUT endpoint for looking up, creating, and updating blog posts respectively. The first thing to implement, then, will be our blog data type.
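The data type from the description above can be sketched as follows (a reconstruction from the text; see the linked repository for the authoritative version):

```scala
// The blog post payload: three plain fields, no identity.
final case class PostContent(title: String, author: String, body: String)

val content = PostContent("Hello", "Jane Doe", "First post!")
// The case keyword gives us field accessors, equals, hashCode, and toString for free.
assert(content.author == "Jane Doe")
assert(content == PostContent("Hello", "Jane Doe", "First post!"))
assert(content.toString == "PostContent(Hello,Jane Doe,First post!)")
```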


    You may note that there isn’t much to it, and you’re right! Let’s review the syntax here. First of all, we don’t need to define things as public in Scala as they’re all assumed to be public by default. In fact, public is not even a keyword in Scala. After a class name comes its parameters which can be thought of as the parameters to its constructor. A class can still have multiple constructors by defining methods named this within the class, but we have no need for that here. Parameters and variables are declared in the opposite order as Java; in Java, we would say String foo, whereas in Scala we would say foo: String. The other keywords used here are final which makes the class final as in Java, and case which makes this a “case class”.

    Recall that in Java, we can use Lombok to automatically create an all-args constructor, getters for all fields, equals, toString, and hashCode, builders, withers, and other boilerplate code. In Scala, we can simply add case to a class to get a lot of that for free. A case class makes all its class parameters (fields) available with something similar to getters, adds sensible equals, hashCode, and toString method implementations, and handles some other Scala-specific features we’ll cover later. Essentially, a case class can be thought of as a data value class, and we’ll be using it as such.

    Next, we’ll define a post id class instead of directly using UUID.
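Based on the walkthrough below, the class looks roughly like this (a hedged reconstruction; the implicit circe JSON codecs discussed below are omitted so the sketch stays dependency-free):

```scala
import java.util.UUID

// A zero-cost wrapper: a value class (extends AnyVal) with a single val parameter.
final class PostId(val id: UUID) extends AnyVal {
  override def toString: String = id.toString
}

object PostId {
  // Factory methods on the companion object; PostId() generates a fresh random id.
  def apply(): PostId = PostId(UUID.randomUUID())
  def apply(id: UUID): PostId = new PostId(id)
}

val fixed = PostId(UUID.fromString("00000000-0000-0000-0000-000000000001"))
assert(fixed.toString == "00000000-0000-0000-0000-000000000001")
assert(PostId() != PostId()) // random ids differ
```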


    We have several new keywords to discuss here. First, by adding val to the id class parameter, this will add a Scala-style getter for id by providing a way to access the id field directly. Classes use the extends keyword to extend other classes or implement traits (interfaces). This AnyVal class is a special class in Scala that works essentially as a boxed value type. This value class can only contain a single public field, and at compile time, the compiler will attempt to remove all indirect access of that field and replace it with direct access. Thus, we can create a wrapper type for UUID without sacrificing performance at runtime.

    The override keyword is used just like the @Override annotation in Java: while not required, if the keyword is present but the method doesn’t actually override anything, this will cause a compiler error. This can be useful for catching typos or unknown API changes. The def keyword is used for defining methods. When a method has zero arguments, the parentheses are optional. The conventional style is that parentheses are omitted on pure functions and kept on side-effecting ones. As with variables, the types of parameters and method return values come afterwards.

    The object keyword here defines what is essentially a singleton instance of a class of the same name. Scala does not have static, but these objects provide an equivalent feature. When an object is named the same as a class, it is called the class’s “companion object”. A class and its companion object have equal access to internals of each other similar to how static and non-static members work in Java.

    The apply methods here are used as factory methods to create PostId instances. The apply method has special meaning in Scala: we can omit the name of the method when calling it. For example, PostId.apply() is the same as calling just PostId(). As can be seen here, the return keyword is optional when the last line of executing code of a method is the return value.

    The implicit keyword is used for a lot of things in Scala, and in this case, an implicit variable is one that is available for implicit parameters to methods that take them. We’ll be using it here to fill in custom encoder and decoder implementations for our PostId class to ensure that the JSON marshaller and unmarshaller do not think that this is a complex object.

    The square bracket syntax, Decoder[PostId], is Scala’s generic type parameter syntax. The equivalent in Java would be Decoder<PostId>. Finally, these codecs are derived from existing UUID codecs, so we map and contramap them using some lambda functions. Both of these lambda functions use anonymous parameter syntax and can be expanded to Decoder.decodeUUID.map(id => PostId(id)) and Encoder.encodeUUID.contramap(postId => postId.id).

    Next, let’s stub out a service interface. We’ll fill in the details later.
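As a standalone illustration of a trait-as-interface stub (with placeholder String types instead of the real PostId, PostContent, and Future types so the fragment compiles on its own):

```scala
// A trait with only abstract members compiles to a plain interface.
trait BlogService {
  def getPost(id: String): Option[String]
  def addPost(content: String): String
}

// We can implement it anonymously, just as with a Java interface.
val svc = new BlogService {
  private var posts = Map.empty[String, String]
  def getPost(id: String): Option[String] = posts.get(id)
  def addPost(content: String): String = {
    val id = (posts.size + 1).toString
    posts += id -> content
    id
  }
}

val id = svc.addPost("hello")
assert(svc.getPost(id) == Some("hello"))
assert(svc.getPost("nope") == None)
```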


    The trait keyword is very similar to an interface in Java. In fact, as written, this trait will compile directly into an interface. Traits are far more powerful than Java interfaces, so this isn’t always possible, but Java has started making interfaces more powerful starting with default methods in Java 8, so these two concepts may converge one day. The main difference between a trait and a class in Scala is that a trait cannot have its own parameters, but a class can extend (or mix in) multiple traits. So far, this is still very similar to Java. However, traits can contain fields, private and protected members, default implementations, and even constraints on what the concrete implementing class must conform to. We’ll omit the return types of the methods for now, but we’ll return to add them later once we’ve defined them.

    Blog Entity

    Next, let’s dive down to the entity level. Recall that in Lagom, a persistence entity is associated with three related things: commands, events, and state. Following a similar pattern here, we’ll implement a BlogEntity class by first defining our commands, events, and state classes.

    object BlogEntity {

      sealed trait BlogCommand

      final case class GetPost(id: PostId) extends BlogCommand
      final case class AddPost(content: PostContent) extends BlogCommand
      final case class UpdatePost(id: PostId, content: PostContent) extends BlogCommand

      sealed trait BlogEvent {
        val id: PostId
        val content: PostContent
      }

      final case class PostAdded(id: PostId, content: PostContent) extends BlogEvent
      final case class PostUpdated(id: PostId, content: PostContent) extends BlogEvent

      final case class PostNotFound(id: PostId) extends RuntimeException(s"Blog post not found with id $id")

      type MaybePost[+A] = Either[PostNotFound, A]

      final case class BlogState(posts: Map[PostId, PostContent]) {
        def apply(id: PostId): MaybePost[PostContent] = posts.get(id).toRight(PostNotFound(id))
        def +(event: BlogEvent): BlogState = BlogState(posts.updated(event.id, event.content))
      }

      object BlogState {
        def apply(): BlogState = BlogState(Map.empty)
      }
    }

    The first new thing of note here is the sealed keyword. A trait or class marked sealed indicates that all subclasses of that trait or class must be contained in the same file. This makes pattern matching on these types of classes easier as it aids the compiler in detecting missing patterns checked by the programmer and can help prevent certain classes of bugs.

    We have three types of commands: GetPost, AddPost, and UpdatePost. These are all rather trivial and mirror the same commands from the Lagom version of this microservice.

    Next, we defined a sealed trait BlogEvent which contains two public values. The val keyword is similar to marking a variable as final in Java, while the var keyword is similar to a normal variable in Java. When defining a val or var on a class, this is technically creating a field in the class along with a getter-style method (named the same as the field) and a setter-style method if using var. What this effectively does is allow the variable to be accessed as if it were a public field of the class. Note that by not giving a val or var an initial value, this makes them abstract. It may be worth noting here that semicolons are optional in Scala and tend to be omitted; otherwise, this would have been our first usage of them.

    For our events, we only care about commands, not queries (in the CQRS sense of the words, not a BlogCommand type of command), so we defined two events: PostAdded and PostUpdated.

    Next, we abuse the case class feature to implement our own exception class, PostNotFound, to obtain some handy features for free. This demonstrates the syntax to call the constructor of a superclass as well. There is one other nifty feature in use here: interpolated strings. Scala provides an extensible feature to make interpolated strings which are expanded out and filled in at compile time. The syntax s"Hello, $foo!" would be essentially the same as "Hello, " + foo + "!" at compile time. We could use more advanced expressions inside the string by wrapping the variable name in curly braces, so for instance, we could say s"Hello, ${foo.capitalize}!".

    After that, we define a type alias named MaybePost[+A]. There are a few things going on in this, so let’s break it down. First of all, Scala allows us to define type aliases which can be used to alias a larger type into something more readable, or it can even be used simply to rename types. In the generic type parameter, the +A bit indicates that, if B is a subclass of A, then MaybePost[B] is a subclass of MaybePost[A]. This is called a covariant type parameter. If we omitted the plus, we’d have an invariant type parameter which means that regardless of how A and B are related, Foo[A] and Foo[B] would not be related. There is a third type of variance available called contravariant type parameters which use -A and imply the opposite subclass relationship between Foo[A] and Foo[B] from A and B. We alias this MaybePost from the Either[A, B] type from the Scala standard library which is a type that can be either a Left or Right value containing a type A or B respectively. This class is normally used as a more powerful form of Option (Optional in Java) where the left side type is an exception type and the right side type is the success value type.
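The behavior of such an alias can be seen in a tiny standalone example (with String standing in for the exception type):

```scala
// Covariant alias: MaybePost[B] is usable wherever MaybePost[A] is expected
// when B is a subtype of A.
type MaybePost[+A] = Either[String, A]

val posts = Map("a1" -> "First post")

// Option -> Either: toRight supplies the Left value for the empty case.
def find(id: String): MaybePost[String] =
  posts.get(id).toRight(s"Blog post not found with id $id")

assert(find("a1") == Right("First post"))
assert(find("zz") == Left("Blog post not found with id zz"))
```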

    Finally, we come to the BlogState class and companion object. We have some strange names chosen here for methods: apply and +. Wait, +? Yup! In Scala, you can name a method pretty much anything you like. Scala will translate the names into valid Java identifiers at compile time. As explained above, the apply method is treated specially by allowing the word apply to be omitted when calling the method. There are several other special method names that allow for syntax features in Scala such as update, unapply, map, flatMap, filter, withFilter, foreach, and methods named after mathematical operators such as +, -, and *. We will not be covering most of them, but it’s worth noting their names. One other thing to note here is that the default Map[K, V] type used here is an immutable hash map, and in keeping with the spirit of immutability, we will be updating our BlogState by returning a new BlogState with a new Map which contains the old map plus an additional item. Thus, the use of posts.updated(key, value) returns a new map with the addition or modification of the provided key value mapping.

    With all that out of the way, let’s move on to the entity implementation.


    Note that we need to extend PersistentActor to use akka-persistence, the mechanism we’ll be using to save blog events to disk. This class is dense with syntax we haven’t covered yet, so let’s go over it all. First of all, note how we imported all the members of the BlogEntity object. The _ in the import is equivalent to using * in Java (Scala doesn’t use * here because * is a valid class and method name in Scala). Note that we don’t ever use an import static like in Java as Scala does not have static things (though it can import static things from Java without having to specify it’s a static import). Since BlogEntity is already in scope, we did not have to specify the full package name preceding it in the import. We also import the members of the context variable which is defined in a superclass. We do this to gain access to some implicitly available variables used in the asynchronous code below.

    Next, note that we were able to construct a BlogState object by omitting the keyword new because we defined a method named apply. Thus, we are actually calling BlogState.apply() here. In the next line, we override an abstract method to identify this aggregate root for persistence purposes. An alternative name here may be the fully qualified class name.

    Another neat syntax to note here is that in Scala, a zero-arg method can omit its parentheses. This feature allows for a lot of neat things such as allowing a val to override an arg-less def in the parent class. Our next method, receiveCommand, returns an Akka type called Receive which is a type alias for a lambda function that takes a single parameter of type Any and returns nothing. This warrants a quick overview of some of the standard Scala types. The Any class is the root class, and from there are AnyVal and AnyRef. The AnyVal class is for types like Int, Boolean, Double, etc., along with user-defined value classes as explained in the PostId description, while AnyRef is equivalent to Object in Java. There is also the Unit class which is equivalent to void in Java. The difference here is that technically, all methods must return a value in Scala, so for an empty return type, Unit can be used which has only one possible value: (). There are also two bottom-most types in Scala: Nothing and Null. The Nothing type is generally used as a return value for a method that does not actually return (e.g., it always throws an exception). The Null type is the type of the null reference which is a subtype of everything. Try not to confuse these with the None type of an empty Option or the Nil instance of an empty List.
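A few of these types in action, as a standalone sketch:

```scala
// Unit is Scala's void, but it is a real type with a single value: ().
val unit: Unit = ()

// Nothing is the type of expressions that never produce a value.
def fail(msg: String): Nothing = throw new IllegalStateException(msg)

// Because Nothing is a subtype of everything (including Int),
// the else branch still type-checks as Int.
def half(n: Int): Int = if (n % 2 == 0) n / 2 else fail(s"$n is odd")

assert(half(4) == 2)
val caught = try { half(3); false } catch { case _: IllegalStateException => true }
assert(caught)
```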

    The receive method of an actor tends to be implemented as a pattern match expression. A pattern match takes the form of foo match { case a => ...; case b => ...; ... }, and this is a very powerful feature used pervasively in Scala. In our use case here, we can omit the foo match part to create a lambda function that matches an anonymous value. In our cases, we’ll be looking for messages that match our command types. Since our commands are all case classes, Scala has generated an unapply method for each which makes the classes destructurable, so to speak, within pattern match expressions. For example, the expression case GetPost(id) => will match when the matched object is an instance of GetPost, and it will subsequently bind its one field to the new variable named id. We could use nested patterns here if we wanted, but that is a more advanced feature.
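Destructuring via a generated unapply can be shown in isolation (field types simplified to String here):

```scala
final case class GetPost(id: String)
final case class AddPost(content: String)

// A lambda written as a bare pattern match, as in an actor's receive block.
val describe: Any => String = {
  case GetPost(id)      => s"get $id"       // unapply binds the field to id
  case AddPost(content) => s"add $content"
  case other            => s"unknown: $other"
}

assert(describe(GetPost("42")) == "get 42")
assert(describe(AddPost("hello")) == "add hello")
assert(describe(7) == "unknown: 7")
```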

    In the GetPost example, we already have a handy BlogState.apply(id) method available as a lookup to find the content for a given id. This returns our MaybePost[PostContent] type which can be used to determine whether or not we got the content. Using the sender() method, we can send a message back to the actor that sent the initial message. The ! method used here is an alias for the tell method. Note that we omitted the dot and parentheses here by using the infix notation syntax of calling methods. This syntax allows, for example, the method call 1.+(1) to be written as 1 + 1. This is particularly useful for methods that have an operator syntax like +, but it can also be useful in functional programming contexts as well. Thus, our line of code is equivalent to sender().!(state.apply(id)) when expanded out.

    Next, to handle the AddPost command, we create a PostAdded event with a new id and its content. The result of handling the event is a Future[PostAdded], so we use the pipeTo pattern in Akka to send the result of that future back to the sender. As mentioned for the infix method call syntax, this line can be equivalently written as handleEvent(PostAdded(PostId(), content)).pipeTo(sender()). We’ll be reusing the event handling logic in UpdatePost, so we’ll come back to that. The next line explicitly returns Unit. We do this because we are using a set of strict compiler flags which would make discarding a return value an error otherwise. Note that this is not required in the default Scala compiler settings, but we’re trying to stick to high quality Scala code.

    The UpdatePost command requires a bit more work than AddPost did. This is to validate that the id provided already exists. Thus, first we look up in our state for an existing post. If it does not exist, we’ll get a Left(error) value with some exception error; if it does exist, we’ll get a Right(content) value with content being the PostContent value. Thus, we combine this with a pattern match to check for both types. The syntax, case response @ Left(_), uses two pattern types concurrently: binding the matched expression to the new variable named response, and matching that it is a Left type with any content (the _ is a wildcard match in this context as a throwaway variable). If the post doesn’t exist, we send back the error. Otherwise, we create a PostUpdated event, handle it, and send it back.

    Next, we look at handling the event. Let’s break down the new syntax. The [E <: BlogEvent] bit is a generic method type parameter, where <: is equivalent to extends in a Java generic method. Conversely, the >: generic syntax would be equivalent to super in Java. For example, this method may be written in Java as private <E extends BlogEvent> Future<E> handleEvent(E event). The other syntax of note here is the e: => E parameter. This is similar to a zero-arg lambda function, but when called as such, does not require being wrapped in a lambda. Had we used e: () => E as the parameter instead, then we would have had to call the method as handleEvent(() => PostAdded(...)) instead.
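The difference between a by-name parameter and an explicit thunk can be seen in a standalone sketch:

```scala
var evaluated = 0
def make(): Int = { evaluated += 1; 42 }

// By-name parameter: the argument expression is re-evaluated each time e is used,
// and the call site needs no lambda wrapper.
def lazily[E](e: => E): () => E = () => e

val f = lazily(make()) // nothing evaluated yet
assert(evaluated == 0)
assert(f() == 42)
assert(evaluated == 1)

// The explicit-thunk version requires wrapping at the call site:
def lazily2[E](e: () => E): () => E = e
val g = lazily2(() => make())
assert(g() == 42)
assert(evaluated == 2)
```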

    In order to create a Future here, we’re using the Promise class. A Scala Promise works rather similarly to promises in JavaScript and other languages. To handle the event, we call the persist method defined in the PersistentActor superclass. We use another new syntax here where a method with a single lambda function parameter can be replaced with curly braces. A lambda function can always be surrounded in curly braces, so this syntax is taking advantage of some infix method call syntax features. This syntax is rather similar to how Closures work in Groovy. The lambda provided to persist here is called after the event has been successfully persisted. In this, we complete our promise which is returned as a Future to the caller. After that, we call state += event. Since we never defined a method called += on BlogState, Scala sees that state is a var, so it can rewrite that expression into state = state + event. Since we do in fact have such a method available, the whole expression is equivalent to calling state = state.+(event). After that, we publish the event to the system event stream which can be used as an application-wide event bus. Finally, we add in a periodic call to saveSnapshot every 1000 events which will allow us to restart and recover the state a lot faster than rereading the entire event log.

    The last bit of code we needed to implement in this actor was the receiveRecover method which is used to recover the latest snapshot on startup if available. We can populate this actor’s state directly from the snapshot, and then subsequently handle all the events that weren’t included in that snapshot. The only new syntax used here is the case event: BlogEvent pattern match syntax which matches when the object is an instance of BlogEvent and binds it to the new variable event.

    Blog Service

    Now that we’ve written our entity actor, we can fill in the stub BlogService trait defined earlier.


    The AkkaConfiguration trait we mix in here is a trait we defined for easy access to Akka-related objects such as an ActorRefFactory to create actors, a Materializer which is used to run Akka Streams, an ExecutionContext which is used for coordinating Futures and other asynchronous functionality, and a Timeout which is used for a default timeout value when making request/reply-style ask calls to actors.

    As noted before, we can import all the members of the BlogEntity companion object here. It’s also worth noting that Scala allows you to import things to whatever scope you want, so we can limit our imports to the closest spot it is actually used.

    To use our BlogEntity actor, we need to spawn it first. Using our ActorRefFactory, we can spawn a BlogEntity using the default Props of a BlogEntity. Spawning the actor will create it and start it asynchronously, returning an ActorRef instance which can be used to interact with the actor. Since actors can be restarted, replaced, or even located on completely different processes or machines, we never directly access an actor’s instance and instead use its ActorRef to send messages, watch it, etc. Since actors are written in a rather type-unsafe fashion, we’re wrapping the actor’s messaging API into a typesafe trait.

    In order to receive responses from our actor when sending a message, we must do so asynchronously. Thus, our API returns futures. We utilize the ask pattern to send a message to the actor and wait for a response message. Since actors are not typed, the message response comes back as a Future[Any], so we use the mapTo[A] method on Future to verify it matches our expected type and cast it. Other than that, this trait is rather self-explanatory based on our BlogEntity.

    Blog REST API

    Next up is defining our REST API. We’ll be using the high level Akka HTTP route DSL for this. Using our BlogService trait, we’ll mix that in to another trait to define our API.

    trait BlogRestApi extends RestApi with BlogService {
      override def route: Route =
        pathPrefix("api" / "blog") {
          (pathEndOrSingleSlash & post) {
            // POST /api/blog/
            entity(as[PostContent]) { content =>
              onSuccess(addPost(content)) { added =>
                complete((StatusCodes.Created, added))
              }
            }
          } ~
            pathPrefix(JavaUUID.map(PostId(_))) { id =>
              pathEndOrSingleSlash {
                get {
                  // GET /api/blog/:id
                  onSuccess(getPost(id)) {
                    case Right(content) => complete((StatusCodes.OK, content))
                    case Left(error)    => complete((StatusCodes.NotFound, error))
                  }
                } ~
                  put {
                    // PUT /api/blog/:id

    There are a couple of new syntax features here along with many DSL-specific things going on. First of all, this introduces the keyword with, which is used when extending multiple traits. The RestApi trait is one we made that mixes in the Akka HTTP route DSL along with support for marshalling and unmarshalling our case classes and primitive types into JSON. The only other new syntax here is the tuple syntax. A tuple is a generalization of an ordered pair: it can hold two or more values that do not have to be the same type. Tuples are written in parentheses with the values separated by commas. Due to syntax ambiguity, when passing an inline tuple to a method, we need to double up on the parentheses to avoid it being interpreted as a call to a method with multiple parameters. All the other syntax in this trait comes from the route DSL. For example, URI paths can be matched using implicit conversions from strings into URI patterns, and those patterns can be composed with /, which matches a slash in the URI. Segments of the path can be extracted into parameters in the lambda function provided after. We can easily unmarshal request bodies using the entity(as[A]) { a: A => ... } DSL. Using our service mixin, we can forward these requests and get responses which can be chained back into the HTTP response. Finally, the ~ function is used to chain two routes together into a single route. Far more comprehensive information about the routing DSL can be found in the Akka documentation.

    In order to run this code, we still need to create our HTTP server and set up Akka in general. That code is not very interesting in itself and is included in the code samples. While this post only scratches the surface of Scala, it provides a broad overview of various Scala-specific syntax features that really differentiate it from Java. There is one other feature that has only been mentioned by name so far, and that is implicits. Implicits are a powerful feature specific to Scala that helps reduce boilerplate typing in various scenarios. Implicits can generally be used for a few different things:

    • Implicit values can be provided as arguments to functions or class constructors without being explicitly written, as long as they are in scope. This is useful for passing around parameters that are needed by tons of methods, such as the ExecutionContext object mentioned above.
    • Implicit parameters are automatically filled in by an implicit value in scope so that the parameter doesn’t have to be typed out over and over again.
    • Implicit methods can be used to convert from one type to another when the original type is not compatible where it is used. For example, this feature is used to automatically convert an Int to a Long when passed to a method parameter that takes a Long. This can be a very dangerous feature when misused.
    • Implicit classes can be used to provide extension methods to an existing API. This is used to add methods to java.lang.String, for example, which cannot normally be done as String is a final class that cannot be extended. Extension methods provide a compositional way to extend APIs in a type safe fashion without having to explicitly wrap the API everywhere an extension method is desired.
    • Implicits can be used to provide type classes to Scala, a feature common to functional programming languages. Type classes are seldom used in object-oriented programming languages. An example of a simple type class in Java is the Comparable<T> or Enum<E> interface.
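The extension-method flavor from the list above can be demonstrated standalone (a hypothetical wordCount extension on String, not from this post's repository):

```scala
// An implicit class adds methods to an existing type without wrapping it by hand.
object StringSyntax {
  implicit class RichWords(val s: String) {
    def wordCount: Int = s.split("\\s+").count(_.nonEmpty)
  }
}

import StringSyntax._
// "hello brave world".wordCount is rewritten by the compiler to
// new RichWords("hello brave world").wordCount.
assert("hello brave world".wordCount == 3)
assert("".wordCount == 0)
```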

    Overall, Scala is a fantastic programming language with a simple core set of features and tons of extensibility. In the Scala ecosystem, many developers prefer using the Akka stack of frameworks, which includes Play, Lagom, Apache Spark, and Slick. There is also the buzzword-laden SMACK stack, which consists of Spark, Mesos, Akka, Cassandra, and Kafka, a great set of related technologies for handling big data applications.


    Wow, it’s been a long time since I’ve done non-open source blogging here! Most of my time is spent at Community Over Code or speaking at conferences, building a new consulting gig up (more on that soon), or continuing work … Continue reading


    This post is a follow-up to Using ShiftLeft in Open Source, where I was looking to see if I could apply the principle of shift-left testing to security. Now that ShiftLeft has a user interface, I want to come back to it and revisit looking at results from the UI instead of poring through JSON reports. You’ll find that this write-up parallels my original post, so reading the original is not required to get up to speed.

    Getting Rid of FUD and Panic

    To get us started, allow me to go through the premise from my initial post: My long term goal is to formally insert security awareness into my development practices and eventually into my continuous integration-based builds.

    After years of being involved in open-source development at Apache, we’ve seen security issues pop up in Apache Commons, like arbitrary remote code execution and denial-of-service attacks (CVE-2016-3092 and CVE-2014-0050). While some threats are real, others are just FUD. Even when they are real, it is important to consider context. There may be problems that end users never see because the “hole” is not reachable by users’ actions.

    The idea behind ShiftLeft is to break old habits of building a product and only later figuring out how to fend off attacks and plug security holes. Today, we take for granted that unit testing, integration testing, continuous integration, and continuous delivery are commonplace. ShiftLeft proposes to make security analysis just as ubiquitous.


    By DonFiresmith (Own work) [CC BY-SA 4.0], via Wikimedia Commons

    Getting Started

    Since ShiftLeft is free for open source projects, I decided to look at what it reports for Apache Commons IO, an Apache Commons Java component.

    To get started, go to the ShiftLeft site and enter a GitHub repository URL.


    ShiftLeft then asks you for your name and email address:


    And you are off to the races.

    It’s important to note that ShiftLeft has a 30-day disclosure policy, so you have plenty of time to fix up your FOSS projects.

    My previous post looked at the 2.5 release tag for Apache Commons IO; here I am working with my GitHub fork of the master branch, which I’ve kept up-to-date. While my initial experiment with ShiftLeft gave me a 150 KB JSON report to pore over, here I have a nice web UI to explore:


    What does it all mean? We have three areas in the UI that we will explore:

    • The top-left shows a summary of the current state of the repository’s master branch: the latest commit details and a summary of conclusions (in white boxes).
    • The dark-colored list on the left shows what ShiftLeft calls conclusions. These are our potentially actionable items. As we’ll see, even if you find some conclusions non-actionable, these will do a great deal to raise your awareness of potential security issues for code that you’ll write tomorrow or need to maintain today. You can expand each item (dark box) to reveal more information.
    • On the right-hand side, you see a tree with paths of all public classes organized by package. On the left of that pane is a list of packages. You can expand each package to reveal the public classes it contains. You can then expand each class to show its methods. We’ll see more of this later. Leading away from tree items that have a conclusion, you’ll see a light-colored path to its category. In other words, if you see a path leading away from an item, be it a package or class, that means one of its contained items carries a conclusion.

    The first thing to notice of course is that I no longer have to consider the whole JSON report file. In the UI, the conclusions are presented in an expandable list without having to filter out the graph data (and thank goodness for that). There is also an “Issues” heading that you will use to mark which conclusions you want to track for changes. Since we’ve not marked any conclusions as issues, the UI presents the expected “0” count and the message “No conclusions marked as issues”.

    The first UI elements to notice are the two summary boxes for “Sensitive Data” and “Untrusted Data”. ShiftLeft uses these two terms in conclusion descriptions to organize its findings.

    The Trusted and Sensitive Kind

    Let’s describe “Sensitive Data” and “Untrusted Data”.


    Conclusions described as dealing with Sensitive Data tell you: look out, if you have a password in this variable, it’s in plain text. Now, it’s up to me to make sure that this password does not end up in a clear text file or anywhere else that is not secure. This is where context matters: you are the SME of your code, and you know how much trouble you can get yourself and your users into. ShiftLeft has no opinion; it offers ‘conclusions.’

    Conclusions referring to Untrusted Data tell me I should take precautions before doing anything with that data. Should I just execute this script? Should I worry about JSON hijacking? See Why does Google prepend while(1); to their JSON responses?


    Looking for Trouble Again

    Let’s start with a simple conclusion and get deeper in the weeds after that. Clicking on “Sensitive Data” or “Untrusted Data” filters the list of conclusions. I chose “Untrusted Data” because I am looking for the first interesting conclusion I found while writing Using ShiftLeft in Open Source: The method IOUtils.buffer(Writer, int) does not support handling untrusted data to be passed as parameter size because it controls the size of a buffer, giving an attacker the chance to starve the system of memory. I find it quickly using a page search:


    I can click on the link to open a page on the exact line of code in GitHub:


    While this example may seem trivial, ShiftLeft shows understanding of what the code does in this method: We are allowing call sites to control memory usage in an unbounded manner.

    Let’s imagine an application that would allow an unbounded value to be used, for example, to process a 2 GB file, and that would care about this API and the conclusion rendered by ShiftLeft. To track this conclusion, we mark it as an issue so it appears in our Issues list:


    Now, for the fun part. Let’s edit the code to guard against unbounded usage. Let’s institute an arbitrary 10 MB limit. We’ll change the code from:

     /**
      * Returns the given Writer if it is already a {@link BufferedWriter}, otherwise creates a BufferedWriter from the
      * given Writer.
      *
      * @param writer the Writer to wrap or return (not null)
      * @param size the buffer size, if a new BufferedWriter is created.
      * @return the given Writer or a new {@link BufferedWriter} for the given Writer
      * @throws NullPointerException if the input parameter is null
      * @since 2.5
      */
     public static BufferedWriter buffer(final Writer writer, int size) {
         return writer instanceof BufferedWriter ? (BufferedWriter) writer : new BufferedWriter(writer, size);
     }

    to:

     private static final int MAX_BUFFER_SIZE = 10 * 1024 * 1024; // 10 MB

     /**
      * Returns the given Writer if it is already a {@link BufferedWriter}, otherwise creates a BufferedWriter from the
      * given Writer.
      *
      * @param writer the Writer to wrap or return (not null)
      * @param size the buffer size, if a new BufferedWriter is created.
      * @return the given Writer or a new {@link BufferedWriter} for the given Writer
      * @throws NullPointerException if the input parameter is null
      * @since 2.5
      */
     public static BufferedWriter buffer(final Writer writer, int size) {
         if (size > MAX_BUFFER_SIZE) {
             throw new IllegalArgumentException("Request buffer cannot exceed " + MAX_BUFFER_SIZE);
         }
         return writer instanceof BufferedWriter ? (BufferedWriter) writer : new BufferedWriter(writer, size);
     }
    After pushing this change to GitHub, I do not see a change in my ShiftLeft report; ah, this is still a beta. Should I chalk this up to work in progress, or is there still potential trouble ahead?

    I wonder if this method shouldn’t always be flagged anyway. Yes, I changed the code so that the memory allocation is no longer unbounded, but who is to decide whether my MAX_BUFFER_SIZE is reasonable? It might be fine for a simple use case like a single-threaded app that does this once. What if I have ten thousand concurrent tasks that want to do this? Is that still reasonable? I’m not so sure. So for now, I think I like being notified of this memory allocation.
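To make that worry concrete, here is a quick back-of-the-envelope sketch; the ten-thousand-task count is the hypothetical figure from above, not something ShiftLeft reports:

```java
public class BufferBudget {
    public static void main(String[] args) {
        // The 10 MB cap from the patch above.
        final long maxBufferSize = 10L * 1024 * 1024;
        // A hypothetical number of concurrent tasks, each requesting a full-size buffer.
        final long concurrentTasks = 10_000;
        final long worstCaseBytes = maxBufferSize * concurrentTasks;
        // Integer division is fine for a rough figure.
        System.out.println(worstCaseBytes / (1024L * 1024 * 1024) + " GB worst case");
    }
}
```

Roughly 97 GB in the worst case: a per-call cap alone does not bound the aggregate, which is why the notification still earns its keep.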

    Digging deeper

    In my previous ShiftLeft post — based on Apache Commons IO 2.5, not master — I had found this conclusion (in raw form edited for brevity):

     "id": ",",
     "description": "The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.",
     "unsupportedDataType": "SENSITIVE",
     "interfaceId": "FILE/false",
     "methodId": ",",
     "codeLocationUrl": "",
     "state": "NEUTRAL",
     "externalIssueUrl": "https://todo"

    Looking at the methodId tells us to go look at FileUtils.copyFileToDirectory(File, File) where we find:

     /**
      * Copies a file to a directory preserving the file date.
      * This method copies the contents of the specified source file
      * to a file of the same name in the specified destination directory.
      * The destination directory is created if it does not exist.
      * If the destination file exists, then this method will overwrite it.
      * <strong>Note:</strong> This method tries to preserve the file's last
      * modified date/times using {@link File#setLastModified(long)}, however
      * it is not guaranteed that the operation will succeed.
      * If the modification operation fails, no indication is provided.
      *
      * @param srcFile an existing file to copy, must not be {@code null}
      * @param destDir the directory to place the copy in, must not be {@code null}
      * @throws NullPointerException if source or destination is null
      * @throws IOException if source or destination is invalid
      * @throws IOException if an IO error occurs during copying
      * @see #copyFile(File, File, boolean)
      */
     public static void copyFileToDirectory(final File srcFile, final File destDir) throws IOException {
         copyFileToDirectory(srcFile, destDir, true);
     }

    This method just delegates to another copyFileToDirectory() with an added parameter, no big deal. What is interesting is that the codeLocationUrl points to code not in this method but to a private utility method:

    FileUtils at line 1141 is in the guts of a private method, which is where ShiftLeft flagged an issue: the method creates a new FileInputStream. Because ShiftLeft is working with a code graph, when I search the JSON conclusions for this URL, I find a total of 14 conclusions that use it. This tells me that this code fragment creates 14 possible vulnerabilities in the component, with a careful emphasis on possible, since context is important.
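As a sketch of that search, here is one way to count how many conclusions share a codeLocationUrl; the field name follows the JSON excerpt above, and the URLs are made-up stand-ins for entries in the real report:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConclusionCounter {
    public static void main(String[] args) {
        // Made-up codeLocationUrl values standing in for the real JSON report.
        List<String> codeLocationUrls = List.of(
                "https://github.com/apache/commons-io/FileUtils.java#L1141",
                "https://github.com/apache/commons-io/FileUtils.java#L1141",
                "https://github.com/apache/commons-io/IOUtils.java#L326");

        // Group conclusions by the line of code they point at.
        Map<String, Integer> countsByUrl = new HashMap<>();
        for (String url : codeLocationUrls) {
            countsByUrl.merge(url, 1, Integer::sum);
        }
        System.out.println(countsByUrl);
    }
}
```

A count greater than one against a single URL is exactly the “one root issue, many public entry points” effect described above.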

    If I search in the Conclusions list on the left of the page, I find several hits for “FileUtils.copyFileToDirectory”. Then I can click to expand each one to see the exact location and a hyperlink to GitHub. What I hope is coming is the ability to filter and sort so I can create a mental picture as I was able to with the JSON report.

    ShiftLeft also has a user-friendly way to discover this information: the tree view:


    In this view, the “” node is the topmost package in Apache Commons IO. You can see that it has a path that leads to all three different categories: Generic, File, and Child process. This means that the root package contains conclusions and that these conclusions are in the linked categories.

    When I expand the root node, I find the FileUtils class (highlighted):


    You can see that the class has a path leading away from it, so I know it contains conclusions. At that point, it’s a little harder to make sense of the categories as they’ve scrolled off the top of the screen. It would be nice if the categories floated down as you scroll. Version 2 I hope! You can also see that some classes like FilenameUtils and IOCase do not have paths leading away from them and therefore do not carry conclusions. A relief I suppose, but I’d like the ability to filter out items that are conclusion-free.

    I now expand the FileUtils class:


    Here, some methods have paths, some don’t; scrolling down, we get to copyFileToDirectory:


    As expected, the method has a path leading away from it, which indicates a conclusion, but we do not know which kind or which one. We do get a description of its parameters, though, a nice touch.

    For now, clicking on the method does not do anything, whereas I would expect to be able to perform the same operations as in the list. This view lets you explore the whole library, but I do not find it terribly useful beyond the paths to categories. I’d like to see hyperlinks to code, the use of color to distinguish which methods are flagged as Untrusted Data and Sensitive Data, and an indication of which categories are involved that does not scroll off the screen.

    The nice thing though is that I have two paths of exploration in the UI: the conclusion list and the explorer tree.

    There are two key technologies at work here, and I expect both to get better as the beta progresses. First, building a code graph gives us the power to see that once a problem has been identified on a line of code, all (I assume public) call sites can be flagged. Second, what constitutes a problem (a conclusion, in ShiftLeft’s neutral parlance) will improve and become configurable, filterable, and sortable.

    In this example, the conclusion description reads:

    The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.

    What goes through my head when I read that is: Yeah, I do not want just anybody to be able to copy any file anywhere, like overwriting a password vault a la copyFileToDirectory(myFile, "/etc/shadow"). Granted, Apache Commons IO is a library, not an application, so there are no alarm bells to ring here, but you get the idea.

    Stepping back, I think it is important to reiterate what happened here: ShiftLeft found an issue (less dramatic than a problem) on a line of code in a private method, then, using its code graph, created conclusions (report items) for each public-facing method that may eventually call this private method in its code path.

    Working from a baseline

    If you think that having a list of over 200 conclusions to sift through is daunting, I would agree with you. This is why I look forward to using some sorting and filtering in the UI!

    What matters just as much is how to use ShiftLeft as your code evolves. I want to track differences from commit to commit and from build to build: Did I create or squash vulnerabilities? This I can tell by watching the Conclusions and Issues lists in the UI. I am hoping that ShiftLeft will implement a feature similar to Coveralls, where you get an email that tells you how much your test code coverage has changed in a build.

    As an experiment, let’s see what happens when I add some possibly malicious code, a method to delete all files and directories from a given directory:

    import java.io.File;
    import java.io.IOException;

    import org.apache.commons.io.FileUtils;

    public class ADangerousClass {

        public void deleteAll(File directory) throws IOException {
            // The delegate call was lost from this snippet; cleanDirectory matches
            // the description of deleting everything in a directory.
            FileUtils.cleanDirectory(directory);
        }
    }

    Note that all this method does is delegate to another method. I hit refresh in my browser and I see my commit:


    My commit comment, date, and commit hash are there. ShiftLeft goes to work for about two minutes (the two counts reset to 0 while ShiftLeft is analyzing). Then the Sensitive Data and Untrusted Data conclusion counts go up. Scrolling down, I see my new class:


    I also see it in the tree of course:


    Notice that the deleteAll method has a path to the File category on the right-hand side; this makes sense based on my previous findings.

    Now I really want to click on the categories on the right as filters! I am especially intrigued by the “Child process” category.

    What is worth noting here is that my new class and method do not in themselves do anything dangerous. But since we are working with a code graph, and the graph leads to a dangerous place, the new code is flagged.

    Now for a bit of fun, let’s change the method to make the dangerous bits unreachable:

        public void deleteAll(File directory) throws IOException {
            if (false) {
                FileUtils.cleanDirectory(directory); // the delegate call, now unreachable
            }
        }

    The dangerous class is gone from the list but still present in the tree, since it is a public API. What if it’s something more tricky? Let’s make some code unreachable through a local variable, and we will make it final to make it obvious to the code graph that the value is immutable:

        public void deleteAll(File directory) throws IOException {
            final boolean test = 1 == 2;
            if (test) {
                FileUtils.cleanDirectory(directory);
            }
        }

    The dangerous class is still gone from the list. Pretty clever it is. Let’s see about delegating the test to a method:

        public void deleteAll(File directory) throws IOException {
            final boolean test = test();
            if (test) {
                FileUtils.cleanDirectory(directory);
            }
        }

        private boolean test() {
            return 1 == 2;
        }

    ShiftLeft now shows the deleteAll() method in both the Untrusted Data and Sensitive Data lists. So that’s a false positive. Let’s get away from using a method and use two local variables instead:

        public void deleteAll(File directory) throws IOException {
            final Object obj = null;
            boolean test = true;
            if (obj == null) {
                test = false;
            }
            if (test) {
                FileUtils.cleanDirectory(directory);
            }
        }

    With this change, ShiftLeft still puts the method in both the Untrusted Data and Sensitive Data lists. OK, so this is a bit like Eclipse’s compiler warnings for null analysis: it flags what it can see without really evaluating. Fair enough.

    Linking to the root cause

    Let’s go back to the conclusions list for a minute. My deleteAll experiment created two conclusions: one untrusted data, one sensitive data. Let’s take a closer look at these.

    Untrusted Data


    The method deleteAll does not support handling untrusted data to be passed as parameter directory because it controls access to I/O File in a manner that would allow an attacker to abuse it.

    When I click on the GitHub link for Untrusted Data, I see:

    Note that we are not in the deleteAll method here; rather, we are where the ShiftLeft code graph flags the root issue. In other words, if I wrote a public method that called deleteAll, I would get the same conclusion and link. Graph Power!

    Why is calling directory.listFiles() labeled untrusted? Well, passing a sensitive file path should not be considered a problem, because the file path you are searching for would not end up written to disk. It is, however, dangerous if attackers control the input path, because they could list arbitrary directories on the system. That’s a breach.

    Considering only the method verifiedListFiles(), ShiftLeft does not know that the method is used in an operation to delete files. That’s up next:

    Sensitive Data
    The method deleteAll does not support handling sensitive data to be passed as parameter directory because it is leaked over I/O File.

    When I click on the GitHub link for Sensitive Data, I see:

    Clearly, calling File.delete() can be trouble, but using the sensitive data category may be a bit of a stretch. If any sensitive data is used in a file operation (for example, as the path of the file, like “path/to/my-secrets”), then that data ends up on disk. For a delete operation, you could say that’s not the case because you’re doing the reverse, but just the fact that you are deleting a file with a sensitive name is interesting. It’s also possible that you previously wrote sensitive data unencrypted to the disk. That’s a roundabout way to get there, but it feels justifiable.

    Finding arbitrary code attacks

    When I first ran ShiftLeft on Apache Commons IO 2.5, I found a few conclusions for arbitrary code attacks in the Java7Support class. Now that Apache Commons IO in Git master requires Java 7, the Java7Support class is gone. At the moment, I’ve not found a way to run ShiftLeft on anything but the master branch of a repository, so let’s make our own trouble with Method.invoke() to call BigInteger.intValueExact() on Java 8 and intValue() on older versions of Java:

    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.math.BigInteger;

    public class BigIntHelper {

        private static Method intValueExactMethod;

        static {
            try {
                intValueExactMethod = BigInteger.class.getMethod("intValueExact");
            } catch (NoSuchMethodException | SecurityException e) {
                intValueExactMethod = null;
            }
        }

        public static int getExactInt(BigInteger bigInt) {
            try {
                return (int) (intValueExactMethod != null
                    ? intValueExactMethod.invoke(bigInt)
                    : bigInt.intValue());
            } catch (IllegalAccessException | IllegalArgumentException | InvocationTargetException e) {
                return bigInt.intValue();
            }
        }

        public static void main(String[] args) {
            // The original main body was lost from this snippet; a simple
            // exercise of the helper:
            System.out.println(getExactInt(BigInteger.TEN));
        }
    }

    This code is OK by ShiftLeft even though our intValueExactMethod variable is private but not final:


    Let’s open things up by making the variable public, changing:

    private static Method intValueExactMethod;


    to:

    public static Method intValueExactMethod;

    For the Java7Support class in Apache Commons IO 2.5, ShiftLeft reported several arbitrary code attack vulnerabilities. Unfortunately, ShiftLeft does not report any such vulnerabilities for this example. Growing pains, I suppose. Well, that’s all I have for now. A fun exploration in an area I’d like to get back to soon.


    I’d like to wrap up this exploration of ShiftLeft with a quick summary of what we found: a tool we can add to our build pipelines to find potential security vulnerabilities.

    There is a lot of data here, and this is just for Apache Commons IO! Another lesson is that context matters. This is a low-level library as opposed to an application. Finding vulnerabilities in a low-level library is good, but these may not be vulnerabilities for your application. ShiftLeft conclusions can at least make you aware of how to use the library safely. ShiftLeft currently provides conclusions based on a code graph; this is powerful, as the examples show. We found conclusions about untrusted data (I’m not sure what’s in here, so don’t go executing it) and sensitive data (don’t save passwords in plain text!).

    I hope to revisit this story and run ShiftLeft on other Apache Commons projects soon. This sure is fun!

    Happy Coding,
    Gary Gregory


    The time has come for a regular OT post.

    The journey of the software developer is always about finding the home where he or she can enjoy being every day, can look forward to contributing to the bigger effort every day.

    In addition, the journey of the web services developer is always about finding the web services framework that will help with creating the coolest HTTP service on the Web. We all know there are many quality HTTP service frameworks around.

    My software developer's journey so far has been mostly about supporting one such web services framework, Apache CXF. It has been a great journey.

    Some of you helped by using and contributing to Apache CXF earlier, some of you are long term Apache CXF users and contributors, preparing the ground for the new users and contributors who are yet to discover CXF.

    No matter which group you are in, even if you're no longer with CXF, I'm sure you've had that feeling at least once that you'd like your CXF experience to last forever :-).

    Listen to a message from the best boys band in the world. Enjoy :-)



    I felt like writing a small and complete example clarifying the role of XML Schema 1.1 inheritable attributes (please see inheritable = boolean within the definition "XML Representation Summary: attribute Element Information Item").

    The XSD 1.1 inheritable attributes are primarily useful when implementing Type Alternatives.

    Below is a specific XSD 1.1 example and two corresponding valid XML documents. Together these illustrate inheritable attributes and a few other areas of XSD 1.1.

    XSD document:

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
       <xs:element name="X">
          <xs:complexType>
             <xs:sequence>
                <xs:element name="Y">
                   <xs:alternative test="@y = 'A'">
                      <xs:complexType>
                         <xs:sequence>
                            <xs:element name="A">
                               <xs:complexType>
                                  <xs:simpleContent>
                                     <xs:extension base="xs:string">
                                        <xs:attribute name="attr" type="xs:integer"/>
                                     </xs:extension>
                                  </xs:simpleContent>
                               </xs:complexType>
                            </xs:element>
                         </xs:sequence>
                      </xs:complexType>
                   </xs:alternative>
                   <xs:alternative test="@y = 'B'">
                      <xs:complexType>
                         <xs:sequence>
                            <xs:element name="B">
                               <xs:complexType>
                                  <xs:simpleContent>
                                     <xs:extension base="xs:string">
                                        <xs:attribute name="attr" type="xs:integer"/>
                                     </xs:extension>
                                  </xs:simpleContent>
                               </xs:complexType>
                            </xs:element>
                         </xs:sequence>
                      </xs:complexType>
                   </xs:alternative>
                </xs:element>
             </xs:sequence>
             <xs:attribute name="y" inheritable="true">
                <xs:simpleType>
                   <xs:restriction base="xs:string">
                      <xs:enumeration value="A"/>
                      <xs:enumeration value="B"/>
                   </xs:restriction>
                </xs:simpleType>
             </xs:attribute>
          </xs:complexType>
       </xs:element>
    </xs:schema>

    (the most interesting parts are shown italicized)

    XML document 1:

    <?xml version="1.0"?>
    <X y="A">
        <A attr="1">hello</A>
    </X>

    XML document 2:

    <?xml version="1.0"?>
    <X y="B">
        <B attr="1">hello</B>
    </X>

    The XML documents shown (1 & 2) should be validated with the XSD document provided.

    I'll provide a little explanation here, with respect to the semantics of this example:
    An inheritable attribute "y" is defined on element "X". The XSD type of element "Y" is chosen at runtime (i.e. at validation time), depending on the value of this attribute. Notice how the attribute defined on element "X" makes its value accessible to the type alternatives on element "Y" in the schema document.

    If you're a novice to XML Schema 1.1, I'd suggest reading some areas of the XML Schema 1.1 language more deeply.

    I hope this post is useful to some practitioners in this field.


    On Thursday 20th July I am doing a live webinar:

    For Java developers, it may be daunting to get started developing container applications that run locally on Kubernetes/OpenShift. 

    In this session, we’ll build a set of Apache Camel- and Java-based microservices that use Spring Boot and WildFly Swarm. We’ll show how fabric8 Maven tools can be used to build, deploy, and run your Java projects on local or remote OpenShift clusters, as well as to easily perform live debugging. 

    Additionally, we’ll discuss best practices for building distributed and fault-tolerant microservices using technologies such as Kubernetes Services, Netflix Hystrix, and Apache Camel Enterprise Integration Patterns (EIPs) for fault tolerance.

    The webinar is in a timezone that is friendly to developers based in the Asia/Pacific region, at 1:00 pm SGT (Singapore Time). That means I have to get up early in the morning ;)

    The webinar is a mix between slides and live demos (5 demo sessions) so there is a lot of action going on. I have captured all the important information in the slides, so after attending the webinar you should be able to try this on your own, by just browsing the slides, and downloading the sample code.

    You can register (for free) for the webinar with this link. I am not aware of any upper cap, but you may have to hurry to be safe to get a spot, because I was told there were already more than 1,300 registrations a couple of days ago.


    This is the first post in a series of articles on securing Apache Hive. In this article we will look at installing Apache Hive and doing some queries on data stored in HDFS. We will not consider any security requirements in this post, but the test deployment will be used by future posts in this series on authenticating and authorizing access to Hive.

    1) Install and configure Apache Hadoop

    The first step is to install and configure Apache Hadoop. Please follow section 1 of this earlier tutorial for information on how to do this. In addition, we need to configure two extra properties in 'etc/hadoop/core-site.xml':

    • hadoop.proxyuser.$user.groups: *
    • hadoop.proxyuser.$user.hosts: localhost
    where "$user" above should be replaced with the user that is going to run the hive server below. As we are not using authentication in this tutorial, this allows the $user to impersonate the "anonymous" user, who will connect to Hive via beeline and run some queries.
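For reference, inside the existing <configuration> element of 'etc/hadoop/core-site.xml' the two entries take this shape; the username "alice" is only a stand-in for $user:

```xml
<property>
  <name>hadoop.proxyuser.alice.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.alice.hosts</name>
  <value>localhost</value>
</property>
```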

    Once HDFS has started, we need to create some directories for use by Apache Hive, and change the permissions appropriately:
    • bin/hadoop fs -mkdir -p /user/hive/warehouse /tmp
    • bin/hadoop fs -chmod g+w /user/hive/warehouse /tmp
    • bin/hadoop fs -mkdir /data
    The "/data" directory will hold a file which represents the output of a map-reduce job. For the purposes of this tutorial, we will use a sample output of the canonical "Word Count" map-reduce job on some text. The file consists of two columns separated by a tab character, where the left column is the word, and the right column is the total count associated with that word in the original document.
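To illustrate, here are a couple of made-up lines in that format and how the tab-separated columns split; the real output.txt will contain different words and counts:

```java
public class WordCountFormat {
    public static void main(String[] args) {
        // Two hypothetical lines of "Word Count" output: word TAB count.
        String[] lines = { "Dare\t6", "the\t41" };
        for (String line : lines) {
            String[] cols = line.split("\t");
            System.out.println("word=" + cols[0] + ", count=" + cols[1]);
        }
    }
}
```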

    I've uploaded such a sample output here. Download it and upload it to the HDFS data directory:
    • bin/hadoop fs -put output.txt /data
    2) Install and configure Apache Hive

    Now we will install and configure Apache Hive. Download and extract Apache Hive (2.1.1 was used for the purposes of this tutorial). Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Now we will configure the metastore and start Hiveserver2:
    • bin/schematool -dbType derby -initSchema
    • bin/hiveserver2
    In a separate window, we will start beeline to connect to the hive server, where $user is the user who is running Hadoop (necessary as we are going to create some data in HDFS, and otherwise wouldn't have the correct permissions):
    • bin/beeline -u jdbc:hive2://localhost:10000 -n $user
    Once we are connected, then create a Hive table and load the map reduce output data into a new table called "words":
    • create table words (word STRING, count INT) row format delimited fields terminated by '\t' stored as textfile;
    • LOAD DATA INPATH '/data/output.txt' INTO TABLE words;
    Now we can run some queries on the data as the anonymous user. Log out of beeline and then back in and run some queries via:
    • bin/beeline -u jdbc:hive2://localhost:10000
    • select * from words where word == 'Dare';


    This release changes DefaultComparisonFormatter in order to simplify creating custom ComparisonFormatters.

    The full list of changes:

    • made DefaultComparisonFormatter more subclass friendly. Issue #93.


    Mapped the eject key of my old iMac to the following script to rate limit it:

    #!/bin/sh
    # Lock file path; the original definition was dropped from this snippet,
    # so this particular path is a guess.
    L=/tmp/eject.lock
    [ -e $L ] || touch $L
    exec 3<$L
    flock -n 3 || exit
    eject "$@"
    sleep 5
    Because I found it really annoying when the kids fill the keyboard buffer with eject events that take forever to be processed.


    An LWN article suggests disabling the maximum mount count. I do the opposite: I fsck on every boot. fsck on ext4 is very fast these days, even on large filesystems. If you are like me and run the latest kernel, it may save your butt from fs regressions that do sometimes happen.

    tune2fs -c 1 /dev/disk/by-label/ROOT


    A recent blog post covered SSO support for Apache Syncope REST services. This was a new feature added in the 2.0.3 release, which allows a user to obtain a JWT from the Syncope "accessTokens/login" REST endpoint. This token can then be used to repeatedly invoke on a Syncope REST service. However, what if you wish to allow a user to invoke on a Syncope REST service using a (JWT) token issued by a third-party IdP instead? From Syncope 2.0.5 this will be possible.

    In this post we will cover how to use a JWT issued by a third-party to invoke on an Apache Syncope REST service. The code is available on github here:

    • cxf-syncope2-webapp: A pre-configured web application of the Syncope core for use in the tests.
    • cxf-syncope2: Some integration tests that use cxf-syncope2-webapp for authentication and authorization purposes. JWTTestIT illustrates third party SSO integration with Syncope as covered in this post.
    1) Configuring Apache Syncope to accept third-party JWTs

    Naturally, if we invoke on an Apache Syncope REST service using an arbitrary third-party token, access will be denied as Syncope will not be able to validate the signature on the token correctly. By default, Syncope uses the following properties defined in '' to both issue and validate signed tokens:
    • jwtIssuer: The issuer of the token
    • jwsKey: The Hex-encoded (symmetric) verification key
    The default signature algorithm is the symmetric algorithm HS512. To allow third-party tokens we need to implement the JWTSSOProvider interface provided in Syncope. By default, Syncope searches for JWTSSOProvider implementations on the classpath under the package name "org.apache.syncope.core", so no explicit configuration changes are required to plug in a custom JWTSSOProvider implementation.

    When Syncope receives a signed JWT it will query which of the configured JWTSSOProvider implementations can verify the token, by matching the 'getIssuer()' method to the issuer of the token. The 'getAlgorithm()' method should match the signature algorithm of the received token. The 'verify' method should validate the signature of the received token. The implementation used in the tests is available here. A keystore is read in and the certificate contained in it is used to verify the signature on the received token. 

    One final interesting point is that we need to map the authenticated JWT subject to a user in Syncope somehow. This is done in the JWTSSOProvider implementation via the 'resolve' method. In our test implementation, we map the JWT subject directly to a Syncope username.

    2) Obtain a JWT from the Apache CXF STS using REST

    Now that we have set up Apache Syncope to allow third-party JWTs, we need to obtain such a token to get our test-case to work. We will use the Apache CXF Security Token Service (STS) to obtain a JWT. For simplicity we will leverage the REST interface of the CXF STS, which allows us to obtain a token with a simple REST call. The STS is configured via spring to issue signed JWTs. User authentication to the STS is enforced via basic authentication. In the test code, we use the CXF WebClient to invoke on the STS and to get a JWT back:

    Now we can use this token with the Syncope client API to call the user "self service" successfully:


    I’ve spent today in a workshop rehearsing Rachmaninov’s Vespers.  Perhaps the most celebrated major work of Russian Orthodox music to enter our consciousness – let alone repertoire – in Blighty, and perhaps the West more generally.  We will be performing it in concert on Tuesday evening, at the main church in Tavistock, as part of the Exon Singers’ festival.

    While the music is of moderate complexity and not unduly challenging[1], what has made the day really hard work is singing in Russian.  That set me thinking.  It’s easy to sing in a language I speak, or in a language I don’t speak but with which I have a workable level of familiarity, like Latin or French.  Russian is in a whole different league, not just due to the Cyrillic alphabet (we have a broadly phonetic Latin transcription in the score), but more the near-complete unfamiliarity.  The crux of it is, it takes a lot more of my concentration than a more familiar language, making it harder to look up at the conductor!

    If my time were unlimited, I’d love to learn Russian.

    [1] Not even the bass range.  We have a surprising number of low basses, so I’m singing the upper and (where applicable) middle bass lines, not the legendary Russian bottom range.


    We have just copied a big chunk of files off a broken home NAS by mounting the HDD in a Windows PC. Luckily this one has a Windows-format partition on it; the other one doesn't, so I'll need to make a bootable Linux USB dongle, probably via and boot from that.

    However, back to Windows: I can't read any of the files on Win10 because the permissions don't map onto the users and groups in Win10, and UAC doesn't help because it won't allow me to "be" an administrator.

    But I found the answer! :-D
    Run PowerShell "as administrator", cd to the broken directory, and use this command to recursively reset the permissions for the whole tree to the system defaults:

    icacls .\ /reset /T /C /L

    If that doesn't work, you can take ownership of the tree using this command and then try the icacls (eye cackles?) command again:

    takeown.exe /F .\ /R


    I have been interviewing a lot of software engineers recently, as I am leading a new team and looking to expand it.  That has led me to reflect a little on what I am actually looking for.  The following five qualities have been shared by all of the really good, fun-to-work-with developers who I have had the pleasure to work with.

    1. Technical mastery
    Really good developers fully understand what they are doing.  This might sound funny, but unfortunately, it is all too common for people to get things to work by cutting and pasting examples or fumbling through a quasi-random hacking process to arrive at code that "works" without actually understanding how or why (or in fact even if) it works.  There is nothing wrong with experimentation and leveraging experience - and working code - of others.  When really good developers do that, though, they always find their way to full understanding of the technologies and techniques that they are using.  When I interview developers, I always ask them to explain exactly how the solutions that they developed work.  I can usually tell very quickly if I am talking to an individual who masters the technology that they use.  I would much rather have a developer with strong mastery of a small set of technologies than someone whose resume is full of advanced technologies that they don't understand.

    2. Simple mindedness
    In The Humble Programmer, Edsger W. Dijkstra said "The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague." Really good developers have a wonderful tendency to keep things as simple as possible, as long as possible.  Fancy constructs, excessive OO complexity, needless external dependencies and exotic algorithms never find their way into their code.  If there is a simple way to do something, that is what they do.  Reading the code of a simple-minded developer is like reading a mathematical paper written by a great mathematician.  If the content is straightforward, the progression is 100% predictable.  You can stop in the middle, scribble out what should come next, and then see that that is what comes next.  When you get to a difficult part where you have to think, you are happy to see that the author found something so simple that you should have thought of it.

    3. Organizing
    Another one of my favorite Dijkstra quotes is that the art of programming is the art of organizing complexity.  Great developers are organizing forces.  Note that this is not the same as "being organized." It means that they help define problems in a way that they can be solved with simple solutions, and they help get contracts, interface boundaries and tests in place so that the teams they are part of can be organized.  The scope of what developers can organize naturally grows as they progress in their careers; but they need to have the drive and ability to be "organizers" from the beginning.  Developers who have to be told how to think about problems are net drags on the teams they are part of.  Good ones are key contributors to their teams' arrival at nicely organized approaches to problems.

    4. Fast-learning
    Technology changes so fast and business problems are spread across such a large surface that developers constantly need to learn new things.  And these things are not just programming languages, frameworks or constructs.  Developers need to learn business domain concepts, data science and AI concepts as needed, often in ridiculously short timeframes.  This means that they have to be able to learn very fast.  And they have to be able to do this and immediately exercise their knowledge with a high level of independence.  It's great to be able to learn together and share knowledge with others, but sometimes developers need to figure things out for themselves, and good ones have the ability and determination to learn what they need to learn - however hairy it gets - to solve the problems in front of them.

    5. Situational awareness
    Good developers ask about - and clearly understand - the execution context of the code they work on.  If something needs to be thread-safe, they write it that way.  They know what the performance and scalability bottlenecks in and around their code are.  They know about its security context.  They see enough of the larger system that their code is running in / interacting with to ensure that it will be operable, failing fast and loudly when it needs to fail, maintaining invariants that it needs to maintain, and providing sufficient logging / events / monitoring interfaces.  And of course, all of this is validated in unit tests.

    I know some people will say that some of what I have above - especially in 3. and 5. - can't really be expected of "SE's." These qualities, one might argue, are qualities of architects.  Developers just need to be "feature machines" and architects can worry about how to organize the code and make sure the whole system is operable.  My biggest learning in 30 years of software development is that that is the wrong way to think about it.  Architecture influence and scope of vision naturally increases as developers progress in their careers; but it is part of what they do, every day, from day 1 and the better they do it, the more successful the teams they are part of can be.  And the senior ones - those who might have "Architect" or "Principal" or "Staff" in their titles - need to encourage, cultivate, challenge and be influenced by the design thinking of SEs at all levels.


    Includes declarative pipeline support (note that you need Jenkins 2.66+ for it to work) and lots of bug fixes

    The full changelog:

    • Add an experimental Declarative Agent extension for Kubernetes JENKINS-41758 #127
    • Implement Port mapping #165
    • Support idleMinutes field in pipeline #154
    • Add command liveness probe support #158
    • Add toggle for node usage mode #158
    • Add namespace support on PodTemplate.
    • Make PodTemplate optional within pipeline JENKINS-42315
    • Make Slave Jenkins connection timeout configurable #141
    • Fix durable pipeline PID NumberFormatException JENKINS-42048 #157
    • Don’t provision nodes if there are no PodTemplates set to usage mode Normal #171
    • Refactoring add/set methods in PodTemplate #173
    • Delete the build pod after we have finished with the template block #172
    • Default to use the kubernetes.default.svc.cluster.local endpoint
    • Do not print stack trace on ConnectException
    • Upgrade kubernetes client to 2.3.1 JENKINS-44189
    • Step namespace should have priority over anything else #161
    • Wait for pod to exist up to 60 seconds before erroring #155
    • Catch IOException on ContainerExecProc#kill
    • Do not print stack trace on connection exception
    • Restore random naming for pipeline managed pod templates.
    • Dir context is not honored by shell step JENKINS-40925 #146
    • Limit pod name to 63 characters, and change the randomly generated string #143
    • Fix workingDir inheritance error #136
    • Use name instead of label for the nesting stack #137
    • Exception in configure page when ‘Kubernetes URL’ isn’t filled JENKINS-45282 #174
    • kubectl temporary config file should work where Jenkins project contains spaces #178
    • Thread/connection leak #177


    This is the second post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. In this post we will show how to add authorization to the previous example using Apache Ranger.

    1) Install the Apache Ranger Hive plugin

    If you have not done so already, please follow the first post to install and configure Apache Hadoop and Apache Hive. Next download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:

    • mvn clean package assembly:assembly -DskipTests
    • tar zxvf target/ranger-1.0.0-SNAPSHOT-hive-plugin.tar.gz
    • mv ranger-1.0.0-SNAPSHOT-hive-plugin ${ranger.hive.home}
    Now go to ${ranger.hive.home} and edit "". You need to specify the following properties:
    • POLICY_MGR_URL: Set this to "http://localhost:6080"
    • REPOSITORY_NAME: Set this to "cl1_hive".
    • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hive installation
    Save "" and install the plugin as root via "sudo -E ./". The Apache Ranger Hive plugin should now be successfully installed. Make sure that the default policy cache for the Hive plugin '/etc/ranger/cl1_hive/policycache' is readable by the user who is running the Hive server. Then restart the Apache Hive server to enable the authorization plugin.

    2) Create authorization policies in the Apache Ranger Admin console

    Next we will use the Apache Ranger admin console to create authorization policies for Apache Hive. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Ranger admin service via 'sudo ranger-admin start' and open a browser at 'http://localhost:6080', logging on with the credentials 'admin/admin'. Click the "+" button next to the "HIVE" logo and enter the following properties:
    • Service Name: cl1_hive
    • Username/Password: admin
    • jdbc.url: jdbc:hive2://localhost:10000
    Note that "Test Connection" won't work as the "admin" user will not have the necessary authorization to invoke on Hive at this point. Click "Add" to create the service. If you have not done so in a previous tutorial, click on "Settings" and then "Users/Groups" and add two new users called "alice" and "bob", who we will use to test authorization. Then go back to the newly created "cl1_hive" service, and click "Add new policy" with the following properties:
    • Policy Name: SelectWords
    • database: default
    • table: words
    • Hive column: *
    Then under "Allow Conditions", give "alice" the "select" permission and click "Add".

    3) Test authorization with Apache Hive

    Once our new policy has synced to '/etc/ranger/cl1_hive/policycache' we can test authorization in Hive. The user 'alice' can query the table according to our policy:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
    • select * from words where word == 'Dare'; (works)
    However, the user 'bob' is denied access:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
    • select * from words where word == 'Dare'; (fails)


    On this page, Roger L. Costello has posted some wonderful write-ups on XML Schema 1.1 technology. Enthusiasts are encouraged to read them.

    Roger's language is very simple, and he covers almost everything from the perspective of an XML Schema 1.1 user's needs.


    One of the big challenges people face when starting out working with Cassandra and time series data is understanding the impact of how your write workload will affect your cluster. Writing too quickly to a single partition can create hot spots that limit your ability to scale out. Partitions that get too large can lead to issues with repair, streaming, and read performance. Reading from the middle of a large partition carries a lot of overhead, and results in increased GC pressure. Cassandra 4.0 should improve the performance of large partitions, but it won’t fully solve the other issues I’ve already mentioned. For the foreseeable future, we will need to consider their performance impact and plan for them accordingly.

    In this post, I’ll discuss a common Cassandra data modeling technique called bucketing. Bucketing is a strategy that lets us control how much data is stored in each partition as well as spread writes out to the entire cluster. This post will discuss two forms of bucketing. These techniques can be combined when a data model requires further scaling. Readers should already be familiar with the anatomy of a partition and basic CQL commands.

    When we first learn about data modeling with Cassandra, we might see something like the following:

    CREATE TABLE raw_data (
        sensor text,
        ts timeuuid,
        reading int,
        primary key(sensor, ts)
    ) WITH compaction = {'class': 'TimeWindowCompactionStrategy', 
                         'compaction_window_size': 1, 
                         'compaction_window_unit': 'DAYS'};

    This is a great first data model for storing some very simple sensor data. Normally the data we collect is more complex than an integer, but in this post we’re going to focus on the keys. We’re leveraging TWCS as our compaction strategy. TWCS will help us deal with the overhead of compacting large partitions, which should keep our CPU and I/O under control. Unfortunately it still has some significant limitations. If we aren’t using a TTL, as we take in more data, our partition size will grow constantly, unbounded. As mentioned above, large partitions carry significant overhead when repairing, streaming, or reading from arbitrary time slices.

    To break up this big partition, we’ll leverage our first form of bucketing. We’ll break our partitions into smaller ones based on time window. The ideal size is going to keep partitions under 100MB. For example, one partition per sensor per day would be a good choice if we’re storing 50-75MB of data per day. We could just as easily use week (starting from some epoch), or month and year as long as the partitions stay under 100MB. Whatever the choice, leaving a little headroom for growth is a good idea.

    To accomplish this, we’ll add another component to our partition key. Modifying our earlier data model, we’ll add a day field:

    CREATE TABLE raw_data_by_day (
        sensor text,
        day text,
        ts timeuuid,
        reading int,
        primary key((sensor, day), ts)
    ) WITH COMPACTION = {'class': 'TimeWindowCompactionStrategy', 
                         'compaction_window_unit': 'DAYS', 
                         'compaction_window_size': 1};

    Inserting into the table requires using the date as well as the now() value (you could also generate a TimeUUID in your application code):

    INSERT INTO raw_data_by_day (sensor, day, ts, reading) 
    VALUES ('mysensor', '2017-01-01', now(), 10);
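The day string above can be derived from the event's timestamp in application code. A minimal illustrative helper (the function name is mine, not from the original post):

```python
from datetime import datetime, timezone

def day_bucket(ts):
    """Format a timestamp as the 'day' partition key component (YYYY-MM-DD)."""
    return ts.strftime("%Y-%m-%d")

day_bucket(datetime(2017, 1, 1, 12, 30, tzinfo=timezone.utc))  # '2017-01-01'
```

The same idea extends to weekly or monthly buckets by changing the format string, as long as every writer and reader derives the key the same way.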

    This is one way of limiting the amount of data per partition. For fetching large amounts of data across multiple days, you’ll need to issue one query per day. The nice part about querying like this is we can spread the work over the entire cluster rather than asking a single node to perform a lot of work. We can also issue these queries in parallel by relying on the async calls in the driver. The Python driver even has a convenient helper function for this sort of use case:

    from itertools import product
    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args

    days = ["2017-07-01", "2017-07-12", "2017-07-03"]  # collecting three days worth of data

    session = Cluster([""]).connect("blog")
    prepared = session.prepare("SELECT day, ts, reading FROM raw_data_by_day WHERE sensor = ? and day = ?")
    args = product(["mysensor"], days)
    # args: ('mysensor', '2017-07-01'), ('mysensor', '2017-07-12'), ('mysensor', '2017-07-03')

    # driver handles concurrency for you
    results = execute_concurrent_with_args(session, prepared, args)
    # Results:
    # [ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d36750>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d36a90>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d36550>)]

    A variation on this technique is to use a different table per time window. For instance, using a table per month means you’d have twelve tables per year:

    CREATE TABLE raw_data_may_2017 (
        sensor text,
        ts timeuuid,
        reading int,
        primary key(sensor, ts)
    ) WITH COMPACTION = {'class': 'TimeWindowCompactionStrategy', 
                         'compaction_window_unit': 'DAYS', 
                         'compaction_window_size': 1};

    This strategy has a primary benefit of being useful for archiving and quickly dropping old data. For instance, at the beginning of each month, we could archive last month’s data to HDFS or S3 in parquet format, taking advantage of cheap storage for analytics purposes. When we don’t need the data in Cassandra anymore, we can simply drop the table. You can probably see there’s a bit of extra maintenance around creating and removing tables, so this method is really only useful if archiving is a requirement. There are other methods to archive data as well, so this style of bucketing may be unnecessary.

    The above strategies focus on keeping partitions from getting too big over a long period of time.  This is fine if we have a predictable workload and partition sizes that have very little variance.  But it's possible to be ingesting so much information that we can overwhelm a single node's ability to write data out, or the ingest rate may be significantly higher for a small percentage of objects.  Twitter is a great example, where certain people have tens of millions of followers, but it's not the common case.  It's common to have a separate code path for these types of accounts where we need massive scale.

    The second technique uses multiple partitions at any given time to fan out inserts to the entire cluster. The nice part about this strategy is we can use a single partition for low volume, and many partitions for high volume.

    The tradeoff we make with this design is on reads: we need to use a scatter-gather, which has significantly higher overhead.  This can make pagination more difficult, amongst other things.  We need to be able to track how much data we're ingesting for each gizmo we have, to ensure we can pick the right number of partitions to use.  If we use too many buckets, we end up doing a lot of really small reads across a lot of partitions.  With too few buckets, we end up with really large partitions that don't compact, repair, or stream well, and that have poor read performance.
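As a sketch of the write side (the function names and the rows-per-bucket threshold here are invented for illustration, not taken from the post), we can size the bucket count from observed daily volume, keeping partitions under the guideline above, and then spread each incoming write uniformly across those buckets:

```python
import random

def bucket_count(daily_rows, rows_per_bucket=500_000):
    """Pick how many buckets a day's writes should fan out over.

    Low-volume objects get a single bucket; high-volume ones get enough
    buckets to keep each partition comfortably sized (ceiling division).
    """
    return max(1, -(-daily_rows // rows_per_bucket))

def pick_bucket(num_buckets):
    """Choose a bucket for an incoming write, spreading load uniformly."""
    return random.randrange(num_buckets)

bucket_count(50_000)     # low-volume account -> 1 bucket
bucket_count(2_000_000)  # high-volume account -> 4 buckets
```

On reads, whatever computed the bucket count must be consultable again, so that the scatter-gather queries the same number of partitions the writes fanned out over.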

    For this example, we’ll look at a theoretical model for someone who’s following a lot of users on a social network like Twitter. Most accounts would be fine to have a single partition for incoming messages, but some people / bots might follow millions of accounts.

    Disclaimer: I have no knowledge of how Twitter is actually storing their data, it’s just an easy example to discuss.

    CREATE TABLE tweet_stream (
        account text,
        day text,
        bucket int,
        ts timeuuid,
        message text,
        primary key((account, day, bucket), ts)
    ) WITH COMPACTION = {'class': 'TimeWindowCompactionStrategy', 
                         'compaction_window_unit': 'DAYS', 
                         'compaction_window_size': 1};

    This data model extends our previous data model by adding bucket into the partition key. Each day can now have multiple buckets to fetch from. When it’s time to read, we need to fetch from all the partitions, and take the results we need. To demonstrate, we’ll insert some data into our partitions:

    cqlsh:blog> insert into tweet_stream (account, day, bucket, ts, message) VALUES ('jon_haddad', '2017-07-01', 0, now(), 'hi');
    cqlsh:blog> insert into tweet_stream (account, day, bucket, ts, message) VALUES ('jon_haddad', '2017-07-01', 1, now(), 'hi2');
    cqlsh:blog> insert into tweet_stream (account, day, bucket, ts, message) VALUES ('jon_haddad', '2017-07-01', 2, now(), 'hi3');
    cqlsh:blog> insert into tweet_stream (account, day, bucket, ts, message) VALUES ('jon_haddad', '2017-07-01', 3, now(), 'hi4');

    If we want the ten most recent messages, we can do something like this:

    from itertools import chain, product
    from cassandra.util import unix_time_from_uuid1

    prepared = session.prepare("SELECT ts, message FROM tweet_stream WHERE account = ? and day = ? and bucket = ? LIMIT 10")

    # let's get 10 buckets
    partitions = range(10)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    args = product(["jon_haddad"], ["2017-07-01"], partitions)
    result = execute_concurrent_with_args(session, prepared, args)
    # [ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d1e6d0>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d1d710>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d1d4d0>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d1d950>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d1db10>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d1dfd0>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d1dd90>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d1d290>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d1e250>),
    #  ExecutionResult(success=True, result_or_exc=<cassandra.cluster.ResultSet object at 0x106d1e490>)]

    results = [x.result_or_exc for x in result]
    # append all the results together
    data = chain(*results)
    sorted_results = sorted(data, key=lambda x: unix_time_from_uuid1(x.ts), reverse=True)  # newest stuff first
    # [Row(ts=UUID('e1c59e60-7406-11e7-9458-897782c5d96c'), message=u'hi4'),
    #  Row(ts=UUID('dd6ddd00-7406-11e7-9458-897782c5d96c'), message=u'hi3'),
    #  Row(ts=UUID('d4422560-7406-11e7-9458-897782c5d96c'), message=u'hi2'),
    #  Row(ts=UUID('d17dae30-7406-11e7-9458-897782c5d96c'), message=u'hi')]

    This example is only using a LIMIT of 10 items, so we can be lazy programmers, merge the lists, and then sort them. If we wanted to grab a lot more elements we’d want to use a k-way merge algorithm. We’ll come back to that in a future blog post when we expand on this topic.
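To illustrate the k-way merge idea with the standard library (the data here is made up; simple (unix_time, message) tuples stand in for the driver's Row objects), Python's heapq.merge lazily merges already-sorted per-bucket streams without concatenating and re-sorting everything:

```python
import heapq
from itertools import islice

# Each bucket's result list arrives sorted newest-first, just like the
# per-partition LIMIT 10 queries above.
bucket_results = [
    [(40, 'hi4'), (10, 'hi')],
    [(30, 'hi3')],
    [(20, 'hi2')],
]

# heapq.merge keeps only one head element per stream in memory,
# so we can take the newest N without materializing all rows.
merged = heapq.merge(*bucket_results, key=lambda row: row[0], reverse=True)
newest = list(islice(merged, 10))
# [(40, 'hi4'), (30, 'hi3'), (20, 'hi2'), (10, 'hi')]
```

This only pays off for larger result sets; for a LIMIT of 10 per bucket, the merge-and-sort approach above is perfectly fine.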

    At this point you should have a better understanding of how you can distribute your data and requests around the cluster, allowing it to scale much further than if a single partition were used. Keep in mind each problem is different, and there’s no one size fits all solution.


    Tomorrow I will be leaving for a tour in the APAC region where I am going to spread the words about agile integration using Apache Camel and Kubernetes and OpenShift.

    I have plotted my destinations in the map below

    The first stop is Tokyo, where I will arrive this Saturday, August 5th (an 11-hour flight). I will be jet-lagged and a bit tired, so I will take it easy on Saturday.  On Sunday I plan to polish my presentations and then visit some spots in Tokyo.

    Monday and Tuesday are full-day workshops. Red Hat has posted information about these workshops on the registration pages for the Sydney and Melbourne events, in case you are interested. I am not aware of any registration page for Tokyo, which was quickly fully booked, so we had to add a 2nd day as well.

    On Wednesday I do have some time in the morning and afternoon to see a bit more of Tokyo before I will travel to Sydney (10h flight) where I arrive August 10th Thursday morning. In the evening I will attend and speak at the Sydney meetup. So if you are in this area you are very welcome to come by, as I love chatting with fellow developers and possible Camel users.  Then we have the full day workshop on Friday 11th.

    Saturday the 12th is my first leisure day, where I have booked the BridgeClimb tour at 10am. If I am up early in the morning I will go see the opera house (as it was designed by a Danish architect) and the viewpoint of the harbour from the botanic garden. In the afternoon I will go to The Rocks for a cup of coffee and/or beer(s). I am in talks with a few Camel users from a Sydney office who want to meet there for some drinks. You are welcome to join us there; you can reach out to me on my email or twitter etc.

    On Sunday I travel to Melbourne. On Monday 14th I plan to run the F1 circuit track in Albert Park. If all goes well then I will run several rounds so I can do a half marathon distance. The following day we have a full day workshop in Melbourne. At this time of writing there is a potential meetup in Melbourne happening as well on the evening of Tuesday 15th. They are currently finding a venue.

    On Wednesday I travel to Wellington where we have two full day workshops on Thursday and Friday.

    Saturday 19th I have a full day in Wellington where I plan to walk the city and see various stuff. And see if I can recall some of the places I visited 15 years ago when I backpacked New Zealand and Australia.

    I then fly to Auckland on Sunday 20th August and have a mini vacation there until Wednesday, when I travel to the USA (a 12-hour flight). If you are from around Auckland and want to have a cup of coffee or a drink somewhere then you are welcome to reach out to me via my email or twitter etc.

    In the USA I will visit an old friend whom I have not seen in about 15 years, since he relocated to Seattle when he got a job at Microsoft. Among other things, he is taking me to my first live American football game, where we are going to see the Seattle Seahawks.

    I will travel back to Denmark on Monday 28th and arrive back home in the afternoon on the 29th.
    And then it's back to Camel land and work from the 30th August.

    My travel plan is as follows (where there is an empty space means traveling):

    2017-08-05: Tokyo
    2017-08-06: Tokyo
    2017-08-07: Tokyo
    2017-08-08: Tokyo
    2017-08-10: Sydney
    2017-08-11: Sydney
    2017-08-12: Sydney
    2017-08-14: Melbourne
    2017-08-15: Melbourne
    2017-08-17: Wellington
    2017-08-18: Wellington
    2017-08-19: Wellington
    2017-08-21: Auckland
    2017-08-22: Auckland
    2017-08-24: Seattle
    2017-08-25: Seattle
    2017-08-26: Seattle
    2017-08-27: Seattle
    2017-08-29: Denmark


    The Velocity developers are pleased to announce the release of Velocity Engine 2.0.

    Among the main new features and enhancements:

    + Logging to the SLF4J logging facade.

    + Configurable whitespace gobbling.

    + Method arguments and array subscripts can now be arithmetic expressions.

    + Configurable method arguments conversion handler with automatic conversions between booleans, numbers, strings and enums.

    + Significant reduction of the memory consumption.

    + JSR-223 Scripting Engine implementation.

    For a full list of changes, consult Velocity Engine 2.0 Changes section and JIRA changelog.

    For notes on upgrading from Velocity 1.x, see Velocity Engine 2.0 Upgrading section.

    Note for Velocity Tools users: Velocity Tools 3.0 shall soon be released. Meanwhile, you are encouraged to use the Velocity Tools 3.x last snapshot (see Velocity Tools 3.x Upgrading notes).

    Downloads of 2.0 are available here.


    On Thursday, April 24, 2014, I was in a very serious cycling accident in Boulder, Colorado while riding my new Cervélo S3 during the lunch hour and I am currently hospitalized in Denver, Colorado for at least the next 60 days. 

    Damage Report

    In the wreckage, I suffered 11 fractured ribs (10 on the left side, most in multiple places, and one on the right), fractures of the L3 and L4 spinal vertebrae, one collapsed/punctured lung, one deflated lung, a nasty laceration on my left hip that required stitches and loads of road rash all over my back and left hip from being run over and rolled by the car. The worst part was being conscious through the entire ordeal, i.e., I knew I was being run over by a car.

    Current Status 

    I underwent emergency spinal surgery involving the insertion of mounting hardware, rods and screws from the L2 to L5 vertebrae, to support the fusion between L3 and L4. L3 and L4 were dislocated, which is what damaged the disc between them and required the fusion. They also had to clean out much debris from various spinal process fractures (on T9, L1, L2, L3 and L4) that punctured the spinal dura and required repair. There was also damage to the left psoas muscle, and I experienced an ileus, a disruption of the normal propulsive ability of the gastrointestinal tract. Luckily they removed my chest tube while I was in the ICU under heavy pain meds. When getting out of bed, I must first put on a rigid clamshell brace from my armpits to my pelvis, two pieces that velcro very tightly together. This is not fun due to all the rib fractures. I am now able to control everything from the knees up with the exception of my butt. I do feel my feet somewhat, as I can distinguish sharp vs. dull touches in some areas, but I am not able to flex my feet/ankles or wiggle my toes at this time. After the surgery, I was in the Intensive Care Unit (ICU) at Boulder Community Hospital for 10 days or so. Since then, I have been transferred to Craig Hospital in Denver, Colorado. Craig is a world-renowned hospital for its spinal and brain injury rehabilitation programs.

    (For those who are curious, I'm told that the bike was left almost untouched. But I will certainly have my family take it to Excel Sports in Boulder to be fully evaluated.)

    A Very Special Thank You 

    There is one guy who deserves a special thank you for his compassion for a stranger in distress. Gareth, your voice rescued me and got me through the initial accident and your clear thinking helped me more than you will ever know. After you visited me at the Boulder Hospital, I totally fell apart just because I heard your voice again. We will meet again, my brother. 

    Also, a special thank you to Mike O. for introducing Gareth and me after the accident. Thanks, buddy. 

    To My Family and Friends 

    My wife Janene has truly been my rock through this entire ordeal. Never did she waver and, for me, the sun rises and sets with her. She and my girls have given me such strength when I needed it most. I am truly blessed with my family and friends.

    My brother, Michael, was like a sentry -- by my side, from early morning until late into the night, supporting me in any way he could. I love you, Michael!

    The moment my brother, my parents and my in-laws received the news of the accident, they packed their cars and hauled ass through the night 1000 miles to be by my side. I love you all so much and I could not have gotten this far without you. You are all amazing!

    Thank you to my close friends for whom this experience only brought us closer. Karen, Dan, Anna, Sarah, and Sasha, I love you guys! Filip, you are very special to me and your dedication to visiting me and helping me keep my spirits high is stellar, thank you! Mike O., the chicken curry was delicious! Who knew this dude can cook *and* write code, thank you! Tim R., you have been my cycling buddy for a number of years and we were riding together the day of the accident just prior to its occurrence. You've stayed by me and met my family and helped in any way you can, thank you!

    To all my friends and neighbors who immediately mobilized to provide my family with more delicious meals than they could possibly keep up with eating, you have really made us feel loved and watched over. Not only has Louisville, Colorado been ranked as one of the best places to live in the USA by Money Magazine for the last several years, the community of friends and neighbors is like an extended family -- you guys are the best!

    Thank You To Everyone 

    Thank you for all of your phone calls, emails, texts, tweets, concerns, hospital visits and well-wishes from everyone around the world. The level of compassion that I have experienced from near and far has been absolutely overwhelming for me. Please understand that I am not able to communicate directly with every single person simply due to the sheer volume of communications and the amount of time I am now spending doing rehab at Craig Hospital. I still get exhausted fairly easily doing rehab and just trying to live my life right now at Craig Hospital -- and this is coming from someone who could easily run 10 miles or ride 30+ miles at lunch just a couple weeks ago. This whole experience is absolutely flooding me emotionally and physically.

    The Gory Details 

    For those who want more details about how the shit went down, please see the Caring Bridge :: Bruce Snyder website set up by my extraordinary friend Jamie Hogan. This website is where Janene has been posting updates about my experience since the beginning. I will be adding my experiences henceforth here on my blog as I travel the winding road of recovery.


    Life is precious and I am so very happy to be alive.

    And please, please do not ever text and drive. 

    0 0

    I wanted to spend our summer vacation driving our VWs up the California coast, on a mammoth 3500-mile road trip over two weeks. However, when a landslide happened near Big Sur, I knew it was probably best to move this road trip from my yearly goals to my bucket list. Instead, we opted to drive to Montana and spend a couple of weeks vacationing in my childhood playground.

    Our journey began with a bit of work. Trish's company was sponsoring a family movie night event in Sandy, Utah. We found out that my company was sponsoring as well, so we decided to take the scenic route to Montana. We left Denver at 9 pm on Thursday, June 29, and arrived in Grand Junction, CO at 2 am. Trish needed to be in Sandy for a lunch meeting, so we woke up promptly at 6 am and got back on the road.

    The event in Sandy was super-fun. We enjoyed talking to customers, handing out swag, and watching the Despicable Me 3 premiere with everyone.

    A family that works together, stays together.

    We high-tailed it to Montana after that, spending two days driving along scenic I-15 through Utah, Idaho, and Montana. We arrived at the Raible Homestead on Sunday afternoon.

    Pretty nice place to be. #vacation #vanlife #montanabound

    We made it to The Cabin!

    The next two weeks were spent rafting, hiking, hanging out with friends, driving a lot, and relaxing with good books. Rafting the Middle Fork of the Flathead was a highlight for me, especially watching Abbie do all the rapids by herself in a duckie.

    A beautiful day for a float!

    Happy Dudes

    Kids love #riverlife

    Near where "A River Runs Through It" took place

    We also hiked up to Lower Rumble Lake. The trail goes straight up the mountain, with no switchbacks. It can be a grueling hike. My Mom, Abbie, Jack, our two border collies (Sagan and Jake), and I made up the trail crew. About halfway up, I thought, "this isn't as hard as I remember." By the time we got to the top (90 minutes later), it was just as challenging as I remembered. It had probably been 30 years since I was last at Rumble Lake; it's still as majestic as ever.

    Worth the hike

    Rumble Lake Hikers

    Our final event in Montana was the Bob Marshall Music Festival. We attended the first annual event last year and had a great time. This year was even better, especially since we had one of the bands camped right next to us. Trish and I loved the late night campfire sessions listening to the local pickers.

    Let the weekend birthday party begin! #riverlife

    My Happy Family

    My Crazy Family

    Trish and the kids flew back to avoid the long drive home. I took the slow, scenic route home with Stout the Syncro, living the #vanlife for a couple days. Waking up with a view of the Tetons was awe-inspiring, as was the view driving through the Flaming Gorge Recreation area. I used an AT&T Unite Explore MiFi device for connectivity, or worked at coffee shops along the way.

    Pretty nice views to wake up to this morning! #carpediem #working #vanlife

    "We name it Flaming Gorge" — John Wesley Powell

    More photos on Flickr → Summer Vacation in Montana 2017

    I arrived home around midnight on Wednesday, July 19. Total miles: 3450. Issues with Stout the Syncro: none.

    Epilogue: On my journey home, I wrote and polished presentations for ÜberConf. I delivered those presentations that Friday and uploaded them to Speaker Deck. You can view them using the following links:

    Two weeks later, and I'm writing this from the back of our van on another Raible Road Trip. This time, we're heading to central Idaho for a week-long rafting trip with family and friends. Another couple thousand miles, many more unforgettable memories. There's something special about traveling the country in a VW Van.

    0 0

    Here's a simple example, using XML Schema 1.1 <assert> to validate elementary school mathematical tables.

    XML document:
    <?xml version="1.0"?>
    <table id="2">
       <x>2</x>
       <x>4</x>
       <x>6</x>
       <x>8</x>
       <x>10</x>
       <x>12</x>
       <x>14</x>
       <x>16</x>
       <x>18</x>
       <x>20</x>
    </table>

    XSD 1.1 document:
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
       <xs:element name="table">
          <xs:complexType>
             <xs:sequence>
                <xs:element name="x" type="xs:positiveInteger" minOccurs="10" maxOccurs="10"/>
             </xs:sequence>
             <xs:attribute name="id" type="xs:positiveInteger" use="required">
                <xs:annotation>
                   <xs:documentation>Mathematical table of @id is represented.</xs:documentation>
                </xs:annotation>
             </xs:attribute>
             <xs:assert test="x[1] = @id"/>
             <xs:assert test="every $x in x[position() gt 1] satisfies $x = $x/preceding-sibling::x[1] + @id">
                <xs:annotation>
                   <xs:documentation>An XPath 2.0 expression validating the depicted mathematical table.</xs:documentation>
                </xs:annotation>
             </xs:assert>
          </xs:complexType>
       </xs:element>
    </xs:schema>
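    The two assertions encode simple arithmetic: the first `x` must equal the `id` attribute, and each subsequent `x` must be the previous value plus `id`. As a quick sanity check (not part of the original post), here is a minimal stand-alone Python sketch that mirrors the same rules on an instance document; the `validate_table` helper is a hypothetical name for illustration:

    ```python
    # Sketch: mirror the XSD 1.1 assertions in plain Python.
    import xml.etree.ElementTree as ET

    DOC = """<?xml version="1.0"?>
    <table id="2">
      <x>2</x><x>4</x><x>6</x><x>8</x><x>10</x>
      <x>12</x><x>14</x><x>16</x><x>18</x><x>20</x>
    </table>"""

    def validate_table(xml_text):
        root = ET.fromstring(xml_text)
        table_id = int(root.get("id"))
        values = [int(x.text) for x in root.findall("x")]
        if len(values) != 10:          # minOccurs="10" maxOccurs="10"
            return False
        if values[0] != table_id:      # <xs:assert test="x[1] = @id"/>
            return False
        # every $x in x[position() gt 1] satisfies
        #   $x = $x/preceding-sibling::x[1] + @id
        return all(b == a + table_id for a, b in zip(values, values[1:]))

    print(validate_table(DOC))  # True
    ```

    A real XSD 1.1 processor (e.g., Xerces-J with 1.1 support enabled) would of course enforce these rules directly from the schema; the sketch just makes the arithmetic explicit.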

    0 0

    Have you ever thought about centralizing control of your Zimbra accounts with Apache Syncope?

    0 0

    We’re rebuilding the front steps, and since the masons are using concrete blocks, we have an opportunity to include a time capsule. Here are a few notes we’re including. Dear Future Shane This is a message from the past. …

    0 0

    A handy feature was silently added to Apache Cassandra’s nodetool just over a year ago. The feature added was the -j (jobs) option. This little gem controls the number of compaction threads to use when running either a scrub, cleanup, or upgradesstables. The option was added to nodetool via CASSANDRA-11179 to version 3.5. It has been back ported to Apache Cassandra versions 2.1.14, 2.2.6, and 3.5.

    If unspecified, nodetool will use 2 compaction threads. When this value is set to 0 all available compaction threads are used to perform the operation. Note that the total number of available compaction threads is controlled by the concurrent_compactors property in the cassandra.yaml configuration file. Examples of how it can be used are as follows.

    $ nodetool scrub -j 3
    $ nodetool cleanup -j 1
    $ nodetool upgradesstables -j 1 

    The option is most useful in situations where disk space is scarce and a limited number of threads for the operation need to be used to avoid disk exhaustion.

    0 0

    I was starting to get interested in Fallout 4, which seems like a fairly interesting game.

    But, I just got Windows 10 Creators Update installed.

    Which, you might think, would be a good thing!

    Unfortunately, it seems to have been the kiss of death for Fallout 4.

    This is not the first bad experience I've had with the Fallout games. Fallout New Vegas was totally unplayable on my machine, as well.

    When will I learn?

    0 0

    Over more than three decades, Tony Hillerman wrote a series of absolutely wonderful detective novels set on the Navajo Indian Reservation and featuring detectives Lieutenant Joe Leaphorn and Sergeant Jim Chee.

    Recently, I learned that, after Hillerman's death, his daughter, Anne Hillerman, has begun publishing her own novels featuring Leaphorn, Chee, and the other major characters developed by her father, such as Officer Bernadette Manuelito.

    So far, she has published three books, the first of which is Spider Woman's Daughter.

    If you loved Tony Hillerman's books, I think you will find Anne Hillerman's books lovely as well. Not only is she a fine writer, but she brings an obvious love of her father's choices of setting and character, and of the Navajo people and their culture.

    I'm looking forward to reading the other books that she has written, and I hope she continues writing many more.

    0 0

    • Allen curve – Wikipedia

      During the late 1970s, [Professor Thomas J.] Allen undertook a project to determine how the distance between engineers’ offices affects the frequency of technical communication between them. The result of that research, produced what is now known as the Allen Curve, revealed that there is a strong negative correlation between physical distance and the frequency of communication between work stations. The finding also revealed the critical distance of 50 meters for weekly technical communication. With the fast advancement of internet and sharp drop of telecommunication cost, some wonder the observation of Allen Curve in today’s corporate environment. In his recently co-authored book, Allen examined this question and the same still holds true. He says[2] “For example, rather than finding that the probability of telephone communication increases with distance, as face-to-face probability decays, our data show a decay in the use of all communication media with distance (following a “near-field” rise).” [p. 58]
      Apparently a few years back at Google, some staff mined the promotion data and were able to show an Allen-like curve demonstrating a strong correlation between distance from Jeff Dean’s desk and time to promotion.

      (tags: jeff-dean, google, history, allen-curve, work, communication, distance, offices, workplace, teleworking, remote-work)

    • Arq Backs Up To B2!

      Arq backup for OSX now supports B2 (as well as S3) as a storage backend. “it’s a super-cheap option ($.005/GB per month) for storing your backups.” (that is less than half the price of $0.0125/GB for S3’s Infrequent Access class)

      (tags: s3, storage, b2, backblaze, backups, arq, macosx, ops)

    • After Charlottesville, I Asked My Dad About Selma

      Dad told me that he didn’t think I was going to have to go through what he went through, but now he can see that he was wrong. “This fight is a never-ending fight,” he said. “There’s no end to it. I think after the ‘60s, the whole black revolution, Martin Luther King, H. Rap Brown, Stokely Carmichael and all the rest of the people, after that happened, people went to sleep,” he said. “They thought, ‘this is over.’”

      (tags: selma, charlottesville, racism, nazis, america, race, history, civil-rights, 1960s)

    0 0


      Foursquare’s open source repo, where they extract reusable components for open sourcing — I like the approach of using a separate top level module path for OSS bits

      (tags: open-source, oss, foursquare, libraries, maintainance, coding, git, monorepos)

    • GTK+ switches build from Autotools to Meson

      ‘The main change is that now GTK+ takes about ? of the time to build compared to the Autotools build, with likely bigger wins on older/less powerful hardware; the Visual Studio support on Windows should be at least a couple of orders of magnitude easier (shout out to Fan Chun-wei for having spent so, so many hours ensuring that we could even build on Windows with Visual Studio and MSVC); and maintaining the build system should be equally easier for everyone on any platform we currently support.’ Looking at it appears to be Python-based and AL2-licensed open source. On the downside, though, the Meson file is basically a Python script, which is something I’m really not fond of :( more details at .

      (tags: meson, build, coding, dev, autotools, gtk+, python)

    • Matt Haughey on Twitter: “high quality LED light tape for bikes and wheels is ridiculously cheap these days”

      good thread on fitting out a bike with crazy LED light tape; see also EL string. Apparently it’ll run off a 4.5V (3xAAA) battery pack nowadays which makes it pretty viable!

      (tags: bikes, cycling, safety, led-lights, el-tape, led-tape, hacks, via:mathowie)

    • M00N

      a beautifully-glitched photo of the moon by Giacomo Carmagnola; more on his art at . (Via Archillect)

      (tags: via:archillect, art, giacomo-carmagnola, glitch-art, moon, glitch, images)

    • How to shop on AliExpress

      From the aptly-named Thanks, Elliot — the last thing I needed was something to feed my addiction to cheap tat from China!

      (tags: china, aliexpress, dealextreme, gearbest, gadgets, buying, tat, aliholic, stuff)

    • TIL you shouldn’t use conditioner if you get nuked

      If you shower carefully with soap and shampoo, Karam says [Andrew Karam, radiation expert], the radioactive dust should wash right out. But hair conditioner has particular compounds called cationic surfactants and polymers. If radioactive particles have drifted underneath damaged scales of hair protein, these compounds can pull those scales down to create a smooth strand of hair. “That can trap particles of contamination inside of the scale,” Karam says. These conditioner compounds are also oily and have a positive charge on one end that will make them stick to negatively charged sections of a strand of hair, says Perry Romanowski, a cosmetics chemist who has developed personal hygiene formulas and now hosts “The Beauty Brains” podcast on cosmetics chemistry. “Unlike shampoo, conditioners are meant to stay behind on your hair,” Romanowski says. If the conditioner comes into contact with radioactive material, these sticky, oily compounds can gum radioactive dust into your hair, he says.

      (tags: factoids, conditioner, surfactants, nuclear-bombs, fallout, hair, bizarre, til, via:boingboing)

    0 0


    Have you already watched the Vice News Tonight mini-documentary on the events in Charlottesville?

    It's really powerful, really disturbing, really hard to watch. I don't know a lot about Vice News Tonight, but apparently it's an independent journalism effort receiving funding (and air time) from HBO. This is the first and only Vice News Tonight documentary I've ever watched.

    I was really moved by the Vice News Tonight reportage, and by the work of correspondent Elle Reeve, about whom I knew nothing before seeing that report. She did some very fine reporting, I think.

    I'm paying particular attention to this issue all of a sudden because my daughter now lives (as of a month ago) in Richmond, Virginia, just one mile from Monument Avenue, the probable next locus of confrontation.

    I haven't ever visited Richmond, but hope to do so one day, now that my daughter lives there.

    In the meantime, I'm paying a lot more attention to events in Virginia than I did before.

    As are we all.

    0 0

    • NASA’s Sound Suppression Water System

      If you’ve ever watched a rocket launch, you’ve probably noticed the billowing clouds around the launch pad during lift-off. What you’re seeing is not actually the rocket’s exhaust but the result of a launch pad and vehicle protection system known in NASA parlance as the Sound Suppression Water System. Exhaust gases from a rocket typically exit at a pressure higher than the ambient atmosphere, which generates shock waves and lots of turbulent mixing between the exhaust and the air. Put differently, launch ignition is incredibly loud, loud enough to cause structural damage to the launchpad and, via reflection, the vehicle and its contents. To mitigate this problem, launch operators use a massive water injection system that pours about 3.5 times as much water as rocket propellant per second. This significantly reduces the noise levels on the launchpad and vehicle and also helps protect the infrastructure from heat damage.

      (tags: water, rockets, launch, nasa, space, sound-suppression, sound, science)

    • The White Lies of Craft Culture – Eater

      Besides field laborers, [Southern US] planter and urban communities both depended on proficient carpenters, blacksmiths, gardeners, stable hands, seamstresses, and cooks; the America of the 1700s and 1800s was literally crafted by people of color. Part of this hidden history includes the revelation that six slaves were critical to the operation of George Washington’s distillery, and that the eponymous Jack Daniel learned to make whiskey from an enslaved black man named Nathan “Nearest” Green. As Clay Risen reported for the New York Times last year, contrary to the predominant narrative that views whiskey as an ever “lily-white affair,” black men were the minds and hands behind American whiskey production. “In the same way that white cookbook authors often appropriated recipes from their black cooks, white distillery owners took credit for the whiskey,” he writes. Described as “the best whiskey maker that I know of” by his master, Dan Call, Green taught young Jack Daniel how to run a whiskey still. When Daniel later opened his own distillery, he hired two of Green’s sons. “The popular image of moonshine is a product of the white cultural monopoly on all things ‘country.’” Over time, that legacy was forgotten, creating a gap in knowledge about American distilling traditions — while English, German, Scottish, and Irish influences exist, that combination alone cannot explain the entirety of American distilling. As bourbon historian Michael Veach suggests, slave culture pieces together an otherwise puzzling intellectual history.

      (tags: history, craft-beer, craft-culture, food, drink, whiskey, distilling, black-history, jack-daniels, nathan-nearest-green)

    • Meet the Espresso Tonic, Iced Coffee’s Bubbly New Cousin

      Bit late on this one but YUM

      To make the drink, Box Kite baristas simply load a glass with ice, fill it about three quarters of the way with chilled tonic, and then top it off with an espresso shot — typically from roasters like Madcap (MI) and Ritual (SF). Often, baristas pull the espresso shot directly on top of the tonic and ice mixture, forgoing the process of first pulling it into a cup and then pouring the espresso from cup to glass.

      (tags: tonic-water, recipes, espresso, coffee, drinks, cocktails)

    0 0

    Those of you in the “Java EE” world may have already seen the announcement from Oracle posted yesterday concerning the future of Java EE. This is potentially very exciting news, particularly for the various Apache projects that implement some of the Java EE specs. Since Apache CXF implements a couple of the specs (JAX-WS and JAX-RS), I’m looking forward to seeing where Oracle goes with this.

    For those that don’t know, several years ago I spent a LOT of time and effort reviewing contracts and TCK (Technology Compatibility Kit) licenses, and sending emails and proposals back and forth with Oracle’s VPs and legal folks, in an attempt to allow Apache to license some of the TCKs that the Apache projects needed. In order to claim 100% compliance with a spec, a project needs access to the TCK to run the tests. Unfortunately, Apache and Oracle were never able to agree on terms that would allow the projects to have access AND be able to act as Apache projects. Thus, we were not able to get the TCKs. Most of the projects were able to move on and continue doing what they needed to do, but without the TCKs, the “claim of compliance” that they would like is missing.

    I’m hoping that with the effort to open up the Java EE spec process, they will also start providing access to the TCK’s with an Open Source license that is compatible with the Apache License and Apache projects.