
    We’re rebuilding the front steps, and since the masons are using concrete blocks, we have an opportunity to include a time capsule. Here are a few notes we’re including: “Dear Future Shane, this is a message from the past. The …”


    A handy feature was silently added to Apache Cassandra’s nodetool just over a year ago: the -j (jobs) option. This little gem controls the number of compaction threads to use when running a scrub, cleanup, or upgradesstables. The option was added to nodetool in version 3.5 via CASSANDRA-11179 and has been backported to Apache Cassandra versions 2.1.14 and 2.2.6.

    If unspecified, nodetool will use 2 compaction threads. When this value is set to 0 all available compaction threads are used to perform the operation. Note that the total number of available compaction threads is controlled by the concurrent_compactors property in the cassandra.yaml configuration file. Examples of how it can be used are as follows.

    $ nodetool scrub -j 3
    $ nodetool cleanup -j 1
    $ nodetool upgradesstables -j 1 
    

    The option is most useful when disk space is scarce and the operation needs to run on a limited number of threads to avoid exhausting the disk.



    Those of you in the “Java EE” world may have already seen the announcement from Oracle that was posted yesterday concerning the future of Java EE. This is potentially very exciting news, particularly for the various Apache projects that implement some of the Java EE specs. Since Apache CXF implements a couple of the specs (JAX-WS and JAX-RS), I’m looking forward to seeing where Oracle goes with this.

    For those that don’t know, several years ago I spent a LOT of time and effort reviewing contracts and TCK (Technology Compatibility Kit) licenses and sending emails and proposals back and forth with Oracle’s VPs and legal folks in an attempt to allow Apache to license some of the TCKs that the Apache projects needed. In order to claim 100% compliance with a spec, a project needs access to the TCK to run the tests. Unfortunately, Apache and Oracle were never able to agree on terms that would allow the projects to have access AND be able to act as an Apache project, so we were not able to get the TCKs. Most of the projects were able to move on and continue doing what they needed to do, but without the TCKs, the “claim of compliance” that they would like is missing.

    I’m hoping that with the effort to open up the Java EE spec process, they will also start providing access to the TCKs under an open source license that is compatible with the Apache License and Apache projects.



    We’re delighted to introduce cassandra-reaper.io, the dedicated site for the open source Reaper project! Since we adopted Reaper from the incredible folks at Spotify, we’ve added a significant number of features, expanded the supported versions past 2.0, added support for incremental repair, and added a Cassandra backend to simplify operations.

    The road ahead is looking promising. We’re working to improve the Cassandra backend even further, leveraging Cassandra’s multi-DC features to enable multi-DC repair as well as fault tolerance for Reaper itself. We’ve tested this work internally at The Last Pickle and have received community feedback. In addition to the site, we’ve also set up a Gitter-based chat to keep development out in the open and help foster the community.

    Over time we’re looking to expand the functionality of Reaper beyond handling just repairs. We would love for the Reaper WebUI to be the easiest way to perform all administrative tasks on a Cassandra cluster.


    08/24/17--23:31: Nick Kew: Cyrillic WordPress

    Three years back, a bizarre glitch in WordPress turned the operation of this blog Turkish. That felt rather Kafkaesque until it resolved itself, as I’d’ve had to navigate through a lot of Turkish to fix my settings back to English!

    Now, in a faint echo of that, it’s sent me email in a Cyrillic language. The email template looks like a regular notification that someone is following my blog. Comparing it to the last such notification (from earlier this week) and pasting the subject line[1] into Google Translate both confirm it. Indeed, I tried Google with several Cyrillic languages: Russian, Belarusian, Bulgarian, Ukrainian, and it appears to say much the same in all of them.

    And the punchline: the person who just subscribed is in Ankara, Turkey! Which is what first reminded me of the Turkish incident, but is no more Cyrillic than English!

    This is not in the same league as the Turkish WordPress: when I log in, things are still in English.  Nevertheless, bizarre.

    [1] Subject line truncated to preserve privacy of the latest subscriber.




    By default, an ImageView is not zoomable in an Android app. Just found a brilliant customization of Android's ImageView, which is zoomable. It's located here: https://github.com/MikeOrtiz/TouchImageView.

    To use it in the app, we just have to do TouchImageView iv = (TouchImageView) findViewById(R.id.img); (in an Activity's onCreate method, for example). Of course, TouchImageView supports all the behavior of the standard ImageView class.
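
    For illustration, here is a minimal Activity sketch (the layout file, view id, and drawable name are invented, and the library package has changed between releases, so verify it against the version you pull in). The only special requirement is that the custom view is declared in the layout XML with its fully qualified class name:

    import android.app.Activity;
    import android.os.Bundle;
    import com.ortiz.touch.TouchImageView; // package name varies across releases of the library

    public class PhotoActivity extends Activity {
        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            // Assumes res/layout/activity_photo.xml declares the view with its fully
            // qualified class name and android:id="@+id/img".
            setContentView(R.layout.activity_photo);
            TouchImageView iv = (TouchImageView) findViewById(R.id.img);
            iv.setImageResource(R.drawable.sample); // standard ImageView methods work as usual
        }
    }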


    Thirteen years ago today, my son was born. Jack has grown up to be a wicked smart and fun kid to be around. He started the 7th grade this year and is into Xbox, basketball, hanging out with his friends, and exotic cars. He dreams of driving a Lamborghini one day and enjoyed pretending he owned a bunch of Ferraris a few weeks back.

    This weekend, I cleaned up Hefe to chauffeur Jack and his buddies to Dart Warz for an afternoon of fun with Nerf guns.

    Life's a journey, enjoy the ride!

    Jack and his buddies at Dart Warz. Dart guns!

    As expected, the boys had a great time. A few of his friends had a sleepover Saturday night and competed to see who could stay up the latest. They made it pretty late, but no one saw the sun rise.

    My parents drove down from Montana to help revel in Jack's birthday. We had a wonderful time visiting with them and celebrating with his mom and her husband, Dave. Thanks to Julie and Dave for being such awesome co-parents!

    Happy 13th Birthday Jack!

    Family photo on Jack's 13th Birthday



    JAX-RS 2.1 (JSR 370) has finally been released, and JAX-RS users can start experimenting with the new features very soon, with a number of final JAX-RS 2.1 implementations already available (such as Jersey) or nearly ready to be released.

    Apache CXF 3.2.0 will be released shortly, and all of the new JAX-RS 2.1 features have been implemented: reactive client API extensions, client/server Server-Sent Events support, returning CompletableFuture from resource methods, and other minor improvements.
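
    As a minimal sketch of the CompletableFuture support (the resource class, path and payload here are invented for illustration), a JAX-RS 2.1 resource method can simply return a CompletionStage and let the container complete the response asynchronously:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.CompletionStage;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;

    @Path("/quotes")
    public class QuoteResource {

        @GET
        @Produces("text/plain")
        public CompletionStage<String> dailyQuote() {
            // The request thread is released; the response is written once the future completes.
            return CompletableFuture.supplyAsync(() -> "the daily quote");
        }
    }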

    As part of the 2.1 work (but also based on a CXF JIRA request) we also introduced RxJava Observable and, more recently, RxJava2 Flowable/Observable client and server extensions. They can be used as an alternative to CompletableFuture on the client and/or the server side. Note that the combination of RxJava2 Flowable with JAX-RS AsyncResponse on the server is quite cool.

    The other new CXF extension introduced as part of the JAX-RS 2.1 work is the NIO extension, which will be the topic of the next post.

    Pavel Bucek and Santiago Pericas-Geertsen were the great JAX-RS 2.1 spec leads. Andriy Redko spent a lot of his time getting CXF 3.2.0 JAX-RS 2.1-ready.


    In CXF 3.2.0 we have also introduced a server-side NIO extension which is based on the very first JAX-RS API prototype done by Santiago Pericas-Geertsen. The client NIO API prototype was not ready, but the server one had a promising start. It was implemented in CXF as soon as the long-awaited first 2.1 API jar was published to Maven.

    However, once the JAX-RS 2.1 group finally resumed its work and started finalizing the NIO API, the early NIO API was unfortunately dropped (IMHO it could have stayed as an entry-point, 'easy' NIO API), while the new NIO API did not materialize, primarily due to the time constraints of the JCP process.

    The spec leads did all they could, but the schedule was too tight for them to get it right. Sad as it was, they made the right decision: rather than do something in a hurry, better to do it right at a later stage...

    It was easily the major omission from the final 2.1 API. How long will JAX-RS users have to wait until a new JAX-RS version is finalized and a new NIO API becomes available to them, given that it takes years for the major Java EE umbrella of specs to be done?

    In the meantime, the engineering minds on the Spring Boot, RxJava and other teams will come up with new, brilliant ways of doing it. They will be not one but several steps ahead.

    Which brings me to this point: if I were to offer a single piece of advice to the Java EE process designers, I'd recommend making sure that new features can easily be added after an EE release date, with minor EE releases embracing those features following soon after, without waiting N years. If that were an option we could have seen a JAX-RS 2.2 with NIO in, say, 6 months - just a dream at the moment, I know. The current mechanism, where EE users wait several years for some new features, is out of sync with the competitive reality of the software industry and only works because of the great teams doing EE, the loyalty of EE users, and the power of the term 'standard'.

    Anyway, throwing away our own implementation of that NIO API prototype, now gone from the 2.1 API, just because it had suddenly become code supporting a non-standard feature, was not a good idea.

    It offers an easy link from JAX-RS code to the Servlet 3.1 NIO extensions, and that offers real value. Thus the code stayed and is now available for CXF users to experiment with.

    It's not very shiny, but it will deliver. Seriously, if you need a massive InputStream copied to or from the HTTP connection with NIO and asynchronous callbacks involved, what else do you need but a simple and easy way to do it from code? Well, nothing can be simpler than this option, for sure.
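
    For context, the non-blocking reads underneath come from the Servlet 3.1 ReadListener API; a bare-bones servlet sketch (not the CXF extension itself, and with invented names) looks roughly like this:

    import java.io.IOException;
    import javax.servlet.AsyncContext;
    import javax.servlet.ReadListener;
    import javax.servlet.ServletInputStream;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    @WebServlet(urlPatterns = "/upload", asyncSupported = true)
    public class NioUploadServlet extends HttpServlet {

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            AsyncContext ctx = req.startAsync();
            ServletInputStream in = req.getInputStream();
            byte[] buffer = new byte[8192];

            in.setReadListener(new ReadListener() {
                @Override
                public void onDataAvailable() throws IOException {
                    // Read only while data can be consumed without blocking.
                    while (in.isReady() && !in.isFinished()) {
                        int n = in.read(buffer);
                        // ... hand the n bytes off to application code ...
                    }
                }

                @Override
                public void onAllDataRead() {
                    ctx.complete();
                }

                @Override
                public void onError(Throwable t) {
                    ctx.complete();
                }
            });
        }
    }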

    Worried a bit that it is not a standard feature? No, it is fine: doing it the CXF way is a standard :-)


    This release makes CommentLessSource and friends use XSLT 2.0 under the covers and adds new methods to override the XSLT version being used.

    The full list of changes:

    • CommentLessSource, DiffBuilder#ignoreComments and CompareMatcher#ignoreComments now all use XSLT version 2.0 stylesheets in order to strip comments. New constructors and methods have been added in case you need a different version of XSLT (in particular if you need 1.0, which was the default up to XMLUnit 2.4.0).

      Issue #99.
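
    For orientation, here is a minimal sketch of the comment-ignoring comparison; the new version-override constructors and methods are the ones listed in the changelog above, and this just exercises the default path, which is now XSLT 2.0:

    import org.xmlunit.builder.DiffBuilder;
    import org.xmlunit.diff.Diff;

    public class IgnoreCommentsDemo {
        public static void main(String[] args) {
            Diff diff = DiffBuilder.compare("<a><!-- only a comment differs --><b/></a>")
                                   .withTest("<a><b/></a>")
                                   .ignoreComments()
                                   .build();
            System.out.println(diff.hasDifferences()); // false -- comments are stripped before comparing
        }
    }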



    [TL;DR: Apache Lucene 6.0 quietly introduced a powerful new feature called near-real-time (NRT) segment replication, for efficiently and reliably replicating indices from one server to another, and taking advantage of ever faster and cheaper local area networking technologies. Neither of the popular search servers (Elasticsearch, Solr) is using it yet, but it should bring a big increase in indexing and searching performance and robustness to them.]

    Lucene has a unique write-once segmented architecture: recently indexed documents are written to a new self-contained segment, in append-only, write-once fashion: once written, those segment files will never again change. This happens either when too much RAM is being used to hold recently indexed documents, or when you ask Lucene to refresh your searcher so you can search all recently indexed documents.

    Over time, smaller segments are merged away into bigger segments, and the index has a logarithmic "staircase" structure of active segment files at any time. This is an unusual design, when compared with databases which continuously update their files in-place, and it bubbles up to all sorts of nice high-level features in Lucene. For example:
    • Efficient ACID transactions.

    • Point-in-time view of the index for searching that will never change, even under concurrent indexing, enabling stable user interactions like pagination and drill-down.

    • Multiple point-in-time snapshots (commit points) can be indefinitely preserved in the index, even under concurrent indexing, useful for taking hot backups of your Lucene index.
    The ZFS filesystem has similar features, such as efficient whole-filesystem snapshots, which are possible because it also uses a write-once design, at the file-block level: when you change a file in your favorite text editor and save it, ZFS allocates new blocks and writes your new version to those blocks, leaving the original blocks unchanged.

    This write-once design also empowers Lucene to use optimized data-structures and apply powerful compression techniques when writing index files, because Lucene knows the values will not later change for this segment. For example, you never have to tell Lucene that your doc values field needs only 1 byte of storage, nor that the values are sparse, etc.: Lucene figures that out by itself by looking at all values it is about to write to each new segment.

    Finally, and the topic of this post: this design enables Lucene to efficiently replicate a search index from one server ("primary") to another ("replica"): in order to sync recent index changes from primary to replica you only need to look at the file names in the index directory, and not their contents. Any new files must be copied, and any files previously copied do not need to be copied again because they are never changed!

    Taking advantage of this, we long ago added a replicator module to Lucene that used exactly this approach, and it works well. However, to use those APIs to replicate recent index changes, each time you would like to sync, you must first commit your changes on the primary index. This is unfortunately a costly operation, invoking fsync on multiple recently written files, and greatly increases the sync latency when compared to a local index opening a new NRT searcher.
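
    To make the contrast concrete, here is a tiny hedged sketch of the two operations being compared (plain Lucene calls, not the replicator module's own API):

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.SearcherManager;

    public class RefreshVersusCommit {
        // Cheap: makes recently indexed documents searchable locally, no fsync required.
        static void nrtRefresh(SearcherManager manager) throws Exception {
            manager.maybeRefresh();
        }

        // Costly: fsyncs the recently written files; the older replication APIs needed
        // one of these before every sync.
        static void durableCommit(IndexWriter writer) throws Exception {
            writer.commit();
        }
    }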

    Document or segment replication?


    The requirement to commit in order to replicate Lucene's recent index changes was such a nasty limitation that when the popular distributed search servers Elasticsearch and Solr added distributed indexing support, they chose not to use Lucene's replication module at all, and instead created their own document replication, where the primary and all replicas redundantly index and merge all incoming documents.

    While this might seem like a natural approach to keeping replicas in sync, there are downsides:
    • More costly CPU/IO resource usage across the cluster: all nodes must do the same redundant indexing and merging work, instead of just one primary node. This is especially painful when large merges are running and interfere with concurrent searching, and is more costly on clusters that need many replicas to support high query rates. This alone would give Elasticsearch and Solr a big increase in cluster wide indexing and searching throughput.

    • Risk of inconsistency: ensuring that precisely the same set of documents is indexed in primary and all replicas is tricky and contributed to the problems Elasticsearch has had in the past losing documents when the network is mis-behaving. For example, if one of the replicas throws an exception while indexing a document, or if a network hiccup happens, that replica is now missing a document that the other replicas and primary contain.

    • Costly node recovery after down time: when a replica goes down for a while and then comes back up, it must replay (reindex) any newly indexed documents that arrived while it was down. This can easily be a very large number, requiring a large transaction log and taking a long time to catch up, or it must fallback to a costly full index copy. In contrast, segment based replication only needs to copy the new index files.

    • High code complexity: the code to handle the numerous possible cases where replicas can become out of sync quickly becomes complex, and handling a primary switch (because the old primary crashed) especially so.

    • No "point in time" consistency: the primary and replicas refresh on their own schedules, so they are all typically searching slightly different and incomparable views of the index.
    Finally, in Lucene 6.0, overshadowed by other important features like dimensional points, we quietly improved Lucene's replication module with support for NRT segment replication, to copy new segment files after a refresh without first having to call commit. This feature is especially compelling when combined with the never-ending trend towards faster and cheaper local area networking technologies.

    How does it work?


    While the logical design is straightforward ("just copy over the new segment files from primary to replicas"), the implementation is challenging because this adds yet another concurrent operation (replicas slowly copying index files over the wire), alongside the many operations that already happen in IndexWriter such as opening new readers, indexing, deleting, segment merging and committing. Fortunately we were able to build on pre-existing Lucene capabilities like NRT readers, to write new segments and identify all their files, and snapshotting, to ensure segment files are not deleted until replicas have finished copying them.

    The two main APIs are PrimaryNode, which holds all state for the primary node including a local IndexWriter instance, and ReplicaNode for the replicas. The replicas act like an index writer, since they also create and delete index files, and so they acquire Lucene's index write lock when they start, to detect inadvertent misuse. When you instantiate each replica, you provide a pointer to where its corresponding primary is running, for example a host or IP address and port.

    Both the primary and replica nodes expose a SearcherManager so you can acquire and release the latest searcher at any time. Searching on the primary node also works, and would match the behavior of all Elasticsearch and Solr nodes today (since they always do both indexing and searching), but you might also choose to dedicate the primary node to indexing only.
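
    Whichever node you search, the usual acquire/release discipline applies; a small sketch (the field name and query are invented):

    import java.io.IOException;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.SearcherManager;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class ReplicaSearch {
        public static TopDocs searchOnce(SearcherManager searcherManager) throws IOException {
            IndexSearcher searcher = searcherManager.acquire(); // pins the current point-in-time view
            try {
                return searcher.search(new TermQuery(new Term("body", "lucene")), 10);
            } finally {
                searcherManager.release(searcher);              // always release what you acquire
            }
        }
    }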

    You use IndexWriter from the primary node to make index changes as usual, and then when you would like to search recently indexed documents, you ask the primary node to refresh. Under the hood, Lucene will open a new NRT reader from its local IndexWriter, gather the index files it references, and notify all connected replicas. The replicas then compute which index files they are missing and then copy them over from the primary.

    Document deletions, which are normally carried in memory as a bitset directly from IndexWriter to an NRT IndexReader, are instead written through to the filesystem and copied over as well. All files are first copied to temporary files, and then renamed (atomically) in the end if all copies were successful. Lucene's existing end-to-end checksums are used to validate no bits were flipped in transit by a flaky network link, or bad RAM or CPU. Finally, the in-memory segments file (a SegmentInfos instance) is serialized on the wire and sent to the replicas, which then deserialize it and open an NRT searcher via the local SearcherManager. The resulting searcher on the replica is guaranteed to search the exact same point-in-time view as the primary.

    This all happens concurrently with ongoing searches on the replica, and optionally primary, nodes. Those searches see the old searcher until the replication finishes and the new searcher is opened. You can also use Lucene's existing SearcherLifetimeManager, which tracks each point-in-time searcher using its long version, if you need to keep older searchers around for a while as well.

    The replica and primary nodes both expose independent commit APIs; you can choose to call these based on your durability requirements, and even stagger the commits across nodes to reduce the cluster-wide impact on search capacity.

    No transaction log


    Note that Lucene does not provide a transaction log! More generally, a cluster of primary + replicas linked by NRT replication behave a lot like a single IndexWriter and NRT reader on a single JVM. This means it is your responsibility to be prepared to replay documents for reindexing since the last commit point, if the whole cluster crashes and starts up again, or if the primary node crashes before replicas were able to copy the new point-in-time refresh.

    Note that at the filesystem level, there is no difference between the Lucene index for a primary versus replica node, which makes it simple to shut down the old primary and promote one of the replicas to be a new primary.

    Merging


    With NRT replication, the primary node also does all segment merging. This is important, because merging is a CPU and IO heavy operation, and interferes with ongoing searching.

    Once the primary has finished a merge, and before it installs the merged segment in its index, it uses Lucene's merged segment warming API to give all replicas a chance to pre-copy the merged segment. This means that a merge should never block a refresh, so Lucene will keep fast refreshes even as large merged segments are still copying. Once all running replicas have pre-copied the merge, then the primary installs the merged segment, and after the next refresh, so do all the replicas.

    We have discussed having replicas perform their own merging, but I suspect that will be a poor tradeoff. Local area networks (e.g. 10 gigabit ethernet) are quickly becoming plenty fast and cheap, and asking a replica to also do merging would necessarily impact search performance. It would also be quite complex trying to ensure replicas perform precisely the same merges as the primary at the same time, and would otherwise break the same point-in-time view across replicas.

    Abstractions


    The primary and replica nodes are abstract: you must implement certain functions yourself. For example, the low level mechanics of how to copy bytes from primary to replica is up to you. Lucene does not provide that, except in its unit tests which use simple point-to-point TCP "thread per connection" servers. You could choose to use rsync, robocopy, netty servers, a central file server, carrier pigeons, UDP multicast (likely helpful if there are many replicas on the same subnet), etc.

    Lucene also does not provide any distributed leader election algorithms to pick a new primary when the current primary has crashed, nor heuristics to detect a downed primary or replica. But once you pick the new primary, Lucene will take care of having all replicas cutover to it, removing stale partially copied files from the old primary, etc.

    Finally, Lucene does not provide any load-balancing to direct queries to the least loaded replica, nor any cluster state to keep track of which node is the primary and which are the replicas, though Apache Zookeeper is useful for such shared distributed state.  These parts are all up to you!

    Expected failure modes


    There are many things that can go wrong with a cluster of servers indexing and searching with NRT index replication, and we wrote an evil randomized stress test case to exercise Lucene's handling in such cases. This test creates a cluster by spawning JVM subprocesses for a primary and multiple replica nodes, and begins indexing and replicating, while randomly applying conditions like an unusually slow network to specific nodes, a network that randomly flips bits, random JVM crashes (SIGSEGV!) of either primary or replica nodes, followed by recovery, etc. This test case uncovered all sorts of fun corner cases!

    An especially important case is when the primary node goes down (crashes, loses power or is intentionally killed). In this case, one of the replicas, ideally the replica that was furthest along in copying files from the old primary as decided by a distributed election, is promoted to become the new primary node. All other replicas then switch to the new primary, and in the process must delete any files copied or partially copied from the old primary but not referenced by the new primary. Any documents indexed into the primary but not copied to that replica will need to be indexed again, and this is the caller's responsibility (no transaction log).

    If the whole cluster crashes or loses power, on restart you need to determine whichever index (primary or replica) has the "most recent" commit point, and start the primary node on that host and replica nodes on the other hosts. Those replica nodes may need to delete some index files in order to switch to the new primary, and Lucene takes care of that. Finally, you will have to replay any documents that arrived after the last successful commit.

    Other fun cases include: a replica crashing while it was still pre-copying a merge; a replica that is falling behind because it is still copying files from the previous refresh when a new refresh happens; a replica going down and coming back up later after the primary node has also changed.


    Downsides


    There are also some minor downsides to segment replication:
    • Very new code: this feature is quite new and also quite complex, and not widely used yet. Likely there are exciting bugs! Patches welcome!

    • Slightly slower refresh time: after refreshing on the primary, including writing document deletions to disk as well, we must then copy all new files to the replica and open a new searcher on the replica, adding a bit more time before documents are visible for searching on the replica when compared to a straight NRT reader from an IndexWriter.  If this is really a problem, you can use the primary node for searching and have refresh latency very close to what a straight NRT reader provides.

    • Index problems might be replicated: if something goes wrong, and the primary somehow writes a broken index file, then that broken file will be replicated to all replicas too. But this is uncommon these days, especially with Lucene's end to end checksums.


    Concluding


    NRT segment replication represents an opportunity for sizable performance and reliability improvements to the popular distributed search servers, especially when combined with the ongoing trend towards faster and cheaper local area networks.  While this feature unfortunately came too late for Elasticsearch and Solr, I am hopeful that the next popular distributed search server, and its users, can benefit from it!


    After about a year and a half of wearing my old carbon fiber AFO braces (Ankle-Foot Orthoses), I recently got a new pair of Phat Braces, which are also made of carbon fiber but have a much better warranty and are widely used by people everywhere. The big difference between my old braces and the new Phat Braces is that the Phat Braces are taller and stiffer (though they are beginning to soften a bit). They come up my leg to right below my knee, which is further than my old braces. This makes them much more stable, which allows me to balance and walk much more easily. They also have some flexible plastic that wraps around the foot (as you can see in the image to the right), which also helps provide more stability. The biggest benefit so far, however, is that it did not take my body six weeks to adjust to them. The previous braces actually took six weeks for my body to adjust to, and I was in pain the entire time. The company that provided them told me that's just how it goes. Through that adjustment period, I had to have at least a dozen manual adjustments to the carbon fiber (e.g., heat them up, bend out here and there, etc.), probably closer to 18 or so. With the new Phat Braces, I've only had two adjustments and my body has already adjusted to them -- literally in less than a week. In fact, I have had the Phat Braces for two weeks now, and yesterday I did my first true Colorado hike since my injury in April 2014!


    Yesterday we decided to go hiking in Evergreen, CO because we were trying to get back to the spot where I proposed to Janene 20 years ago. We thought it would be cool to go back there because later this month Janene and I will be celebrating our 20th wedding anniversary. I was a bit intimidated when we started the hike because of the elevation gain on the trail and the number of large rocks that you hike over on the trail. I did take a single arm crutch with me but it almost made things more difficult because of the angle at which you hold the arm crutch vs. the angle of the rocks on such an uphill elevation. Also, my new braces make going uphill difficult because they are still stiff, but they will soften a bit more in time. But with Janene's help, I completed the hike. Janene did make a good suggestion that instead of using an arm crutch I should consider getting some hiking poles. Because you hold them at a different angle, it could make going uphill and downhill over rocks easier for me. So, I'm going to try some out soon at REI.

    Although the distance was not that great (1.7 miles), this was the most uphill/downhill I have done since my spinal cord injury 3.5 years ago -- I actually impressed myself. As proof of the level of workout for my body, my lower back and my hips were really tired after the hike and sore this morning. But I really enjoyed getting out for a hike with Janene and Bailey. So, I'm really looking forward to doing more hiking. I guess I can really start enjoying the fact that we live in Colorado again!


    We are a rafting family. We solidified that when we bought a raft five years ago. Since then, we've had many adventures, on many rivers, and met a plethora of good friends along the way. We call these friends our "river family". Our river family gathers every January and chooses where we want to apply for river permits. We wait for a couple months until permits are granted. A person or two usually gets a permit granted, then the planning begins!

    This year, we were granted a permit to float the main fork of the Salmon River in Idaho. We started our journey just over a week after rafting, hiking, and enjoying life in Montana. It was a long drive (878 miles / 1448 km) from our house. It took two days to drive there and we stopped in Pocatello, Idaho to rendezvous with my dad along the way. He brought our raft from Montana and we wanted to leave his truck so we wouldn't have to pay $500 to shuttle it. Yep, that's right - the trek from our put-in (Corn Creek), to take-out (Carey Creek) was so long (383 miles / 616 km) that the shuttle company charged $500 per vehicle!

    We had 28 members of our river family on the Salmon. There were more children than adults, and something like 15 watercraft in total. It was epic, it was joyous, and it is the source of many lasting memories. I think the kids might've even enjoyed it as much as the adults. Their "gossip circles" were a highlight for them, as was floating in their duckies and the river romances that developed along the way.

    This story is best told with Trish's photos, where you can see the many smiles, the clear water, and how the good times flourished.

    Salmon River Posse

    Corn Creek put-in. Weather was gorgeous.

    "That there's a fire!"E Kuhl

    Fire!

    20170801-DSC_1181. Hi friend! Hello there!

    Fishin'

    Pirate night! Jack and Levi. Brody.

    The Stricker Pirate Family! Raible River Pirates! Arrrr!

    Levi and Eric. Kindle Pirate Family! Homo Pirates.

    Pirate Posse!

    After the Salmon, I thought we might be done rafting for the year. However, an opportunity to raft and camp the upper Colorado River cropped up last week. Abbie and Trish had a horse show they wanted to attend, so it was just Jack and I this time. When we arrived last Saturday morning, we were surprised to find it wasn't just a small contingent of our river family: 18 people joined us for two days and an overnight of river bliss. The kids outnumbered the parents once again, and we marveled at bald eagles and a mother bear with her three cubs along the river's edge.

    Cliff Jumping

    Bears! Lone Tree. Sleeping under the stars.

    Kid Posse. Breakfast at Lone Tree.

    Happy Kids

    What was Jack's favorite part? The cliff jumping, of course.

    We arrived home Sunday to find that Abbie and Trish had a stellar weekend too. Abbie and Tucker won their division at the SummerFest Horse Show!

    Abbie and Tucker

    As summer draws to a close, I can't help but look to the fall season with anticipation. Football season, cooler days, beautiful colors, and ski season is just around the corner. Yee haw!



    While thinking about a title for this post I thought the current title line, with the "Keeps Getting Better" finishing touch, might work well; I knew I had used a similar line before, and after looking through my posts I found it.

    Oh dear. I'm transported back to 2008. I can see myself, 9 years younger, walking to the Iona Technologies office, completely wired on trying to stop the Jersey JAX-RS domination :-), spotting an ad for the latest Christina Aguilera album at the exit from the Lansdowne DART station and thinking it would be fun to blog about it and link to CXF - welcome to the start of the [OT] series. I'm not sure now if I'm more surprised that it was actually me who wrote that post or that 9 years later I'm still here, talking about CXF :-).

    Let me get back to the actual subject of this post. As you know, CXF started quite late in embracing Swagger, and I still get nervous whenever I remind myself that Swagger does not support 'matrix' parameters :-). But the Swagger team has made a massive effort over the years; my CXF hat is off to them.

    I'm happy to say that Apache CXF now offers one of the best Swagger2 integrations around, at both the JSON-only and UI levels, and it just keeps getting better.

    We talked recently with Dennis Kieselhorst, and one can now configure Swagger2Feature with an external properties file, which can be especially handy when the feature is auto-discovered.

    Just at the last minute we resolved an issue reported by a CXF user to do with accessing Swagger UI from behind reverse proxies.

    Finally, Freeman contributed a java2swagger Maven plugin.

    Swagger 3 will be supported as soon as possible too.

    Enjoy!


    This release includes support for getting container logs from the pod, Kubernetes API auto-configuration, and lots of bug fixes.

    The full changelog:

    • containerLog step to get the logs of a container running in the agent pod JENKINS-46085 #195
    • Autoconfigure cloud if kubernetes url is not set #208
    • Change containerCap and instanceCap 0 to mean do not use JENKINS-45845 #199
    • Add environment variables to container from a secret JENKINS-39867 #162
    • Deprecate containerEnvVar for envVar and added secretEnvVar
    • Enable setting slaveConnectTimeout in podTemplate defined in pipeline #213
    • Read Jenkins URL from cloud configuration or KUBERNETES_JENKINS_URL env var #216
    • Make withEnv work inside a container JENKINS-46278 #204
    • Close resource leak, fix broken pipe error. Make number of concurrent requests to Kubernetes configurable JENKINS-40825 #182
    • Delete pods in the cloud namespace when pod namespace is not defined JENKINS-45910 #192
    • Use Util.replaceMacro instead of our custom replacement logic. Behavior change: when a var is not defined it is not replaced, ie. ${key1} or ${key2} or ${key3} -> value1 or value2 or ${key3} #198
    • Allow to create non-configurable instances programmatically #191
    • Do not cache kubernetes connection to reflect config changes and credential expiration JENKINS-39867 #189
    • Inherit podAnnotations when inheriting pod templates #209
    • Remove unneeded plugin dependencies, make pipeline-model-extensions optional #214



    sharkbait posted a photo:

    Making Dahl




    Behind Picton Street

    IBM have published a lovely paper on their Stocator 0-rename committer for Spark.

    Stocator is:

    1. An extended Swift client.
    2. Magic in their FS to redirect mkdir and file PUT/HEAD/GET calls under the normal MRv1 __temporary paths to new paths in the dest dir.
    3. Generating dest/part-0000 filenames using the attempt & task attempt ID to guarantee uniqueness and to ease cleanup: restarted jobs can delete the old attempts.
    4. Commit performance comes from eliminating the COPY, which is O(data).
    5. And from tuning back the number of HTTP requests (probes for directories, mkdir 0-byte entries, deleting them).
    6. Failure recovery comes from explicit names of output files. (Note: this avoids any saving of shuffle files, which this wouldn't work with... Spark can do that in memory.)
    7. They add summary data in the _SUCCESS file to list the files written & so work out what happened (though they don't actually use this data, instead relying on their Swift service offering list consistency). (I've been doing something similar, primarily for testing & collection of statistics.)

    Page 10 has their benchmarks, all of which are against an IBM storage system, not real Amazon S3 with its different latencies and performance.

    Table 5: Average run time

                        Read-Only 50GB  Read-Only 500GB  Teragen       Copy          Wordcount     Terasort     TPC-DS
    Hadoop-Swift Base   37.80±0.48      393.10±0.92      624.60±4.00   622.10±13.52  244.10±17.72  681.90±6.10  101.50±1.50
    S3a Base            33.30±0.42      254.80±4.00      699.50±8.40   705.10±8.50   193.50±1.80   746.00±7.20  104.50±2.20
    Stocator            34.60±0.56      254.10±5.12      38.80±1.40    68.20±0.80    106.60±1.40   84.20±2.04   111.40±1.68
    Hadoop-Swift Cv2    37.10±0.54      395.00±0.80      171.30±6.36   175.20±6.40   166.90±2.06   222.70±7.30  102.30±1.16
    S3a Cv2             35.30±0.70      255.10±5.52      169.70±4.64   185.40±7.00   111.90±2.08   221.90±6.66  104.00±2.20
    S3a Cv2 + FU        35.20±0.48      254.20±5.04      56.80±1.04    86.50±1.00    112.00±2.40   105.20±3.28  103.10±2.14

    The S3a here is the 2.7.x version, which has stabilised enough to be usable with Thomas Demoor's fast output stream (HADOOP-11183). That stream buffers in RAM and initiates the multipart upload once the block size threshold is reached. Provided you can upload data faster than you run out of RAM, it avoids the long waits at the end of close() calls, so gives a significant speedup. (The fast output stream has evolved into the S3ABlockOutputStream (HADOOP-13560), which can buffer off-heap and to HDD, and which will become the sole output stream once the great cruft cull of HADOOP-14738 goes in.)

    That means that in the doc, "FU" == fast upload == incremental upload & RAM storage. The default for S3A will become HDD storage, as unless you have a very fast pipe to a compatible S3 store, it's easy to overload the memory.

    Cv2 means the MRv2 committer, the one which does a single rename operation on task commit (here the COPY), rather than one rename in task commit to promote that attempt and then another in job commit to finalise the entire job. So: only one copy of every byte PUT, rather than two, and the COPY calls can run in parallel, often off the critical path.

    Table 6: Workload speedups when using Stocator

                        Read-Only 50GB  Read-Only 500GB  Teragen  Copy    Wordcount  Terasort  TPC-DS
    Hadoop-Swift Base   x1.09           x1.55            x16.09   x9.12   x2.29      x8.10     x0.91
    S3a Base            x0.96           x1.00            x18.03   x10.33  x1.82      x8.86     x0.94
    Stocator            x1              x1               x1       x1      x1         x1        x1
    Hadoop-Swift Cv2    x1.07           x1.55            x4.41    x2.57   x1.57      x2.64     x0.92
    S3a Cv2             x1.02           x1.00            x4.37    x2.72   x1.05      x2.64     x0.93
    S3a Cv2 + FU        x1.02           x1.00            x1.46    x1.27   x1.05      x1.25     x0.93


    Their TPC-DS benchmarks show that Stocator & Swift is slower than Hadoop 2.7 S3a + fast upload & the MRv2 commit on TPC-DS. Which means that (a) the Hadoop Swift connector is pretty underperforming and (b) with fadvise=random and columnar data (ORC, Parquet) that speedup alone will give better numbers than Swift & Stocator. (It also shows how much the TPC-DS benchmarks are IO heavy rather than output heavy, the way the tera-x benchmarks are.)
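
    For reference, a hedged sketch of how those two S3A knobs are switched on from code (property names as of the Hadoop 2.8 line; verify them against the s3a documentation for your version):

    import org.apache.hadoop.conf.Configuration;

    public class S3aTuning {
        public static Configuration tuned() {
            Configuration conf = new Configuration();
            // Random/seek-heavy IO for columnar formats (ORC, Parquet).
            conf.set("fs.s3a.experimental.input.fadvise", "random");
            // Incremental multipart uploads instead of buffering the whole object before the PUT.
            conf.setBoolean("fs.s3a.fast.upload", true);
            conf.set("fs.s3a.fast.upload.buffer", "disk");
            return conf;
        }
    }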

    As the co-author of that original Swift connector, then, what the IBM paper is saying is "our zero-rename commit just about compensates for the functional but utterly underperformant code Steve wrote in 2013 and gives us numbers equivalent to the 2016 FS connectors by Steve and others, before they started the serious work on S3A speedup". Oh, and we used some of Steve's code to test it, removing the ASF headers.

    Note that as the IBM endpoint is neither the classic Python OpenStack Swift nor Amazon's real S3, it won't exhibit the real issues those two have. Swift has the worst update inconsistency I've ever seen (i.e. repeatable whenever I overwrote a large file with a smaller one), and aggressive throttling even of the DELETE calls in test teardown. AWS S3 has its own issues, not just in list inconsistency, but serious latency of HEAD/GET requests, as they always go through the S3 load balancers. That is, I would hope that IBM's storage offers significantly better numbers than you get over long-haul S3 connections. Although it'd be hard (impossible) to do a consistent test there, I'd fear in-EC2 performance numbers would actually be worse than those measured.

    I might post something faulting the paper, but maybe I should do a benchmark of my new committer first. For now though, my critique of both the swift:// and s3a:// clients is as follows.

    Unless the storage service guarantees consistency of listing along with other operations, you can't use any of the MR commit algorithms to reliably commit work, so performance is moot. Here IBM do have a consistent store, so you can start to look at performance rather than just functionality. And as they note, committers which work with object store semantics are the way to do this: for operations like this you need the atomic operations of the store, not mocked operations in the client.

    People who complain about the performance of using Swift or S3A as a destination are blissfully unaware of the key issue: the risk of data loss due to inconsistencies. Stocator solves both issues at once.

    Anyway, this means we should be planning a paper or two on our work too, maybe even starting with something about random IO and object storage, as in "what can you do for and in columnar storage formats to make them work better in a world where a seek() + read is potentially a new HTTP request?"

    (picture: parakeet behind Picton Street)








    Whimsy had four applications which made use of React.js, two of which were previously written using Angular.js. One of these applications has already been converted to Vue; conversion of a second one is in progress.

    The reason for the conversion was the decision by Facebook not to change their license.

    Selection of Vue was based on two criteria: community size and the ability to support a React-like development model.  As a bonus, Vue supports an Angular-like development model too, is smaller in download size than either, and has a few additional features.  It is also fast, though I haven’t done any formal measurements.

    Note that the API is different than React.js’s, in particular lifecycle methods and event names.  Oh, and the parameters to createElement are completely different.  Much of my conversion was made easier by the fact that I was already using a ruby2js filter, so all I needed to do was to write a new filter.

    Things I like a lot:

    • Setters actually change the values synchronously.  This has been a source of subtle bugs and surprises when implementing a React.js application.
    • Framework can be used without preprocessors.  This is mostly true for React, but React.createClass is now deprecated.

    Things I find valuable:

    • Mixins.  And probably in the near future extends.  These make components true building blocks, not mere means of encapsulation.
    • Computed values.  Better than Angular’s watchers, and easier than React’s componentWillReceiveProps.
    • Events.  I haven’t made much use of these yet, but this looks promising.

    Things I dislike (but can work around):

    • Warnings are issued if property and data values are named the same.  I can understand why this was done; but I can access properties and data separately, and I’m migrating a codebase which often uses properties to define the initial values for instance data. It would be fine if there were a way to silence this one warning, but the only option available is to silence all warnings.
    • If I have a logic error in my application (it happens :-)), the stack traceback on Chrome doesn’t show my application.  On firefox, it does, but it is formatted oddly, and doesn’t make use of source maps so I can’t directly navigate to either the original source or the downloaded code.
    • Mounting an element replaces the entire element instead of just its children.  In my case, I’m doing server side rendering followed by client side updates.  Replacing the element means that the client can’t find the mount point.  My workaround is to add the enclosing element to the render.
    • Rendering on both the server and client can create a timing problem for forms.  At times, there can be just enough of a delay where the user can check a box or start to input data only to have Vue on the client wipe out the input.  I’m not sure why this wasn’t a problem with React.js, but for now I’m rendering the input fields as disabled until mounted on the client.

    Things I’m not using:

    • templates, directives, and filters.  Mostly because I’m migrating from React instead of Angular.  But also because I like components better than those three.

    On balance, so far I like Vue best of the three (even ignoring licensing issues), and am optimistic that Vue will continue to improve.



    If we talk about data ingestion in big data streaming pipelines, it is fair to say that in the vast majority of cases the source data comes from files in CSV and other easy-to-parse text formats.

    Things become more complex when the task is to read and parse files in a format such as PDF. One would need to create a reader/receiver capable of parsing the PDF files and feeding the content fragments (the regular text, the text found in the embedded attachments, and the file metadata) into the processing pipelines. That was tricky to do right, but you did it just fine.

    The next morning you get a call from your team lead letting you know the customer actually needs the content ingested not only from the PDF files but also from files in a format you've never heard of before. You spend the rest of the week looking for a library which can parse such files, and when you finish writing the code against that library's poorly documented API, all you can think is that the weekend has arrived just in time.

    On Monday your new task is to ensure that the pipelines can be initialized from the same network folder where the files in PDF and other formats will be dropped. You end up writing frontend reader code which reads the file, checks the extension, and then chooses a more specific reader.

    Next day, when you are told that Microsoft Excel and Word documents which may or may not be zipped will have to be parsed as well, you report back asking for the holidays...

    I'm sure you already know I've been preparing you for a couple of pieces of good news.

    The first is the well-known fact that Apache Tika lets you write generic code which can collect data from a massive number of text, binary, image and video formats. One only has to prepare or update the dependencies and configuration to have the same code serve data from a variety of formats.
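
    As a quick illustration of that first point, the Tika facade hides the per-format parsers behind a couple of calls; a standalone sketch, separate from the Beam TikaIO module discussed next:

    import java.io.File;
    import org.apache.tika.Tika;

    public class TikaSketch {
        public static void main(String[] args) throws Exception {
            Tika tika = new Tika();
            File input = new File(args[0]);
            // The same calls work for PDF, Word, Excel, zipped archives and many more formats,
            // as long as the relevant parser dependencies are on the classpath.
            String type = tika.detect(input);
            String text = tika.parseToString(input);
            System.out.println(type + ": " + text.length() + " characters extracted");
        }
    }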

    The other, and main, piece of news is that Apache Beam 2.2.0-SNAPSHOT now ships a new TikaIO module (thanks to my colleague JB for reviewing and merging the PR). With Apache Beam capable of running pipelines on top of Spark, Flink and other runners, and Apache Tika taking care of the various file formats, you get a most flexible data streaming system.

    Do give it a try, help to improve TikaIO with new PRs, and if you are really serious about supporting a variety of the data formats in the pipelines, start planning on integrating it into your products :-)

    Enjoy!





    In this blog post we will take a look at consistency mechanisms in Apache Cassandra. There are three reasonably well documented features serving this purpose:

    • Read repair gives the option to sync data on read requests.
    • Hinted handoff is a buffering mechanism for situations when nodes are temporarily unavailable.
    • Anti-entropy repair (or simply just repair) is a process of synchronizing data across the board.

    What is far less known, and what we will explore in detail in this post, is a fourth mechanism Apache Cassandra uses to ensure data consistency. We are going to see Cassandra perform another flavour of read repair, but in a far sneakier way.

    Setting things up

    In order to see this sneaky repair happening, we need to orchestrate a few things. Let’s just blaze through some initial setup using Cassandra Cluster Manager (ccm - available on github).

    # create a cluster of 2x3 nodes
    ccm create sneaky-repair -v 2.1.15
    ccm updateconf 'num_tokens: 32'
    ccm populate --vnodes -n 3:3
    
    # start nodes in one DC only
    ccm node1 start --wait-for-binary-proto
    ccm node2 start --wait-for-binary-proto
    ccm node3 start --wait-for-binary-proto
    
    # create table and keyspace
    ccm node1 cqlsh -e "CREATE KEYSPACE sneaky WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"
    ccm node1 cqlsh -e "CREATE TABLE sneaky.repair (k TEXT PRIMARY KEY , v TEXT);"
    
    # insert some data
    ccm node1 cqlsh -e "INSERT INTO sneaky.repair (k, v) VALUES ('firstKey', 'firstValue');"

    The familiar situation

    At this point, we have a cluster up and running. Suddenly, “the requirements change” and we need to expand the cluster by adding one more data center. So we will do just that and observe what happens to the consistency of our data.

    Before we proceed, we need to ensure some determinism and turn off Cassandra’s known consistency mechanisms (we will not be disabling anti-entropy repair as that process must be initiated by an operator anyway):

    # disable hinted handoff
    ccm node1 nodetool disablehandoff
    ccm node2 nodetool disablehandoff
    ccm node3 nodetool disablehandoff
    
    # disable read repairs
    ccm node1 cqlsh -e "ALTER TABLE sneaky.repair WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.0"

    Now we expand the cluster:

    # start nodes
    ccm node4 start --wait-for-binary-proto
    ccm node5 start --wait-for-binary-proto
    ccm node6 start --wait-for-binary-proto
    
    # alter keyspace
    ccm node1 cqlsh -e "ALTER KEYSPACE sneaky WITH replication ={'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2':3 };"

    With these commands, we have effectively added a new DC into the cluster. From this point, Cassandra can start using the new DC to serve client requests. However, there is a catch. We have not populated the new nodes with data. Typically, we would do a nodetool rebuild. For this blog post we will skip that, because this situation allows some sneakiness to be observed.

    Sneakiness: blocking read repairs

    Without any data being put on the new nodes, we can expect no data to be actually readable from the new DC. We will go to one of the new nodes (node4) and do a read request with LOCAL_QUORUM consistency to ensure only the new DC participates in the request. After the read request we will also check the read repair statistics from nodetool, but we will set that information aside for later:

    ccm node4 cqlsh -e "CONSISTENCY LOCAL_QUORUM; SELECT * FROM sneaky.repair WHERE k ='firstKey';"
    ccm node4 nodetool netstats | grep -A 3 "Read Repair"
    
     k | v
    ---+---
    
    (0 rows)
    
    

    No rows are returned as expected. Now, let’s do another read request (again from node4), this time involving at least one replica from the old DC thanks to QUORUM consistency:

    ccm node4 cqlsh -e "CONSISTENCY QUORUM; SELECT * FROM sneaky.repair WHERE k ='firstKey';"
    ccm node4 nodetool netstats | grep -A 3 "Read Repair"
    
     k        | v
    ----------+------------
     firstKey | firstValue
    
    (1 rows)
    

    We got a hit! This is quite unexpected, because we did not run rebuild or repair in the meantime, and hinted handoff and read repairs have been disabled. How come Cassandra went ahead and fixed our data anyway?

    In order to shed some light on this issue, let’s examine the nodetool netstats output from before. We should see something like this:

    # after first SELECT using LOCAL_QUORUM
    ccm node4 nodetool netstats  | grep -A 3 "Read Repair"
    Read Repair Statistics:
    Attempted: 0
    Mismatch (Blocking): 0
    Mismatch (Background): 0
    
    # after second SELECT using QUORUM
    ccm node4 nodetool netstats  | grep -A 3 "Read Repair"
    Read Repair Statistics:
    Attempted: 0
    Mismatch (Blocking): 1
    Mismatch (Background): 0
    
    # after third SELECT using LOCAL_QUORUM
    ccm node4 nodetool netstats  | grep -A 3 "Read Repair"
    Read Repair Statistics:
    Attempted: 0
    Mismatch (Blocking): 1
    Mismatch (Background): 0
    

    From this output we can tell that:

    • No read repairs happened (Attempted is 0).
    • One blocking read repair actually did happen (Mismatch (Blocking) is 1).
    • No background read repair happened (Mismatch (Background) is 0).

    It turns out there are two read repairs that can happen:

    • A blocking read repair happens when a query cannot complete with the desired consistency level without actually repairing the data. read_repair_chance has no impact on this.
    • A background read repair happens in situations when a query succeeds but inconsistencies are found. This happens with read_repair_chance probability.

    The take-away

    To sum things up, it is not possible to entirely disable read repairs and Cassandra will sometimes try to fix inconsistent data for us. While this is pretty convenient, it also has some inconvenient implications. The best way to avoid any surprises is to keep the data consistent by running regular repairs.

    In situations featuring non-negligible amounts of inconsistent data this sneakiness can cause a lot of unexpected load on the nodes, as well as the cross-DC network links. Having to do cross-DC reads can also introduce additional latency. Read-heavy workloads and workloads with large partitions are particularly susceptible to problems caused by blocking read repair.

    A particular situation where a lot of inconsistent data is guaranteed happens when a new data center gets added to the cluster. In these situations, LOCAL_QUORUM is necessary to avoid doing blocking repairs until a rebuild or a full repair is done. Using LOCAL_QUORUM is twice as important when the data center expansion happens for the first time: in a one-data-center scenario QUORUM and LOCAL_QUORUM have virtually the same semantics, and it is easy to forget which one is actually used.
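
    To make that choice explicit on the application side, here is a hedged sketch using the DataStax Java driver 3.x (the contact point and query come from the example above; the 4.x driver API differs):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class LocalQuorumRead {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // LOCAL_QUORUM keeps the read (and any blocking read repair it triggers) in the local DC.
                Statement select = new SimpleStatement(
                        "SELECT * FROM sneaky.repair WHERE k = 'firstKey'")
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
                System.out.println(session.execute(select).one());
            }
        }
    }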



    This is the fourth in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query.

    In this post we will show how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. In the second post, we showed how to create a "resource" based policy for "alice" in Ranger, by granting "alice" the "select" permission for the "words" table. Instead, we can grant a user "bob" the "select" permission for a given "tag", which is synced into Ranger from Apache Atlas. This means that we can avoid managing specific resources in Ranger itself.

    1) Start Apache Atlas and create entities/tags for Hive

    First let's look at setting up Apache Atlas. Download the latest released version (0.8.1) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:

    • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
    The distribution will then be available in 'distro/target/apache-atlas-0.8.1-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
    • export MANAGE_LOCAL_HBASE=true
    • export MANAGE_LOCAL_SOLR=true
    Now let's start Apache Atlas with 'bin/atlas_start.py'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "words_tag". Unlike for HDFS or Kafka, Atlas doesn't provide an easy way to create a Hive Entity in the UI. Instead, we can use the following json file, based on the example given here, to create a Hive Entity for the "words" table that we are using in our example:
    You can upload it to Atlas via:
    • curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json;  charset=UTF-8' -u admin:admin -d @hive-create.json http://localhost:21000/api/atlas/entities
    Once the new entity has been uploaded, then you can search for it in the Atlas UI. Once it is found, then click on "+" beside "Tags" and associate the new entity with the "words_tag" tag.

    2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

    To create tag-based policies in Apache Ranger, we have to import the entity and tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. After building Apache Ranger, extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit 'install.properties' as follows:
    • Set TAG_SOURCE_ATLAS_ENABLED to "false"
    • Set TAG_SOURCE_ATLASREST_ENABLED to  "true" 
    • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
    • Specify "admin" for both TAG_SOURCE_ATLASREST_USERNAME and TAG_SOURCE_ATLASREST_PASSWORD
    Save 'install.properties' and install the tagsync service via "sudo ./setup.sh". Start the Apache Ranger admin service via "sudo ranger-admin start" and then the tagsync service via "sudo ranger-tagsync-services.sh start".

    3) Create Tag-based authorization policies in Apache Ranger

    Now let's create a tag-based authorization policy in the Apache Ranger admin UI (http://localhost:6080). Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "HiveTagService". Create a new policy for this service called "WordsTagPolicy". In the "TAG" field enter a "w" and the "words_tag" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "bob" with the "select" permissions for "Hive":
    We also need to go back to the Resource based policies and edit "cl1_hive" that we created in the second tutorial, and select the tag service we have created above. Once our new policy (including tags) has synced to '/etc/ranger/cl1_hive/policycache' we can test authorization in Hive. Previously, the user "bob" was denied access to the "words" table, as only "alice" was assigned a resource-based policy for the table. However, "bob" can now access the table via the tag-based authorization policy we have created:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
    • select * from words where word == 'Dare';

    0 0

    This is the fifth in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked at how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. In this post we will look at an alternative authorization solution called Apache Sentry.

    1) Build the Apache Sentry distribution

    First we will build and install the Apache Sentry distribution. Download Apache Sentry (1.8.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match. Now extract and build the source and copy the distribution to a location where you wish to install it:

    • tar zxvf apache-sentry-1.8.0-src.tar.gz
    • cd apache-sentry-1.8.0-src
    • mvn clean install -DskipTests
    • cp -r sentry-dist/target/apache-sentry-1.8.0-bin ${sentry.home}
    I previously covered the authorization plugin that Apache Sentry provides for Apache Kafka. In addition, Apache Sentry provides an authorization plugin for Apache Hive. For the purposes of this tutorial we will just configure the authorization privileges in a configuration file local to the Hive Server. Therefore we don't need to do any further configuration to the distribution at this point.

    2) Install and configure Apache Hive

    Please follow the first tutorial to install and configure Apache Hadoop if you have not already done so. Apache Sentry 1.8.0 does not support Apache Hive 2.1.x, so we will need to download and extract Apache Hive 2.0.1. Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Then follow the steps as outlined in the first tutorial to create the table in Hive and make sure that a query is successful.

    3) Integrate Apache Sentry with Apache Hive

    Now we will integrate Apache Sentry with Apache Hive. We need to add three new configuration files to the "conf" directory of Apache Hive.

    3.a) Configure Apache Hive to use authorization

    Create a file called 'conf/hiveserver2-site.xml' with the content:
    Here we are enabling authorization and adding the Sentry authorization plugin.

    3.b) Add Sentry plugin configuration

    Create a new file in the "conf" directory of Apache Hive called "sentry-site.xml" with the following content:
    This is the configuration file for the Sentry plugin for Hive. It essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. As we are not using Kerberos, the "testing.mode" configuration parameter must be set to "true".

    3.c) Add the authorization privileges for our test-case

    Next, we need to specify the authorization privileges. Create a new file in the config directory called "sentry.ini" with the following content:
    Here we are granting the user "alice" a role which allows her to perform a "select" on the table "words".

    3.d) Add Sentry libraries to Hive

    Finally, we need to add the Sentry libraries to Hive. Copy the following files from ${sentry.home}/lib  to ${hive.home}/lib:
    • sentry-binding-hive-common-1.8.0.jar
    • sentry-core-model-db-1.8.0.jar
    • sentry*provider*.jar
    • sentry-core-common-1.8.0.jar
    • shiro-core-1.2.3.jar
    • sentry-policy*.jar
    • sentry-service-*.jar
    In addition we need the "sentry-binding-hive-v2-1.8.0.jar", which is not bundled with the Apache Sentry distribution. It can be obtained from "http://repo1.maven.org/maven2/org/apache/sentry/sentry-binding-hive-v2/1.8.0/sentry-binding-hive-v2-1.8.0.jar".

    4) Test authorization with Apache Hive

    Now we can test authorization after restarting Apache Hive. The user 'alice' can query the table according to our policy:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
    • select * from words where word == 'Dare'; (works)
    However, the user 'bob' is denied access:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
    • select * from words where word == 'Dare'; (fails)


    0 0

    There are really three aspects to your project’s decision (to use React.js or not based on the BSD+Patents license), and it’s important to consider each of them. You really need to consider which aspects are important to your project’s success — and which ones don’t really matter to you.
    (See the updated FAQ about the PATENTS issue on Medium!)

    • Legal— both details of the license and PATENTS file that Facebook offers React.js under, and some realistic situations where the patent clauses might actually come into play (which is certainly rare in court, but it’s the chilling effect of uncertainty that’s the issue)
    • Technology— are other libraries sufficiently functional to provide the features your project needs? Does a project have the capacity to make a change, if they decided to?
    • Community— how does the rest of the open source community-of-communities see the issue, and care about your choices? This includes both future buyers of a startup, as well as future partners, as well as future talent (employees) or contributors (open source developers).

    PATENTLY Legal

    I’ll start off what is almost certainly the least important issue to consider for your project: licenses and patents.

    • The legal issue is immaterial unless you’ve really thought through your project’s business model and really taken the time to evaluate how any patents you might now or in the future hold play into this. Seriously, before you read about this controversy (and even earlier), how much did you worry about potential future patent claims that might be in the many open source components your company uses?
    • The major legal point that’s worth bringing up as a generality is that including software under a non-OSI approved license always adds complexity, immaterial of the details of the license. In an honest open source world, there is never a good reason to use a license besides one of the OSI-approved licenses. OSI-approval is not a magic stamp; however, it does show licenses that are so widely used — and reviewed by lawyers — that there is seen as less risk to everyone else in consuming software under an OSI license.
    • Note: React is not offered under a “BSD + Patent” (or, more specifically, BSD-2-Clause-Patent; thanks, SPDX) OSI-approved license. It is offered under the BSD-3-Clause license (OSI-approved), plus Facebook’s own custom-written PATENTS file. It’s the addition of the custom bits about patents (which may be well written, but are different from other well-used licenses) that is the issue. Different licenses mean the lawyers need to spend extra time reviewing them before you can even get an informed opinion.

    Technology Changes

    If you are not yet using React.JS in your project(s), then now is an excellent time to review the functionality and ecosystems around the similar libraries, like Preact, Vue.js, React-lite, Inferno.js, Riot.js, or other great JS libraries out there.

    If you are already using React.JS — like a lot of people — then you should take a brief moment to read up on this licensing issue. Don’t simply listen to the hype pro or con, but think how the issues you’ve read about apply to your project and your goals. React.JS has been using the Facebook PATENTS license for years now, so this is not a new situation and certainly doesn’t mean you need to make any quick changes.

    If you are really worried about the legal aspects of the license now, then you need to ask yourself: is it practical for us to change libraries?

    • Is there another open source library that provides sufficient functionality for what your project needs?
    • Do you have the technical capacity (engineering staff for a company, or passionate volunteers for an open source project) to change your architecture to use a new library?
    • Are there aspects of your project that could work better if you changed to a different library?

    These questions probably look more familiar to most readers than the license and patent issues. And in most cases, these are the most important questions for your project: your technical capacity to make any changes, and whether this is an opportunity to improve things or just extra make-work to switch libraries.

    Community Expectations

    What does your community think about this issue? Again: not just the hype, take the time to think this through. And consider what “community” means to your specific project — VCs to buy your startup, customers to buy your software, contributors to join your open source project or developer talent you want to hire for your company. You need to understand who your community is to understand how they will view your decision.

    • If you are a big company, your lawyers have probably already told you what to do. Most likely if you’re already using React.JS, the issue was decided long ago when you first started using it.
    • If you are a startup thinking about VC exits, then don’t worry about the hype. But you do need to do an analysis of how this (old) news affects your project and your specific goals. My bet is that it won’t matter much — any major VC’s lawyers have long known about this issue and already calculated their reaction. More to the point, if you’re looking for a big buyout, at that point you’ll add enough staff to rewrite at the time if you decide it’s necessary (but I bet it won’t be).
    • If you are a software company building a variety of applications, you probably don’t need to worry about existing tools. Certainly, consider alternatives for new tools you start.
    • If you are a non-software company, you don’t need to worry about it. React.JS has used this license for ages, so there’s no change (just hype and news about the ASF policy change).
    • If you are an open source project, you’re probably already realizing that many open source contributors expect a level playing field for any software that calls itself “open source”. That means using an OSI-approved license, period. In particular, if your project might intend to ever join the Apache Software Foundation, then you need to consider OSI-licensed alternatives since the ASF no longer allows React.JS in its projects.

    What Open Source Means

    The big lesson here is: if you expect to play in the open source arena, you need to be honest about what “open source” means. There are a lot of aspects to the definition, but the most important one is publicly providing the source code under an OSI-approved license. React.JS is not offered under an OSI-approved license, and now that people are talking about it, they’re realizing it’s not the kind of open source they expected.

    The details of licensing are complex but rarely matter in a developer’s day to day life. What is important is managing expectations and risk, not just for yourself but for consumers and contributors to your project. Using an OSI-approved license means that the world can easily and quickly understand what you’re offering. Using a custom license means… people need to pause and evaluate before considering contributing to your projects.

    Even if you think you have a reason to use a custom license, you probably don’t (other than using a proprietary license, which is just fine too). Stick with OSI, because that’s what the world expects.

    This Three Reactions post previously appeared on Medium.

    The post Three React-ions to the Facebook PATENTS License appeared first on Community Over Code.


    0 0

    This is the sixth and final blog post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked at how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. The fifth post looked at an alternative authorization solution called Apache Sentry.

    In this post we will switch our attention from authorization to authentication, and show how we can authenticate Apache Hive users via kerberos.

    1) Set up a KDC using Apache Kerby

    A github project that uses Apache Kerby to start up a KDC is available here:

    • bigdata-kerberos-deployment: This project contains some tests which can be used to test kerberos with various big data deployments, such as Apache Hadoop etc.
    The KDC is a simple junit test that is available here. To run it just comment out the "org.junit.Ignore" annotation on the test method. It uses Apache Kerby to define the following principals for both Apache Hadoop and Apache Hive:
    • hdfs/localhost@hadoop.apache.org
    • HTTP/localhost@hadoop.apache.org
    • mapred/localhost@hadoop.apache.org
    • hiveserver2/localhost@hadoop.apache.org
    • alice@hadoop.apache.org 
    Keytabs are created in the "target" folder. Kerby is configured to use a random port to launch the KDC each time, and it will create a "krb5.conf" file containing the random port number in the target directory.

    2) Configure Apache Hadoop to use Kerberos

    The next step is to configure Apache Hadoop to use Kerberos. As a pre-requisite, follow the first tutorial on Apache Hive so that the Hadoop data and Hive table are set up before we apply Kerberos to the mix. Next, follow the steps in section (2) of an earlier tutorial on configuring Hadoop with Kerberos that I wrote. Some additional steps are also required when configuring Hadoop for use with Hive.

    Edit 'etc/hadoop/core-site.xml' and add:
    • hadoop.proxyuser.hiveserver2.groups: *
    • hadoop.proxyuser.hiveserver2.hosts: localhost
    The previous tutorial on securing HDFS with kerberos did not specify any kerberos configuration for Map-Reduce, as it was not required. For Apache Hive we need to configure Map Reduce appropriately. We will simplify things by using a single principal for the Job Tracker, Task Tracker and Job History. Create a new file 'etc/hadoop/mapred-site.xml' with the following properties:
    • mapreduce.framework.name: classic
    • mapreduce.jobtracker.kerberos.principal: mapred/localhost@hadoop.apache.org
    • mapreduce.jobtracker.keytab.file: Path to Kerby mapred.keytab (see above).
    • mapreduce.tasktracker.kerberos.principal: mapred/localhost@hadoop.apache.org
    • mapreduce.tasktracker.keytab.file: Path to Kerby mapred.keytab (see above).
    • mapreduce.jobhistory.kerberos.principal:  mapred/localhost@hadoop.apache.org
    • mapreduce.jobhistory.keytab.file: Path to Kerby mapred.keytab (see above).
    Start Kerby by running the JUnit test as described in the first section. Now start HDFS via:
    • sbin/start-dfs.sh
    • sudo sbin/start-secure-dns.sh
    3) Configure Apache Hive to use Kerberos

    Next we will configure Apache Hive to use Kerberos. Edit 'conf/hiveserver2-site.xml' and add the following properties:
    • hive.server2.authentication: kerberos
    • hive.server2.authentication.kerberos.principal: hiveserver2/localhost@hadoop.apache.org
    • hive.server2.authentication.kerberos.keytab: Path to Kerby hiveserver2.keytab (see above).
    Start Hive via 'bin/hiveserver2'. In a separate window, log on to beeline via the following steps:
    • export KRB5_CONFIG=/pathtokerby/target/krb5.conf
    • kinit -k -t /pathtokerby/target/alice.keytab alice
    • bin/beeline -u "jdbc:hive2://localhost:10000/default;principal=hiveserver2/localhost@hadoop.apache.org"
    At this point authentication is successful and we should be able to query the "words" table as per the first tutorial.
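
    For completeness, the same query can also be issued programmatically. Below is a minimal, hedged JDBC sketch (the class name and file paths are illustrative); it assumes the Hive JDBC driver and its dependencies are on the classpath, and that a Kerberos ticket has already been obtained with kinit as shown above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveKerberosQuery {
        public static void main(String[] args) throws Exception {
            // Point the JVM at the Kerby-generated krb5.conf (path is illustrative)
            System.setProperty("java.security.krb5.conf", "/pathtokerby/target/krb5.conf");
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // The principal in the URL is the HiveServer2 service principal, not the client's
            String url = "jdbc:hive2://localhost:10000/default;"
                    + "principal=hiveserver2/localhost@hadoop.apache.org";

            try (Connection con = DriverManager.getConnection(url);
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery("select * from words where word = 'Dare'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getInt(2));
                }
            }
        }
    }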

    0 0

    Earlier this year, I showed how to use Talend Open Studio for Big Data to access data stored in HDFS, where HDFS had been configured to authenticate users using Kerberos. A similar blog post showed how to read data from an Apache Kafka topic using kerberos. In this tutorial I will show how to create a job in Talend Open Studio for Big Data to read data from an Apache Hive table using kerberos. As a prerequisite, please follow a recent tutorial on setting up Apache Hadoop and Apache Hive using kerberos. 

    1) Download Talend Open Studio for Big Data and create a job

    Download Talend Open Studio for Big Data (6.4.1 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" called "HiveKerberosRead". In the search bar under "Palette" on the right hand side enter "hive" and hit enter. Drag "tHiveConnection" and "tHiveInput" to the middle of the screen. Do the same for "tLogRow":

    "tHiveConnection" will be used to configure the connection to Hive. "tHiveInput" will be used to perform a query on the "words" table we have created in Hive (as per the earlier tutorial linked above), and finally "tLogRow" will just log the data so that we can be sure that it was read correctly. The next step is to join the components up. Right click on "tHiveConnection" and select "Trigger/On Subjob Ok" and drag the resulting line to "tHiveInput". Right click on "tHiveInput" and select "Row/Main" and drag the resulting line to "tLogRow":



    2) Configure the components

    Now let's configure the individual components. Double click on "tHiveConnection". Select the following configuration options:
    • Distribution: Hortonworks
    • Version: HDP V2.5.0
    • Host: localhost
    • Database: default
    • Select "Use Kerberos Authentication"
    • Hive Principal: hiveserver2/localhost@hadoop.apache.org
    • Namenode Principal: hdfs/localhost@hadoop.apache.org
    • Resource Manager Principal: mapred/localhost@hadoop.apache.org
    • Select "Use a keytab to authenticate"
    • Principal: alice
    • Keytab: Path to "alice.keytab" in the Kerby test project.
    • Unselect "Set Resource Manager"
    • Set Namenode URI: "hdfs://localhost:9000"

    Now click on "tHiveInput" and select the following configuration options:
    • Select "Use an existing Connection"
    • Choose the tHiveConnection name from the resulting "Component List".
    • Click on "Edit schema". Create a new column called "word" of type String, and a column called "count" of type int. 
    • Table name: words
    • Query: "select * from words where word == 'Dare'"

    Now the only thing that remains is to point to the krb5.conf file that is generated by the Kerby project. Click on "Window/Preferences" at the top of the screen. Select "Talend" and "Run/Debug". Add a new JVM argument: "-Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf":
    Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. You should see the following output in the Run Window in the Studio:


    0 0

    The JHipster Mini-Book v4.0 is now available as a free download from InfoQ. Get it while it's hot! You'll also be able to buy a print version in a week or two. You can read all about what’s changed since v2.0 on the JHipster Mini-Book blog.

    The source code for the application developed in the book (21-Points Health) is available on GitHub.

    Thanks to the InfoQ publishing team, Dennis Sharpe for tech editing, and Lawrence Nyveen for copy editing. And most of all, thank you Asciidoctor for making the publishing process so easy!


    0 0

    We are moving forward with the Camel in Action 2nd edition book. Manning just informed us that they have moved the book into production phase.

    All chapters and appendixes except for Ch21 have now completed the copyedit and review cycle and have moved into the proofreading and indexing stages of production.

    The proofreading is done by a professional who has not seen the material before (a fresh set of eyes). Once the proofreading and indexing stages are complete, the final Word manuscript chapters will be sent on to the typesetter for layout, along with prepped graphics from the illustrator.

    When we reach this stage, Jonathan and I will have a chance to review PDFs of the formatted book pages, to ensure all the code examples are formatted correctly and so on.

    At present, Jonathan and I are working on the front matter and writing our preface and so on. We are also reaching out to the foreword writers to ensure they hand in their material (Gregor Hohpe and James Strachan are yet to return their forewords).

    And when Henryk Konsek has provided his updates, chapter 21 can move along as well and be part of the book.

    So it is starting to look really good, and we can see the finish line. It's been a long run this time; it has taken us almost twice as long to complete the 2nd edition as it did the 1st edition. But that is perhaps to be expected: we are getting older, and the book is also almost twice the size.

    To celebrate this, Manning has the book as its deal of the day. This means you get a 50% discount if you order the book on September 24th using the coupon code: dotd092417au

    The book can be ordered from Manning website at: https://www.manning.com/books/camel-in-action-second-edition

    0 0

    ApacheCon Miami 2017 – Introduction to Cluster Management Framework and Metrics in Apache Solr – Anshum Gupta, IBM Watson



    0 0
    Nick Kew: A new MD

    Today I have been rehearsing with the EMG, the Exeter-based symphony orchestra that performs with chorus every couple of years. This is the group with which I have sung in, and much enjoyed, some of the biggest and most exciting works in the repertoire: Mahler’s 8th Symphony, Vaughan Williams’ Sea Symphony, and Britten’s War Requiem.

    A major reason I loved those concerts so much was their inspirational conductor, Marion Wood.  She has now moved elsewhere, so today was my first sight of her successor Leo Geyer.  How would he measure up?  First impression: he’s not inspirational in the sense Marion was, but he does have a good deal to offer, and I expect to go on enjoying EMG events.

    This is a lesser programme for chorus than the others: we’re only in half the programme.  The main choral work is Geyer’s own version of Elgar’s Enigma Variations, drawing on text from The Music Makers  – a work which also shares some musical material with the Enigma.

    Having spent time on this piece, I was curious to find out more about Geyer’s track record, so I googled.  He seems to be a musician of some distinction: his conducting includes Covent Garden as well as his own ballet company, and he’s won a serious-looking composition prize.  This is a young man making quite a name for himself!

    What about his composition?  I watched his prize work on youtube (here) and found myself much enjoying it.  Though I doubt I’d have liked it so much if it had been just the music without the visual aspect, which presents a circus-style ringmaster and clowns.  The Darmstadt tradition of squeak-bang “modern” music (as exemplified by Stockhausen and Boulez) is strong in there, but at the same time it’s playful and exciting, and ever-lively.  Among established works, Weir’s Night at the Chinese Opera might be a comparison.  And youtube’s recommendation of Pierrot Lunaire as a followup suggests a century’s worth of tradition behind it.

    Caveat: after a day with EMG I’m on a bit of a high, and my critical judgement may be mildly impaired.



    0 0

    Today is my last day in Sourcesense and I would like to dedicate some time for sharing with all of you thoughts about my path and all the great colleagues that I met here.

    I started my career in the ECM area working at Etnoteam (now NTT Data), and after four years working with Open Source technologies, I simply asked to contribute back to the community.

    When I realised that the company didn't want to invest time in giving back to the community, I started to search for a new job, where my passion would be aligned with the core business, and then I found Sourcesense.

    Sourcesense gave me the opportunity to broaden my vision in both the national and international field. My experience here was impressive: I learnt a lot of things and I met key people behind products and frameworks that I use every day for implementing projects.

    I also had the chance to lead the internal Innovation Strategy Committee, working directly with the CEO, the Sales Team and the Delivery Team to help and facilitate the overall approach in terms of added value on different aspects.

    It's time for me to start a new challenge with a brand new enterprise context, with a new team and different customers.

    I would like to thank each former and current colleague in Sourcesense for their work, their priceless support and their energy.

    THANK YOU ALL AND TAKE CARE :)


    0 0

    One of my voracious reader friends introduced me to Tana French and her Dublin Murder Squad series, of which In the Woods is the first entry.

    Structurally, In the Woods is a classic mystery: something horrible has happened, and the detectives are called; evidence is collected; witnesses are interviewed; leads are developed and followed; more is learned.

    Along the way, we explore issues such as gender discrimination in the workplace and the ongoing effects of the great recession of 2008.

    What distinguishes In the Woods is not these basic elements, but more the style and depth with which they are elaborated and pursued.

    But did I mention style? What really makes In the Woods a delight is the ferocious lyricism that French brings to her writing.

    For instance, here are three children, playing follow-the-leader in the woods:

    These three children own the summer. They know the wood as surely as they know the microlandscapes of their own grazed knees; put them down blindfolded in any dell or clearing and they could find their way out without putting a foot wrong. This is their territory, and they rule it wild and lordly as young animals; they scramble through its trees and hide-and-seek in its hollows all the endless day long, and all night in their dreams.

    They are running into legend, into sleepover stories and nightmares parents never hear. Down the faint lost paths you would never find alone, skidding round the tumbled stone walls, they stream calls and shoelaces behind them like comet-trails.

    How marvelous is this, at every level!

    Structurally, it's almost poetry, with a natural sing-song cadence and a subtly-reinforced pattern induced by the simple rhythms ("they know...", "they rule...", "they scramble...", "they stream...").

    Stylistically, each little turn of phrase is so graceful and just right ("their own grazed knees", "wild and lordly as young animals", "calls and shoelaces").

    And then:

    They are running into legend, into sleepover stories and nightmares parents never hear.

    Wow.

    Anyway, that's just page 2. French is just as polished and capable on page 302, and, like any good mystery, once you start, you won't want to stop, even as you know (or think you know) what lies ahead.

    From what I hear, French's subsequent books are wonderful as well; I shall certainly read more.


    0 0

    Last weekend I found some time to hack on new tooling for doing Apache Camel route coverage reports. The intention is to provide APIs and functionality out of the box from Apache Camel that other tooling vendors can leverage in their tooling. For example to show route coverage in IDEA or Eclipse tooling, or to generate SonarQube reports, etc.

    I got as far as building a prototype that is capable of generating a report, which you run via the camel-maven-plugin. Having the prototype built into the camel-maven-plugin is a very good idea, as it's neutral and basically just plain Java. It also makes it possible for other vendors to look at how it's implemented in the camel-maven-plugin and be inspired how to use this functionality in their tooling.

    I wanted to work on the hardest bit first, which is being able to parse your Java routes and correlate which EIPs were covered or not. We already have parts of such a parser, based on the endpoint validation tooling which already exists in the camel-maven-plugin, and which I have previously blogged about. The parser still needs a little bit more work, however I do think I got it pretty far over just one weekend of work. I have not begun adding support for XML yet, but this should be much easier to do than Java, and I anticipate no problems there.
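
    To make this concrete, here is a small, made-up Java route of the kind the parser has to understand (the endpoint URIs and class name are purely illustrative, not taken from the prototype). Each EIP in it, such as the choice and its when/otherwise branches, is something a route coverage report would mark as covered or not:

    import org.apache.camel.builder.RouteBuilder;

    public class OrderRoute extends RouteBuilder {
        @Override
        public void configure() throws Exception {
            from("jms:queue:orders")                          // route input
                .choice()                                     // EIP: content-based router
                    .when(simple("${header.priority} == 'gold'"))
                        .to("direct:goldOrders")              // covered only if a gold order passes through
                    .otherwise()
                        .to("direct:standardOrders");         // covered only if a non-gold order passes through
        }
    }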

    I recorded a video demonstrating the tooling in action.

    I forgot to show in the video recording that you can change the Camel routes, such as inserting empty lines, adding methods to the class, etc., and when you re-run the camel-maven-plugin it will re-parse the route and output the line numbers correctly.

    Anyway, enjoy the video; it's about 12 minutes long, so go grab a cup of coffee or tea first.



    The plan is to include this tooling in the next Apache Camel 2.21 release which is scheduled in January 2018.

    The JIRA ticket about the tooling is CAMEL-8657

    Feedback is much welcome, and as always we love contributions at Apache Camel, so you are welcome to help out. The current work is upstream on this github branch. The code will be merged to master branch later when Apache Camel 2.20.0 is officially released.

    When I get some more time in the future I would like to add support for route coverage in the Apache Camel IDEA plugin. Eclipse users may want to look at the JBoss Fuse tooling which has support for Apache Camel, which could potentially also add support for route coverage as well.


    0 0

    I can't say this was a total stunner, but still: USA Stunned by Trinidad and Tobago, Eliminated From World Cup Contention

    The nightmare scenario has played out for the U.S. men's national team.

    A roller coaster of a qualifying campaign ended in shambles: a stunning 2-1 loss to Trinidad & Tobago, coupled with wins by Panama and Honduras over Costa Rica and Mexico, respectively, eliminated the USA from the World Cup. The Americans will not be playing in Russia next summer.

    Trinidad and Tobago, which hadn't won in its last nine matches (0-8-1), exacted revenge for the 1989 elimination at the hands of the United States, doing so in stunning fashion. An own goal from Omar Gonzalez and a rocket from Alvin Jones provided the offense, while Christian Pulisic's second-half goal wasn't enough to save the Americans.

    Oh, my.

    And it seems like there's a fair chance I won't be able to root for Leo Messi, either?

    Well, what shall I do?

    Let's see: there's still Iceland! They're easy to root for!

    Perhaps Wales? Perhaps Costa Rica? Perhaps Chile?

    I'm ready, I'm an eager Yankee, looking for a team with some charisma, some elan, some heart, some fighting spirit.

    Where are you? Are you out there?

    It's still a few weeks until the tournament qualifications are known.

    I guess I've got time to start looking...


    0 0

    Hierarchical Scheduling for Diverse Datacenter Workloads

    In this post we’ll cover the paper that introduced H-DRF (Hierarchical Dominant Resource Fairness), which builds upon the team’s existing work, DRF (Dominant Resource Fairness), while also providing hierarchical scheduling.

    Background

    The prior work, DRF, is an algorithm that decides how to allocate multi-dimensional resources to multiple frameworks; the DRF paper describes how it can enforce fairness when scheduling multiple resource types with a flat hierarchy:

                       DRF
          /        |         |        \
        dev      test     staging    prod
         10       10         30        50

    However, in most organizations it is important to be able to describe resource allocation weights in a hierarchy that reflects the organization’s intent:

                           H-DRF
          /           |            |           \
         fe          ads          spam         mail
         30           20           25           25
        /  \         /  \         /  \         /  \
       d    p       d    p       d    p       d    p      (d = dev, p = prod)
      50   50      20   80      30   70      40   60

    The key difference with hierarchical scheduling is that when a node is not using its resources, its unused share is redistributed among its sibling nodes, as opposed to all leaf nodes. For example, when the dev environment under fe is not using its resources, they are allocated to prod under fe instead.

    Naive implementations of hierarchical and multi-resource scheduling (such as collapsing the hierarchy into a flat one, or simply running DRF from the root to the leaf nodes) can lead to starvation, where in our example certain dev and prod environments never receive any, or their fair share, of resources. The property that rules this out is referred to as the hierarchical share guarantee.
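
    For reference, here is a toy sketch (my own illustration, not code from the paper) of the core loop of flat DRF: each framework’s dominant share is its largest share of any single resource, and the next task goes to the framework with the smallest dominant share. H-DRF keeps this idea, but adds the rescaling and blocked-node rules discussed in the next section.

    /** A toy, flat DRF allocator (illustrative only). */
    public class DrfSketch {

        // usage[f][r] = amount of resource r currently held by framework f
        static int pickNextFramework(double[][] usage, double[] clusterCapacity) {
            int next = -1;
            double smallestDominantShare = Double.MAX_VALUE;
            for (int f = 0; f < usage.length; f++) {
                double dominantShare = 0.0;
                for (int r = 0; r < clusterCapacity.length; r++) {
                    dominantShare = Math.max(dominantShare, usage[f][r] / clusterCapacity[r]);
                }
                if (dominantShare < smallestDominantShare) {
                    smallestDominantShare = dominantShare;
                    next = f;
                }
            }
            return next; // the framework that should receive the next task
        }

        public static void main(String[] args) {
            double[] capacity = {9.0, 18.0};              // e.g. 9 CPUs, 18 GB of RAM
            double[][] usage = {{3.0, 2.0}, {1.0, 8.0}};  // two frameworks' current holdings
            // framework 0: dominant share 3/9 = 0.33; framework 1: 8/18 = 0.44 -> allocate to framework 0
            System.out.println("Allocate next task to framework " + pickNextFramework(usage, capacity));
        }
    }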

    H-DRF

    To avoid the problem of starvation, H-DRF incorporates two ideas when computing the dominant shares of the leaf nodes. The first idea is to rescale the leaf nodes’ resource consumption relative to the minimum node. The second idea is to ignore blocked nodes when rescaling, where a node is blocked if one of the resources it requests is saturated or if it has no more tasks to launch. The actual proof and the steps of the implementation are covered in the paper, and I won’t go over them here in detail.

    Notes

    The interesting point highlighted in this paper is that Hadoop implemented a naive version of H-DRF and therefore has bugs that can cause starvation of tasks. In other words, it is not straightforward to modify how DRF works without proving that the result is starvation-free and still provides fairness (unless that is not the primary goal of your change).

    That said, there are more papers that continue to extend and modify DRF, and that point out blind spots H-DRF didn’t cover, which I’ll try to cover more in the future.



    0 0

    According to a seminar I attended recently, inertia, which exists as a law of nature, also exists in organizations.

    This is a somewhat philosophical reflection, but first, let us look at what inertia is.

    Inertia is the force that pushes you backwards when a bus suddenly accelerates. Inertia is merely the tendency to maintain the original state, so it is not a real force; that is why inertia is called a fictitious force. The real force is the one that makes the bus accelerate suddenly.

    The magnitude of the fictitious force that pushes your body backwards is determined solely by the magnitude of the real force that accelerates the bus. This is the relativity of inertia and motion.

    Given the relativity of inertia and motion, an organization's inertia is determined by the magnitude of the change, so it is not something to be wary of; rather, it can be seen as the very wave of change.

    0 0

    Apache Camel 2.20 has been released today and, as usual, I am tasked with writing a blog post about this great new release and its highlights.


    The release has the following highlights.

    1) Java 9 technical preview support

    We have started our work to support Java 9 and this release is what we call technical preview. The source code builds and runs on Java 9 and we will continue to improve work for official support in the following release.

    2) Improved startup time

    We have found a few spots where we could optimise the startup time of Apache Camel, so it starts 100 to 200 milliseconds faster.

    3) Optimised core to reduce footprint

    Many internal optimisations in the Camel routing engine, such as reducing thread contention when updating JMX statistics, reducing internal state objects to claim less memory, and reducing the number of allocated objects to reduce overhead on GC etc, and much more.

    4) Improved Spring Boot support and preparing for Spring Boot 2

    We have improved Camel running on Spring Boot in various ways.

    We also worked to make Apache Camel more ready and compatible with the upcoming Spring Boot 2 and Spring Framework 5. Officially support for these is expected in Camel 2.21 release.

    5) Improved Spring lifecycle

    Starting and stopping the CamelContext when used with the Spring framework (SpringCamelContext) was revised to ensure that the Camel context is started last, when all resources should be available, and stopped first, while all resources are still available.

    6) JMS 2.0 support

    The camel-jms component now supports JMS 2.0 APIs.

    7) Faster Map implementation for message headers

    If you include the camel-headersmap component on the classpath, then Camel will auto-detect it on startup and use a faster implementation of the case-insensitive map (used for Camel message headers).

    8) Health-Check API

    We have added experimental support for a new health-check API (which we will continue to work on over the next couple of releases). The health checks can be leveraged in cloud environments to detect non-healthy contexts.

    9) Cluster API

    Introduced an experimental Cluster SPI (which we will continue to work on over the next couple of releases) for high-availability contexts. Out of the box, Camel supports atomix, consul, file, kubernetes and zookeeper as underlying clustering technologies through the respective components.

    10) RouteController API

    Introduced an experimental Route Controller SPI (which we will continue to work on over the next couple of releases) aimed at providing more fine-grained control of routes. Out of the box, Camel provides the following implementations:

    • SupervisingRouteController, which delays startup of the routes until after the Camel context is properly started, and attempts to restart routes that have not been started successfully.
    • ClusteredRouteController, which leverages the Cluster SPI to start routes only when the context is elected as leader.

    11) More components

    As usual there are a bunch of new components; for example, we have support for calling AWS Lambda functions in the camel-aws component. There is also a new JSON validator component, and camel-master is used with the new Cluster API to do route leader election in a cluster. There are 13 new components and 3 new data formats. You can find more details in the Camel 2.20 release notes.

    We will now start working on the next release, 2.21, which is scheduled for the start of 2018. We are trying to push for a slightly quicker release cycle for these bigger Camel releases, so we can go from doing 2 to 3 releases per year. This allows people to pick up new functionality and components more quickly.

    We also want to get a release out that officially supports Java 9, Spring Boot 2 and all the usual great stuff we add to each release, plus what the community contributes.




    0 0

    (Today we’re interviewing Shane Curcuru about the recent issues reported with Facebook’s React.js software’s BSD + PATENTS file license, and what the Apache Software Foundation (ASF) has to do with it all. Shane serves in a leadership position at the ASF, but he wants you to know he’s speaking only as an individual here; this does not represent an official position of the ASF.)

    UPDATE: Facebook has relicensed React.js as well as some other software under the MIT license, without the FB+PATENTS file. That’s good news, in general!

    Hello and welcome to our interview about the recent licensing kerfuffle around Facebook’s React.js software, and the custom license including a custom PATENTS file that Facebook uses for the software.

    You’ve probably seen discussions recently, either decrying the downfall of your startup if you use React, or noting that this is an old issue that’s just a paper tiger. Let’s try to bring some clarity to the issue, and get you some easy-to-understand information to make your own decision. To start with, Shane, can you briefly describe what the current news hype is all about? Is this a new issue, or an old one?

    Well, like many things around licensing, the details are complicated, but the big picture is fairly simple. Big picture, the current news hype is only about policy at the ASF, and does not directly affect anyone else. The only recent change was made for projects already at Apache, and even that change will take a while to implement.

    I’m confused — isn’t this a new change in the licensing for the React.js project?

    No, actually — Facebook’s React.js project has used this license (often called BSD + PATENTS, but it’s really a Facebook-specific file) for several years, so the underlying issue with this specific PATENTS file is old. It’s just getting attention now because the ASF has made a change in their licensing policy. The current change last month was to declare that for Apache projects, the custom PATENTS clause that Facebook uses on React.JS software is now officially on the “Category-X” list of licenses that may not be shipped in Apache projects.

    So the news is about the fact that Apache projects will no longer include React.js in their source or releases. This is a policy change, and only affects Apache projects, but obviously it’s gotten some news coverage and has gotten a lot of developers to really go back and pay attention to the licensing details around React.

    Many of our readers probably don’t understand what “Category X” means, unless it’s an X-Files reference. Can you explain more how the ASF determines which kinds of software licenses are acceptable in Apache projects?

    Great question. Yes, Category X is the ASF’s term for software licenses that, by ASF policy, may not appear in Apache software source repositories or software releases. This is an operational decision by the ASF, and doesn’t mean that the various licenses are incompatible with the Apache 2.0 license, just that the ASF doesn’t want its projects shipping code using these licenses.

    The rationale is this: the ASF wants to attract the maximum number of inbound contributions. Thus, we use the permissive and as some say “business-friendly” Apache license for all ASF software. This allows maximum freedom for people who use Apache software to do as they please, including making proprietary software. Part of the Apache brand is this expectation: when you get a software product from the ASF, you know what to expect from the license. Besides not suing us and not using Apache trademarks, the only real restriction is including the license if you redistribute something based on Apache 2.0 licensed software.

    Licenses that the ASF lists as Category X add additional restrictions on use for end users of the software, above and beyond what Apache 2.0 requires. The most obvious examples are the GPL* copyleft licenses, which require redistributors to provide any changes they make publicly, under the GPL.

    OK — So Category X isn’t a legal determination of incompatibility, it’s just a policy choice the ASF is making? Is that right?

    Exactly right. Others are free to mix licenses in various ways — but the ASF chooses to not redistribute software with more restrictive licenses than Apache 2.0. So when you download an Apache product, it won’t have Category X software like React in it — but you’re free to mix Apache products with React yourself, if you like.

    Aren’t there some Apache projects shipping with React today, like CouchDB?

    Yes — CouchDB currently includes React in their tree and past releases, as do a few other projects. These projects will warn their users (by a NOTICE file or blog post) that their releases contain more restrictive licensed software, and are working on plans to re-design things to remove React and replace it with other, less restrictively licensed libraries.

    And before you ask, yes, this is extra work for the volunteer projects at Apache, and it’s not something the ASF does lightly. But ensuring that Apache projects have clean IP that never includes any licensing restrictions beyond what the well-known Apache 2.0 license requires is critical to the broad acceptance of Apache software everywhere.

    So if this recent change in ASF policy only affects Apache projects, why is it getting so much attention in tech circles these days?

    Because the ASF policy announcement has made some people go back and really look at Facebook’s custom BSD + PATENTS file license used in React. This is a good thing — you should always understand the licenses of software you’re using so you follow them — and so you don’t have surprises later, like now. People using React are already bound by this license, it’s just that many people didn’t look into the details until now.

    There are two conceptual issues here in terms of how open source participants decide if they want to accept Facebook’s license. First is the addition of Facebook’s custom-written PATENTS file. Very briefly, it states that if you sue Facebook over (almost any) patent issue, you lose your license to Facebook patents. The first issue is that this patent termination clause (which is in a fair number of licenses) is a strict and exclusionary clause. The balance of rights granted (or taken away, if you sue) is strongly tilted towards Facebook as a specific entity. It’s not the more even and generic balance of patent termination rights that is in the Apache 2.0 license.

    That asymmetry in patent rights is the problem: it directly puts Facebook’s interests above everyone else’s interests when patent lawsuits around React happen. Of course, there are a lot more details to the matter, but for those questions you need to ask your own attorney — all I can say is that it’s an issue that will happen incredibly rarely, if ever, for open source projects.

    So the Facebook BSD + PATENTS file license favors Facebook, even though they’re an open source project that wants your contributions. We kind of get that; patents are always tricky, but the asymmetry in rights there does seem a little odd compared to other licenses. You said there were two conceptual issues?

    The second conceptual issue is simpler to explain. The Facebook BSD + PATENTS file license is not on the OSI list of open source licenses.

    (pause) Um, is that it? What’s the real issue here about OSI approval?

    Yup, that’s the core of the issue. Being on the OSI list is huge. The generally accepted definition of “open source” is that your software’s license is listed by OSI.

    The reason OSI listing is key is that enough lawyers in many, many companies have vetted the OSI-listed licenses that the ecosystem knows what to expect. The OSI has a strong reputation, so to start with people know roughly what to expect from an OSI-listed license. More importantly, these licenses have been vetted over and over by counsel from a wide variety of companies.

    A lot of law work is risk management: ensuring your rights are preserved when doing business or using licenses. OSI-listed licenses are well known, so lawyers can quickly and confidently express the level of risk in using them. Non-OSI licenses mean the lawyers have to read them in detail, and do a new and comprehensive review of risks. It’s not just the work, it’s the uncertainty with something new that typically translates into saying “This new license has more risks than those well-used ones.”

    Now I get it — OSI licenses are popular and frequently reviewed, so people are comfortable with them. A new license — like the Facebook PATENTS file — might not be bad, but might be — people don’t know it well enough yet.

    Exactly right. I can’t think of any good reason for companies that want to work with open source groups to ever use a non-OSI listed license. People keep thinking so, but license proliferation is not worth it. Successful open source projects need new contributors from a variety of places. Keeping barriers to entry low — like unusual licenses — is one of the easiest ways to turn users into potential contributors.

    If the Facebook PATENTS license is unusual enough to turn off other projects from using it, like Apache, why won’t Facebook consider changing the license to an OSI-approved one?

    That’s a question you’ll need to ask Facebook. The ASF already asked Facebook to consider changing the license, and they said no. Facebook also wrote an explainer for their license that’s been widely shared.

    We have one listener asking: Is the Facebook PATENTS license viral? That is, if you use React.js in your software, must you use the same Facebook PATENTS license?

    No, the PATENTS clause is not “viral”, or rather, it’s not copyleft. So you are free to use whatever license you want on any software you write that uses or incorporates React.js.

    Note that the actual patent grant from Facebook to anyone using React.js software — even if it’s inside of your software project — is still there. The PATENTS terms apply to anyone who’s running the React.js software, and are between Facebook and all the end users. So that patent licensing issue doesn’t affect you as an application builder directly, but it might affect your users.

    Great, well we’ve covered a lot of ground in this interview. What else should readers know about, so they can make up their own mind about the licensing risks around React — that were always there, but they might not have understood.

    TL;DR: the only short-term question is if you’re thinking about donating your project to Apache. If so, start planning now to migrate away from React, because you won’t be able to bring it with you.

    For everyone else, this is a non-issue in the short term. Longer term, it’s something you should make your own mind up about, by considering all the aspects of any change: legal risk (probably low, but it’s patents so who knows), technology (several replacements out there, but none yet as strong as React), and community (what development capacity do you have, and does your community of contributors care?)

    I wrote a brief guide about the legal, technical, and community aspects of deciding to use or not use React earlier.

    Also, if you have strong opinions about this, let people (and Facebook) know! I have to say some open source folks were quite surprised when Facebook refused the ASF’s request to relicense. Facebook has some great open source projects, including some with open governance. I’m personally a little surprised they aren’t using an OSI license for this kind of stuff.

    Thanks for reading along with Shane’s interview of Shane on the React licensing issue! Good luck to your project whichever licenses you choose.

    For More Information About React Licensing

    The ASF publishes its Licensing policies, including the Category X list, and some rationale for policy decisions on licenses at Apache.

    UPDATE! Automattic, the company behind WordPress, will be moving away from React:

    “We’ll look for something with most of the benefits of React, but without the baggage of a patents clause that’s confusing and threatening to many people.”

    Simon Phipps’ timeline and discussion about how Apache moved the PATENTS license to the Category X list:

    https://meshedinsights.com/2017/07/16/apache-bans-facebooks-license-combo/

    A popular post here on Medium focused on CTOs, with a balanced view, including a discussion on one patent lawsuit between Facebook and Yahoo!:

    https://medium.com/@ji/the-react-license-for-founders-and-ctos-b38d2538f3e5

    Detailed (long) discussion of “what does this mean for my project” from an engineer’s perspective:

    An Apache CouchDB developer’s take on React and the license:

    If you’re a startup, you should not use React (community/startup aspects):

    Don’t over-REACT to the Facebook Patents License (legal aspects)

    Why the Facebook Patents License Is A Paper Tiger (legal aspects)

    Why Facebook Patents License Was A Mistake — an early explanation from Simon Phipps on why the PATENTS license is bad for the open source ecosystem

    The post FAQ for Facebook React.js BSD + PATENTS License issue with the Apache Software Foundation appeared first on Community Over Code.


    0 0

    Digitalization is heavily used in enterprises today to achieve business success. Businesses that do not embrace this change are losing market share and declining day by day, as human society now experiences digitalization on a global scale. This experience ranges from everyday activities to major political, industrial, informational, educational and even cultural engagements. In essence, we are experiencing a Digital Revolution.


    We are experiencing a Digital Revolution
    There are many examples of businesses that were not able to transform themselves with emerging technologies and were defeated by their competitors:

    • Kodak defeated by the Digital Camera, mainly by Nikon and Canon
    • Nokia defeated by the smartphones, mainly by Samsung and Apple
    • BlockBuster Video defeated by online streaming, mainly by Netflix
    • Borders Books and Music defeated by online Bookstores, mainly by Amazon
    And this list continues..

    Digital Transformation

    With digitalization, consumer expectations regarding the quality of a service and the speed or turnaround time of the service have increased dramatically. Lack of adherence to this change is usually seen by the consumer market as a lack of innovation in products and services. The process of shifting from existing business methodologies to new, digitalized business methodologies or offerings is known as Digital Transformation. A more formal definition could be given as:
    “Digital Transformation is the use of technology to radically improve performance or reach of the enterprise”
    Let me take an example from the FIT (Fully Independent Traveller) domain. Assume there is a good hotel booking service, available over the internet and through other channels (a call center, a physical office, etc.), that has held a very good share of the hotel booking market. If it keeps offering the same service it always has, it will surely lose market share, because competing services are emerging that provide better QoS, lower response times, possibly more convenient channels (such as mobile applications) and, most importantly, a better booking experience through the use of new technology.
    Success of the business relies on the satisfaction of the consumers


    So how could our hotel booking service leverage digitalization to achieve this innovation and keep its products and services successful? Usually this is the responsibility of the CIO and CTO, who should look at the following three key areas when transforming the existing business into a digital business.



    Customer Experience

    Digital advances such as analytics, social media, mobile applications and other embedded devices help enterprises to improve customer experience.

    Operational Process

    CIOs are utilizing technology to be competitive in the market, and the urge to improve business performance also leads executives to think of improving internal processes with digitalization.

    Business Model

    Business processes/models and marketing models have to be adapted to support the new information age and transform themselves to seamlessly function with the rest of the digitalization stream.
    However, digital transformation is not just the use of technology to achieve these qualities; rather, the complete business has to operate as one single system from an external consumer’s perspective. That requires existing systems, new components, mobile applications and bleeding-edge IoT devices to be able to intercommunicate.

    System Integration

    With the rise of information technology, large businesses started to benefit from software systems that accomplish different tasks within the enterprise. They introduced various monolithic systems such as HR management, SAP, ERP and CRM systems, and many more. However, these systems were designed to perform specific tasks, and to make these incompatible systems intercommunicate, different concepts had to be introduced later on. Starting with manual data entry between systems, these gradually evolved into EAI, and have been driven towards API management and microservices, with ETL, EI and ESB in the middle. A formal definition of system integration could be presented as:
    “System Integration is the process of linking together different computing systems and software applications functionally to act as a coordinated whole”
    Integration in the enterprise domain has now been further expanded to mobile devices as well as IoT through APIs and Microservices, in order to meet consumer experience expectations.
    So in essence, the integration of systems, APIs and devices plays a vital role in Digital Transformation, which essentially requires all of them to connect with each other and intercommunicate. The ability to seamlessly integrate existing systems without much overhead is the key to a successful Digital Transformation.
    The other side of the business-success equation is the time-to-market factor, which requires integration and all these new technologies to be adopted as fast as possible. However, these integration problems call for careful design, together with a good amount of development, to achieve protocol and transport bridging, data transformation, routing and other required integration functionality, despite the availability of dozens of frameworks and products that facilitate the same.

    Composable Integration

    In order to reduce this integration development time, AdroitLogic has developed a lean integration framework named Project-X, with a rich ecosystem of tooling around it. Project-X provides three building blocks for integration:

    Connectors & Processors are used to compose integration flows, without having to write a single line of code!

    Connectors

    Connectors could be used either to accept messages/events from outside (via Ingress Connectors) or to send out/invoke external services/functionalities (via Egress Connectors). In the rare case of not being able to find the ideal match in the existing connector palette, you could easily implement your own reusable connector.

    Processors

    Any processing of an accepted message such as conditional evaluations, routing, transformations, value extractions, composition, data enrichment, validation etc., could be achieved by the army of pre-built processors. In the rare case of not being able to find the most suitable processor to implement your functionality, you could implement your own reusable processor to be included in your integration flow.

    Features

    Features are sets of utility functions available to processors and connectors, and they can also be used by any custom connector or processor that you write. On top of that, you can write your own features, which may in turn build on existing features.
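    Purely as an illustration of how these building blocks compose (this is a hypothetical sketch, not actual Project-X or UltraStudio syntax), an integration flow built from an ingress connector, a chain of processors and an egress connector could be pictured like this:

      <!-- Hypothetical sketch only; not real Project-X/UltraStudio configuration. -->
      <flow name="booking-intake">
        <ingress connector="http-listener" path="/bookings"/>        <!-- accept incoming messages -->
        <processor type="validate" schema="booking.xsd"/>            <!-- validate the payload -->
        <processor type="transform" mapping="booking-to-crm.xml"/>   <!-- transform to the target format -->
        <egress connector="jms-sender" destination="crm.bookings"/>  <!-- hand off to a downstream system -->
      </flow>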
    All these pieces are seamlessly assembled in a composable, drag-and-drop integration flow editor named UltraStudio, based on the IDE champion IntelliJ IDEA, allowing you to compose your integration flows in minutes, test and debug them in the IDE itself, and build the final artifact with Maven to deploy it into the UltraESB-X lean runtime container for production execution.

    Compose your integration flow, test and debug it on the IDE itself prior to deployment
    You can pick and choose the relevant connectors and processors for your project from the existing connector and processor store, and the project artifact is built as a self-contained bundle which can be deployed in the integration runtime without the hassle of adding drivers or third-party jar files. The runtime will have that — and only that — set of dependencies, keeping your execution runtime as lean as possible. Further, this makes the project lifecycle and the maintainability of the solution more robust, as the project can use your existing version control and continuous integration setup and benefit from the collaboration practices you already follow.
    AdroitLogic is in the process of building the 4th layer on top of this, named Templates. These templates will have reusable, parameterized patterns (or frameworks) for building your integration flows. For example, a solution which requires guaranteed delivery to a defined set of downstream systems with specific mapping and filtering criteria, together with validation from a given upstream system with traceability and statistics, could utilize an existing template and just compose the mapping and filtering criteria to implement the whole solution in a matter of minutes.
    In conclusion, if your organization has not yet started on its digital transformation, it is high time to consider stepping up the pace. While this will involve multiple streams of transformation and have a lot of impact on the way the business currently operates, one good starting point is to integrate your existing systems to work seamlessly, and to enable connections with your partners and consumers through the latest technologies to improve the consumer experience.

    0 0

    • Open-sourcing RacerD: Fast static race detection at scale | Engineering Blog | Facebook Code

      At Facebook we have been working on automated reasoning about concurrency in our work with the Infer static analyzer. RacerD, our new open source race detector, searches for data races — unsynchronized memory accesses, where one is a write — in Java programs, and it does this without running the program it is analyzing. RacerD employs symbolic reasoning to cover many paths through an app, quickly.
      This sounds extremely interesting…

      (tags: racerd, race-conditions, data-races, thread-safety, static-code-analysis, coding, testing, facebook, open-source, infer)

    • Solera – Wikipedia

      Fascinating stuff — from Felix Cohen’s excellent twitter thread.

      Solera is a process for aging liquids such as wine, beer, vinegar, and brandy, by fractional blending in such a way that the finished product is a mixture of ages, with the average age gradually increasing as the process continues over many years. The purpose of this labor-intensive process is the maintenance of a reliable style and quality of the beverage over time. Solera means literally “on the ground” in Spanish, and it refers to the lower level of the set of barrels or other containers used in the process; the liquid (traditionally transferred from barrel to barrel, top to bottom, the oldest mixtures being in the barrel right “on the ground”), although the containers in today’s process are not necessarily stacked physically in the way that this implies, but merely carefully labeled. Products which are often solera aged include Sherry, Madeira, Lillet, Port wine, Marsala, Mavrodafni, Muscat, and Muscadelle wines; Balsamic, Commandaria, some Vins doux naturels, and Sherry vinegars; Brandy de Jerez; beer; rums; and whiskies. Since the origin of this process is undoubtedly out of the Iberian peninsula, most of the traditional terminology was in Spanish, Portuguese, or Catalan.

      (tags: wine, aging, solera, sherry, muscat, vinegar, brandy, beer, rum, whiskey, whisky, brewing, spain)


    0 0


    During the next week I'll join the next Open Source Summit Europe in Prague. This year the event includes different conferences at the same location (the Hilton Prague): LinuxCon, ContainerCon, CloudOpen and the new Open Community Conference.

    I'm excited to contribute at this event in two different ways on behalf of the Apache Software Foundation:

    1. As a speaker inside the Open Community Conference
    2. As a sponsor during MesosCon, sharing ASF booth duty with Sharan Foga

    During my session I'll share the path the Apache ManifoldCF community followed to achieve graduation as a Top Level Project. I'll try to explain what the Apache Way steps are and how we approached the challenges along the journey.

    The title of the session is The Journey of Apache ManifoldCF: learning from ASF's successes and I'll give it on Wednesday 25th at 11:15am. If you want to take a look at the full schedule, please visit the program page.

    I would like to thank the Apache Community Development team for mentioning me in their article, and I hope this helps.

    Let me know if you're coming to the event, and stop by the booth to say hello :)


    0 0

    This release fixes a serious bug in the difference engine when documents differ only in namespace prefix.

    The full list of changes for XMLUnit.NET:

    • Elements that only differed in namespace prefix resulted in a false `ELEMENT_TAG_NAME` difference when compared (see the illustration below). Issue #22
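
    For illustration, the two hypothetical documents below use different prefixes for the same namespace URI, so their root elements have the same namespace URI and local name; the comparison should treat them as equal rather than report an `ELEMENT_TAG_NAME` difference:

      <a:root xmlns:a="urn:example"/>

      <b:root xmlns:b="urn:example"/>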

    0 0

    I seem to be posting a lot less frequently recently. I was traveling, work has been crazy busy, you know how it goes. Oh, well.

    I was looking at some stuff while I was traveling, and reviewing what I thought, and decided it still holds, so I decided to post it here.

    It ain't perfect, but then nothing is, and besides which you get what you paid for, so here are my 100% free of charge simple rules for online security:

    • Always do your banking and other important web accesses from your own personal computer, not from public computers like those in hotel business centers, coffee shops, etc.
    • Always use Chrome or Firefox to access "important" web sites like the bank, credit cards, Amazon, etc.
    • Always use https:// URLs for those web sites
    • Always let Windows Update automatically update your computer as it wants to, also always let Chrome and Firefox update themselves when they want to.
    • Stick with GMail, it's about as secure as email can get right now. Train yourself to be suspicious of weird mails from unknown senders, or with weird links in them, just in case you get a "phishing" mail that pretends to be from your bank or credit card company, etc.
    • If you get a mail from a company you care about (bank, retirement account, credit card, health care company, etc.), instead of clicking on the link in the mail, ALWAYS open up a new browser window and type in the https:// URL of the bank or whatever yourself. It's clicking the link in the email that gets you in trouble.
    • At least once a week or so, sign on and look at your credit card charges, your bank account, your retirement account, etc., just to see that everything on there looks as it should be. If not, call your bank and credit card company, dispute the charge, and ask them to send you a new credit card, ATM card, whatever it was.
    • Don't accept phone calls from people who aren't in your contacts, or whose call you didn't expect. If you accept a phone call that you think might be legitimate (e.g., from your bank or credit card company), but you need to discuss your account, hang up and call them back, using the main service number from their web site, not the number that called you. Never answer "security questions" over the phone unless you initiated the call yourself. Con artists that call you on the phone can be really persuasive, this is actually the biggest threat nowadays I think.
    If you do these simple things, you have made yourself a sufficiently "hard" target that the bad guys will go find somebody who's a lot easier to attack instead of you.

    0 0

    Book three of the Expanse series is Abaddon's Gate.

    Abaddon's Gate starts out as a continuation of books one and two.

    Which is great, and I would have been just fine with that.

    But then, about halfway through (page 266, to be exact), Abaddon's Gate takes a sudden and startling 90 degree turn, revealing that much of what you thought you knew from the first two books is completely wrong, and exposing a whole new set of ideas to contemplate.

    And so, then, off we go, in a completely new direction!

    One of the things I'm really enjoying about the series is the "long now" perspective that it takes. You might think that a couple thousand years of written history is a pretty decent accomplishment for a sentient species, but pah! that's really nothing, in the big picture of things.

    If you liked the first two books, you'll enjoy Abaddon's Gate. If you didn't like any of this, well, you probably figured that out about 50 pages into Leviathan Wakes and that's fine, too.


    0 0

    Historically, the Apache ActiveMQ message broker was created in a time when large messages were measured in MB, not in GB as they may be today.

    This is not the case with the next-generation broker Apache ActiveMQ Artemis (or just Artemis), which has much better support for large messages.

    So it's about time that the Camel team got to work on this and ensured that Camel works well with Artemis and large messages. This work was committed this weekend, and we have provided an example to demonstrate it.

    The example runs Camel with the following two small routes:
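
    In outline, the two routes can be sketched in Camel's XML DSL roughly as follows (a minimal sketch based on the description below; the exact example code may differ in detail):

      <!-- Inside a <camelContext>: the first route sends files to the "data" queue,
           the second streams messages from that queue back to disk. -->
      <route id="file-to-artemis">
        <from uri="file:target/inbox"/>
        <to uri="jms:queue:data"/>
      </route>

      <route id="artemis-to-file" streamCache="true">
        <from uri="jms:queue:data"/>
        <to uri="file:target/outbox"/>
      </route>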



    The first route just routes files to a queue named data on the message broker. The second route does the opposite: it routes from the data queue to file.

    Pay attention to the second route, as it has Camel's stream caching turned on. This ensures that Camel deals with large streaming payloads in a manner where it can automatically spool big streams to temporary disk space to avoid using up memory. Stream caching in Apache Camel is fully configurable, and you can set up thresholds based on payload size, memory left in the JVM, etc. to trigger when to spool to disk. However, the default settings are often sufficient.

    Camel then uses the JMS component to integrate with the ActiveMQ Artemis broker, which you set up as follows:
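
    A rough sketch of such a setup in Spring XML (assuming a local Artemis broker listening on the default tcp://localhost:61616; adjust the URL for your environment) could look like this:

      <bean id="jms" class="org.apache.camel.component.jms.JmsComponent">
        <property name="connectionFactory">
          <!-- Artemis JMS connection factory pointing at the broker -->
          <bean class="org.apache.activemq.artemis.jms.client.ActiveMQJMSConnectionFactory">
            <constructor-arg value="tcp://localhost:61616"/>
          </bean>
        </property>
      </bean>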



    This is all standard configuration (you should consider setting up a connection pool as well).

    The example requires running an ActiveMQ Artemis message broker in a separate JVM, and then starting the Camel JVM with a lower memory setting such as 128 MB or 256 MB, which can be done via Maven:

      export MAVEN_OPTS="-Xmx256m"

    And then you run Camel via Maven

      mvn camel:run

    When the application runs, you can copy big files to the target/inbox directory; these big messages are then streamed to the Artemis broker, and back again to Camel, which saves them to the target/outbox directory.

    For example, I tried this by copying a 1.6 GB Docker VM file, and Camel logs the following:
    INFO  Sending file disk.vmdk to Artemis
    INFO  Finish sending file to Artemis
    INFO  Received data from Artemis
    INFO  Finish saving data from Artemis as file

    And we can see the file is saved again, and it's also the correct size of 1.6 GB:

    $ ls -lh target/outbox/
    total 3417600
    -rw-r--r--  1 davsclaus  staff   1.6G Oct 22 14:39 disk.vmdk

    I attached jconsole to the running Camel JVM and monitored the memory usage which is shown in the graph:


    The graph shows that the heap memory peaked at around 130 MB and that after GC it's back down to around 50 MB. The JVM is configured with a max of 256 MB.

    You can find detailed step-by-step instructions with the example on exactly how to run it, so you can try it for yourself. The example is part of the upcoming Apache Camel 2.21 release, where the camel-jms component has been improved to support javax.jms.StreamMessage types and has special optimisations for ActiveMQ Artemis, as demonstrated by this example.

    PS: The example could be written in numerous ways, but instead of creating yet another Spring Boot based example we chose to just use plain XML. In the end Camel does not care; you can implement and use Camel however you like.



    0 0

    This month's Significance magazine includes a jarring article about fake news.  As one would expect in Significance, there is interesting empirical data in the article.  What I found most interesting was the following quote attributed to Dorothy Byrne, a British broadcast journalism leader:
    "You can't just feed people a load of facts...we are social animals, we relate to other people, so we have to always have a mixture of telling people's human stories while at the same time giving context to those stories and giving the real facts." 
    Just presenting factual evidence that objectively debunks a claim is not sufficient.  We need to acknowledge that just as emotional triggers are key to spreading fake news, so they need to be considered in repairing the damage.  I saw a great example of that in yesterday's Wall Street Journal.  An article, titled "Video Contradicts Kelly's Criticism of Congresswoman," sets out to debunk the fake news story promulgated by the Trump administration claiming that Florida Rep. Frederica Wilson had touted her personal efforts in getting funding for an FBI building in her district while not acknowledging the slain FBI agents for whom the building was named.  The Journal article could have stopped at the factual assertions that she had not been elected when the funding was approved and that a video of the speech she gave includes her acknowledging the agents.  But it goes on to provide emotive context, describing the Congresswoman's lifelong focus on issues affecting low-income families and her personal connection with Army Sgt. La David Johnson, the Green Beret whose passing ultimately led to her confrontation with the Trump administration.  The details on how she had known Sgt. Johnson's family for generations and that he himself had participated in a mentoring program that she founded provided context for the facts.  The emotive picture painted by the original fake news claim and the administration's name-calling "all hat, no cattle" was replaced with the image of a caring human being.  In that light, it's easier to believe the truth - that Rep. Wilson was gracious and respectful of the fallen agents and their families just as she was of Sgt. Johnson and his family.

    The lesson learned here is that in debunking fake news, "factual outrage" is not enough - we need to focus on selling the truth as the more emotionally satisfying position.  As the Significance article points out, people are drawn to simple explanations and beliefs that fit with what they want to be true.  So to repair the damage of fake news, we have to not just show people that their beliefs are inconsistent with reality - we need to provide them with another, emotionally acceptable reality that is closer to the truth.



    0 0

    This post is loosely based on a lightning talk last week in Brussels.  We had a few minutes to fill and I felt compelled to spill my guts, despite having nothing prepared.

    For those that have never heard of LDAPCon, it’s a biennial event, first held in ’07, with rotating venues, always in interesting places.  The talks are a 50/50 split between technology providers and real-world usage.

    You can check out this year’s talks, along with slides — here.

    It’s not a ‘big’ conference — attendance hovers between 70 and 80.  It doesn’t last very long — about two days.  There’s very little glitz or glory.  You won’t find the big vendors with their entourages of executives and marketing reps, wearing fancy suits, sporting fast talk and empty promises.  Nor are there giveaways, flashy parties or big name entertainers.  For the most part the media and analysts ignore it; participants don’t get much exposure to the outside world.  Everyone just sits in a single, large conference room for the duration and listens to every talk (gasp).

    So what is it about this modest little gathering that I love so much?

    Not my first rodeo.  The end of my career is much closer than its beginning, and I’ve been to dozens of conferences over the decades.  Large, small and everything in between.  For example, I’ve attended JavaOne twelve times and been to half a dozen IBM mega conferences.

    Let’s start with relevance.  Contrary to what you may think LDAP is not going away.  It’s not sexy or exciting.  Depending on your role in technology you may not even have heard of it (although I can guarantee that your information is housed within its walls).  But it’s useful.  If you’re interested in security you better understand LDAP.  If you choose not to use it you better have good reasons.  Ignore at your peril.

    I’ve been working with LDAP technology (as a user) for almost twenty years.  When I first started, back in the late ’90’s there was a fair amount of hype behind it.  Over the years that hype has faded of course.  As it faded, I found myself alone in the tech centers.  In other words, I was the only one who understood how it worked, and why it was needed.  As the years passed, I found my knowledge growing stale.  Without others to bounce ideas off of there’s little chance for learning. You might say I was thirsting for knowledge.

    My first LDAPCon was back in ’11 in Heidelberg.  It was as if I had found an oasis after stumbling about in the desert alone for years.  It was like: AH, at last, others who understand and from whom I can learn.

    Many conferences are rather impersonal.  This is understandable of course, because the communities aren’t well established or are so large that it would be impossible to know everyone, or even a significant minority.

    The leaders of these large technology communities are more like rock stars than ordinary people.  Often (not always) with oversized egos fed by the adoration of their ‘fans’.  This is great if you are seeking an autograph or inspiration, but not so much if you’re wanting help or validation of ideas.

    Not the case at LDAPCon.  You’ll still find the leaders and architects, but not the egos.  Rather, they understand the importance of helping others find their way and encourage interaction and collaboration.

    Sprinkle in with these leaders earnest newcomers.  Much like when I arrived in Heidelberg the pattern repeats.  These newcomers bring energy and passion that fuels the ecosystem and helps to stave off obsolescence.  There is a continuous stream of ideas coming in ensuring the products and protocols remain relevant.

    The newcomers are welcomed with open arms and not ignored or marginalized.  This creates a warm atmosphere of collaboration.  New ideas are cherished and not shunned.  Newcomers are elevated and not marginalized.

    Not a marketing conference.  You won’t find booths (like at a carnival) where passersby are cajoled and enticed by shiny lights and glitzy demos.  Where on the last day they warily pack up their rides and go to the next stop on the circuit.

    Not a competitive atmosphere, rather a collaborative one.  Here is where server vendors like ForgeRock, Red Hat, Microsoft, Symas, and others meet to work together on common goals, improving conditions for the community.  They don’t all show up to every one, but are certainly welcome when they do.

    Here, on the last day, there is some sadness.  We go and have some beer together, share war stories (one last time) and make plans for the future.

    The next LDAPCon will probably again be held in Europe.  Perhaps Berlin or Brno.

    I can hardly wait.
