Channel: Planet Apache

Jochen Wiedmann: Announce: JSGen (Java Source Generation Framework)


It is with some pride that I can announce JSGen today, a new, and very minor, open source project under the Apache Software License 2.0.

JSGen is short for Java Source Generation framework. As the name (hopefully) suggests, it is a tool for generating Java source code. It is the result of close to 20 years of working on, and with, Java source generation. That includes, in particular, the XML data binding framework JaxMe, its predecessor Apache JaxMe, and a lot of application programming in my professional work.

It is my opinion that


  1. Java source generators have a tendency to grow into CLOBs over time, becoming less maintainable and understandable.
  2. Java source generators typically contain a lot of boilerplate code that deals with things like import lists and syntactical details, rather than with their actual purpose.


In contrast, JSGen provides an object model that aims to make source generation just another developer task, to which modern software engineering principles can be applied. JSGen will support you by


  • Creating import lists semi-automatically (with the exception of static imports)
  • Supporting multiple code formatting styles (an Eclipse-like format, which is the default, and an Apache-Maven-like alternative format). Switching between code styles is as easy as replacing one configuration object with another.
  • Enabling a more structured approach to source code generation. (Example: implement the case of handling a single instance, and reuse that to handle the case of a collection.)


Let's have a look at an example, which is quoted from the JUnit tests. We intend to create a simple HelloWorld.java. Here's how we would do that with JSGen.

    JSGFactory factory = JSGFactory.create();
    Source jsb =
        factory.newSource("com.foo.myapp.Main").makePublic();
    Method mainMethod =
        jsb.newMethod("main").makePublic().makeStatic();
    mainMethod.parameter(JQName.STRING_ARRAY, "pArgs");
    mainMethod.body().line(System.class,
                           ".out.println(",
                           q("Hello, world!"), ");");
    File targetJavaDir = new File("target/generated-sources/mysrc");
    FileJavaSourceWriter fjsw = new FileJavaSourceWriter(targetJavaDir);
    // You might prefer MAVEN_FORMATTER in the next line.
    fjsw.setFormatter(DEFAULT_FORMATTER);
    fjsw.write(factory);
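
For reference, the generated com/foo/myapp/Main.java should then look roughly like this (a sketch of the expected output; the exact layout depends on the formatter you select):

    package com.foo.myapp;

    public class Main {
        public static void main(String[] pArgs) {
            System.out.println("Hello, world!");
        }
    }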

Some things I'd finally like to note:


  1. JSGen is a complete rewrite of two mature and reliable predecessors, namely JaxMeJS and Apache JaxMe JS. As such, it is based on a mature API and a lot of applied experience. It may be new, but it should nevertheless be quite reliable.
  2. It provides all the features of the predecessors, but adds
    1. Support for Generics
    2. Support for Annotations
    3. Switchable code formatters
    4. Heavy use of builders throughout the API

This announcement was intentionally deferred until JSGen had successfully been used for a project in my professional work.

Finally, a few links:



  

Colm O hEigeartaigh: Deploying an Apache Camel route to Apache Karaf

In the previous blog post, we showed how to use Apache Camel to query an Apache Kafka broker which is secured using Kerberos. In this post, we will build on the previous blog post by showing how to deploy our Camel route to Apache Karaf. Karaf is an application runtime container that makes it incredibly easy to deploy simple applications via its "hot deploy" feature. As always, there are a few slightly tricky considerations when using Kerberos, which is the purpose of this post.

As a prerequisite to this article, please follow the previous blog post to set up Apache Kafka using Kerberos, and test that the Camel route can successfully retrieve messages from the topic we created.

1) Configuring the Kerberos JAAS Login Module in Karaf

Download and extract the latest version of the Apache Karaf runtime (4.2.3 was used in this post). Before starting Karaf, we need to pass in a system property pointing to the krb5.conf file created by our Kerby KDC. This step is not necessary if you are using the standard filesystem location for krb5.conf. Open 'bin/karaf' and add the following to the list of system properties:
  • -Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf \
Now start Karaf via "bin/karaf". Karaf uses JAAS for authentication (see the documentation here). In the console, enter "jaas:" and hit 'tab' to see the possibilities. For example, "jaas:realm-list" displays the JAAS realms that are currently configured.

Recall that our Camel route needs to configure a JAAS LoginModule for Kerberos. In the example given in the previous post, this was configured by setting the Java System property "java.security.auth.login.config" to point to the JAAS configuration file. We don't want to do that with Karaf, as otherwise we will end up overriding the other JAAS LoginModules that are installed.

Instead, we will take advantage of Karaf's "hot deploy" feature to add the Kerberos Login Module we need to Karaf. Drop the following blueprint XML file into Karaf's deploy directory, changing the keytab location to the correct path to the keytab file:
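
As an illustration only (the original post embeds the actual file), a Karaf JAAS blueprint for a Kerberos login module typically looks roughly like the sketch below; the realm name "KafkaClient", the principal and the keytab path are assumptions based on the setup from the previous post:

<?xml version="1.0" encoding="UTF-8"?>
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
           xmlns:jaas="http://karaf.apache.org/xmlns/jaas/v1.1.0">

  <!-- Registers a JAAS realm backed by the JDK Krb5LoginModule -->
  <jaas:config name="KafkaClient">
    <jaas:module className="com.sun.security.auth.module.Krb5LoginModule"
                 flags="required">
      useKeyTab=true
      storeKey=true
      keyTab=/path/to/kafka_client.keytab
      principal=client
    </jaas:module>
  </jaas:config>

</blueprint>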

For Karaf to pick this up, we need to register the blueprint feature via "feature:install aries-blueprint". Now we should see our LoginModule configured via "jaas:realm-list":


2) Configuring the Camel route in Karaf

Next we will hot deploy our Camel route as a blueprint file in Karaf. Copy the following file into the deploy directory:

Then we need to install a few dependencies in Karaf. Add the Camel repo via "repo-add camel 2.23.1", and install the relevant Camel dependencies via: "feature:install camel camel-kafka". Our Camel route should then automatically start, and will retrieve the messages from the Kafka topic and write them to the filesystem, as configured in the route. The message payload and headers are logged in "data/log/karaf.log".

Piergiorgio Lucidi: Alfresco DevCon 2019: giving training, speaking and contributing


Two weeks ago I joined Alfresco DevCon 2019 in Edinburgh together with more than 300 Alfresco community members. It was a very good opportunity to meet good old friends from around the world, exchange opinions and contributions, give the usual training courses, and try to promote the adoption of international standards among Alfresco experts.

The conference

A huge amount of content was included in this edition and I have to confess that the sessions were absolutely amazing.

It was very nice to chat with some old friends from around the world, and I would like to thank some of the Order Of The Bee members for bringing new energy and outstanding sessions to this conference: Angel Borroy, Jeff Potts, Bindu Wavell, Axel Faust, Boriss Mejiass.

Other great contributions came from many people at Alfresco, such as Francesco Corti, Eugenio Romano, Mario Romano, Mauricio Salatino, Toni De La Fuente and many others.

My colleague Giuseppe Graziano and me. For Giuseppe this was his first Alfresco DevCon, congratulations mate ;)

Delivering Training

Every year, during Day 0 of Alfresco DevCon, I collaborate on building and delivering the Introduction to Alfresco course. The course is dedicated to new members of the Alfresco Community who want to improve their understanding of the platform vision.

It was an amazing challenge because in just one day we cover the entire Alfresco Digital Business Platform: starting from Content Services and Process Services, then passing through Governance Services, Docker and ADF.

This course is arranged by ECMCoreAveri, and I worked with Shoeb Sarguroh, Oliver Laabs and Sabine Esser to deliver, in the best possible way, all the material updated for this event. This was possible thanks to a solid partnership between TAI Solutions and ECMCoreAveri, bringing fresh new content to the training material.

The class was full and we had the pleasure of teaching very interested attendees. They wanted to understand each Alfresco component better, thinking about how to use and integrate it for their own needs.

The Alfresco Platform now includes many components and, considering the high number of ways to solve a specific problem, it can be very hard for anyone without experience to choose the best adoption path, or simply to understand whether a component should be extended or just configured.

We received a lot of questions, also after the training sessions during the conference, and we received very good feedback. We appreciate this a lot and we hope to keep the quality of our training courses high in the future. Thank you very much!

After the training day we proposed a quiz to the attendees, who earned points for each question, and during the last day of the conference we arranged the Alfresco Training Awards, where three winners received a prize!

I would like to thank ECMCoreAveri for inviting me every year to this awesome challenge and for giving me the opportunity to contribute to the latest version of the training material for the entire Alfresco Platform.

Discussing the ECM Program Strategy

Another highlight for me was introducing, in just 5 minutes (a lightning talk), a very complex topic: the ECM Program Strategy.

We live in a world totally dominated by information, and every time there is an ECM project to build from scratch, we face the same issues in trying to give a complete shape to the overall project. Sometimes it is just impossible to do it in one go, and a good approach can be to consider only the initial constraints related to the entire company (horizontally) and then iterate vertically on specific needs.

The ECM Program Strategy is a set of best practices, initially provided by AIIM, that helps you define your own ECM strategy. This is done by building a set of documents where you collect the strategic information of your project in terms of core business vision, metrics and KPIs.

The ECM toolkit is based on International Standards (ISO) written by ECM experts from all over the world. The main goal here is to avoid the typical issues that statistically appear in any new ECM project. Consider that the problems in this type of project rarely depend on technology; the main issue is misunderstanding among people, requirements and expectations.

This typically helps in defining which parts of your project should be given a higher priority and which results we expect from the next development sprint. Without a wide vision of what the project will solve, it is impossible to understand the overall benefit in terms of economics and platform adoption.

The ECM Program Strategy allows you to define a documented strategy for:

  • Gathering requirements from all the stakeholders / departments (Templating)

  • Approaching analysis and development iteratively (weekly dev sprints)

  • Involving business people and project champions (Training and mentoring)

  • Splitting development into strategic milestones (Development and release policy)

Smart Alfresco ECM Program Strategy for Your New Success Story from Piergiorgio Lucidi

Maven module for Alfresco Process Services

To make it easier for attendees to learn how to extend the Alfresco Process Services platform, I have implemented and shared on my GitHub account a standard Maven module that allows developers to use all the available Java services.

APS includes new, specific services dedicated to adding, for example, multi-tenancy and identity management capabilities on top of the Activiti BPM engine. This means that some Activiti services are currently disabled; considering the IdentityService, you have some alternatives to learn in APS, in this case two different services: UserService and GroupService.

This Maven module will help any developer extend APS with specific REST APIs and so on; the main issue that we currently have is the lack of a supported Alfresco SDK for APS.

Included in this module you will find a standard four-eyes principle workflow application with its own unit test implemented in the test source folder.

If you are an Alfresco Community member and you want to contribute to this project, please let me know and let's try to improve it together. I hope this helps :)

Maven module SDK for Alfresco Process Services

ECMCoreAveri Team

Me, Shoeb Sarguroh, Sabine Esser and Oliver Laabs

TAI Solutions Team

My colleague Giuseppe Graziano at his first Alfresco DevCon and me

Me talking about ECM Program Strategy

Thanks to Francesco Corti for this photo.

Me talking about ECM Program Strategy - 2

Thanks to Angel Borroy for this photo.

Alfresco Training Awards

Shoeb Sarguroh and the three winners of our quiz done after the training course

Aaron Morton: Reaper 1.4 Released


Cassandra Reaper 1.4 was just released, with security features that now extend to the whole REST API.

Security Improvements

Reaper 1.2.0 integrated Apache Shiro to provide authentication capabilities in the UI. The REST API remained fully open though, which was a security concern. With Reaper 1.4.0, the REST API is now fully secured and managed by the very same Shiro configuration as the Web UI. JSON Web Tokens (JWT) were introduced to avoid sending credentials over the wire too often. In addition, spreaper, Reaper's command line tool, has been updated to provide a login operation and to manipulate JWTs.

The documentation was updated with all the necessary information to handle authentication in Reaper and even some samples on how to connect LDAP directories through Shiro.

Note that Reaper doesn’t support authorization features and it is impossible to create users with different rights.
Authentication is now enabled by default for all new installs of Reaper.

Configurable JMX port per cluster

One of the annoying things with Reaper was that it was impossible to use a different port for JMX communications than the default one, 7199.
You could define specific ports per IP, but that was really for testing purposes with CCM.
That long overdue feature has now landed in 1.4.0, and a custom JMX port can be passed when declaring a cluster in Reaper:

Configurable JMX port

TWCS/DTCS tables blacklisting

In general, it is best to avoid repairing DTCS tables, as it can generate lots of small SSTables that could stay out of the compaction window and generate performance problems. We tend to recommend not repairing TWCS tables either, to avoid replicating timestamp overlaps between nodes that can delay the deletion of fully expired SSTables.

When using the auto-scheduler though, it is impossible to specify blacklists, as all keyspaces and all tables get automatically scheduled by Reaper.

Based on the initial PR from Dennis Kline, which was then reworked by our very own Mick, a new configuration setting allows automatic blacklisting of TWCS and DTCS tables for all repairs:

blacklistTwcsTables: false

When set to true, Reaper will discover the compaction strategy of all tables in the keyspace and remove any table using either DTCS or TWCS, unless it is explicitly passed in the list of tables to repair.

Web UI improvements

The Web UI reported decommissioned nodes that still appeared in the Gossip state of the cluster with a Left state. This has been fixed and such nodes are no longer displayed.
Another bug was the number of tokens reported in the node detail panel, which was nowhere near matching reality. We now display the correct number of tokens, and clicking on this number will open a popup containing the list of tokens the node is responsible for:

Tokens

Work in progress

Work in progress will introduce the Sidecar Mode, which will collocate a Reaper instance with each Cassandra node and support clusters where JMX access is restricted to localhost.
This mode is being actively worked on currently and the branch already has working repairs.
We’re now refactoring the code and porting other features to this mode like snapshots and metric collection.
This mode will also allow for adding new features and permit Reaper to better scale with the clusters it manages.

Upgrade to Reaper 1.4.0

The upgrade to 1.4 is recommended for all Reaper users. The binaries are available from yum, apt-get, Maven Central, Docker Hub, and are also downloadable as tarball packages. Remember to backup your database before starting the upgrade.

All instructions to download, install, configure, and use Reaper 1.4 are available on the Reaper website.

Holden Karau: PyData Hong Kong - Making the Big Data ecosystem work together with Python: Apache Arrow, Spark, Flink, Beam, and Dask @ PyData Hong Kong

Come join me on Tuesday 19 February @ 02:00 at PyData Hong Kong 2019 for "Making the Big Data ecosystem work together with Python: Apache Arrow, Spark, Flink, Beam, and Dask". I'll update this post with the slides soon. Come to the talk or comment below to join in the discussion :). Talk feedback is appreciated at http://bit.ly/holdenTalkFeedback

Carlos Sanchez: Serverless Jenkins Pipelines with Project Fn


The Jenkinsfile-Runner-Fn project is a Project Fn (a container native, cloud agnostic serverless platform) function to run Jenkins pipelines. It will process a GitHub webhook, git clone the repository and execute the Jenkinsfile in that git repository. It allows scalability and pay per use, with zero cost if not used.

This function allows Jenkinsfile execution without needing a persistent Jenkins master running in the same way as Jenkins X Serverless, but using the Fn Project platform (and supported providers like Oracle Functions) instead of Kubernetes.

Project Fn vs AWS Lambda

The function is very similar to the one in jenkinsfile-runner-lambda, with just a small change in the signature. The main difference between Lambda and Fn is in the packaging: Lambda layers are limited in size and are expanded in /opt, while Fn allows a custom Dockerfile where you can install whatever you want in a much easier way; you just need to include the function code and the entrypoint from fnproject/fn-java-fdk.
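
For context, an Fn Java function is just a plain class and method wired up through fn-java-fdk; the general shape looks roughly like the sketch below (this is not the actual jenkinsfile-runner-fn handler, whose input is the GitHub webhook payload, and the names here are made up for illustration):

package com.example.fn;

// Minimal Fn Java function; func.yaml would reference it as, for example,
// cmd: com.example.fn.HelloFunction::handleRequest
public class HelloFunction {

    public String handleRequest(String input) {
        String name = (input == null || input.isEmpty()) ? "world" : input;
        return "Hello, " + name + "!";
    }
}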

Oracle Functions

Oracle Functions is a cloud service providing Project Fn function execution (currently in limited availability). The jenkinsfile-runner-fn function runs in Oracle Functions, with the caveat that it needs a syslog server running somewhere to get the logs (see below).

Limitations

Current implementation limitations:

Example

See the jenkinsfile-runner-fn-example project for an example that is tested and works.

Extending

You can add your plugins to plugins.txt. You could also add the Configuration as Code plugin for configuration.

Other tools can be added to the Dockerfile.

Installation

Install Fn

Building

Build the function

mvn clean package

Publishing

Create and deploy the function locally

fn create app jenkinsfile-runner
fn --verbose deploy --app jenkinsfile-runner --local

Execution

Invoke the function

cat src/test/resources/github.json | fn invoke jenkinsfile-runner jenkinsfile-runner

Logging

Get the logs for the last execution

fn get logs jenkinsfile-runner jenkinsfile-runner $(fn ls calls jenkinsfile-runner jenkinsfile-runner | grep 'ID:' | head -n 1 | sed -e 's/ID: //')

Syslog

Alternatively, start a syslog server to see the logs

docker run -d --rm -it -p 5140:514 --name syslog-ng balabit/syslog-ng:latest
docker exec -ti syslog-ng tail -f /var/log/messages-kv.log

Update the function to send logs to the syslog server

fn update app jenkinsfile-runner --syslog-url tcp://logs-01.loggly.com:514

GitHub events

Add a GitHub json webhook to your git repo pointing to the function url.

More information in the Jenkinsfile-Runner-Fn GitHub page.

FeatherCast: CHAOSSCon EU 2019, Open Source Metrics and CHAOSS, Ildiko Vancsa

Aaron Morton: How To Set Up A Cluster With Even Token Distribution


Apache Cassandra is fantastic for storing large amounts of data and being flexible enough to scale out as the data grows. This is all fun and games until the data that is distributed in the cluster becomes unbalanced. In this post we will go through how to set up a cluster with predictive token allocation using the allocate_tokens_for_keyspace setting, which will help to evenly distribute the data as it grows.

Unbalanced clusters are bad mkay

An unbalanced load on a cluster means that some nodes will contain more data than others. An unbalanced cluster can be caused by the following:

  • Hot spots - by random chance one node ends up responsible for a higher percentage of the token space than the other nodes in the cluster.
  • Wide rows - due to data modelling issues, for example a partition row which grows significantly larger than the other rows in the data.

The above issues can have a number of impacts on individual nodes in the cluster, however this is a completely different topic and requires a more detailed post. In summary though, a node that contains disproportionately more tokens and/or data than other nodes in the cluster may experience one or more of the following issues:

  • Run out of storage more quickly than the other nodes.
  • Serve more requests than the other nodes.
  • Suffer from higher read and write latencies than the other nodes.
  • Time to run repairs is longer than other nodes.
  • Time to run compactions is longer than other nodes.
  • Time to replace the node if it fails is longer than other nodes.

What about vnodes, don’t they help?

Both issues that cause data imbalance in the cluster (hot spots, wide rows) can be prevented by manual control. That is, specify the tokens using the initial_token setting in the cassandra.yaml file for each node and ensure your data model evenly distributes data across the cluster. The second control measure (data modelling) is something we always need to do when adding data to Cassandra. The first point, however, defining the tokens manually, is cumbersome to do when maintaining a cluster, especially when growing or shrinking it. As a result, token management was automated early on in Cassandra (version 1.2 - CASSANDRA-4119) through the introduction of Virtual Nodes (vnodes).

Vnodes break up the available range of tokens into smaller ranges, defined by the num_tokens setting in the cassandra.yaml file. The vnode ranges are randomly distributed across the cluster and are generally non-contiguous. If we use a large number for num_tokens to break up the token ranges, the random distribution means it is less likely that we will have hot spots. Statistical computation showed that 256 vnodes was the point at which clusters of any size consistently had a good token range balance. Hence, the num_tokens default value of 256 was recommended by the community to prevent hot spots in a cluster. The problem here is that the performance for operations requiring token-range scans (e.g. repairs, Spark operations) will tank big time. It can also cause problems with bootstrapping due to the large numbers of SSTables generated. Furthermore, as Joseph Lynch and Josh Snyder pointed out in a paper they wrote, the higher the value of num_tokens in large clusters, the higher the risk of data unavailability.

Token allocation gets smart

This paints a pretty grim picture of vnodes, and as far as operators are concerned, they are caught between a rock and hard place when selecting a value for num_tokens. That was until Cassandra version 3.0 was released, which brought with it a more intelligent token allocation algorithm thanks to CASSANDRA-7032. Using a ranking system, the algorithm feeds in the replication factor of a keyspace, the number of tokens, and the partitioner, to derive token ranges that are evenly distributed across the cluster of nodes.

The algorithm is configured by settings in the cassandra.yaml configuration file. Prior to this algorithm being added, the configuration file already contained the settings necessary to configure it, with the exception of one to specify the keyspace name. When the algorithm was added, the allocate_tokens_for_keyspace setting was introduced into the configuration file. The setting allows a keyspace name to be specified so that, during the bootstrap of a node, we query the keyspace for its replication factor and pass that to the token allocation algorithm.

However, therein lies the problem: for existing clusters updating this setting is easy, as a keyspace already exists, but for a cluster starting from scratch we have a chicken and egg situation. How do we specify a keyspace that doesn't exist!? And there are other caveats, too…

  • It works for only a single replication factor. As long as all the other keyspaces use the same replication factor as the one specified for allocate_tokens_for_keyspace, all is fine. However, if you have keyspaces with a different replication factor, they can potentially cause hot spots.
  • It works when nodes are only added to the cluster. The process for token distribution when a node is removed from the cluster remains unchanged, and hence can cause hot spots.
  • It works with only the default partitioner, Murmur3Partitioner.

Additionally, this is no silver bullet for all unbalanced clusters; we still need to make sure we have a data model that evenly distributes data across partitions. Wide partitions can still be an issue and no amount of token shuffling will fix this.

Despite these drawbacks, this feature gives us the ability to allocate tokens in a more predictable way whilst leveraging the advantage of vnodes. This means we can specify a small value for vnodes (e.g. 4) and still be able to avoid hot spots. The question then becomes, in the case of starting a brand new cluster from scratch, which comes first the chicken or the egg?

One does not simply start a cluster… with evenly distributed tokens

While it might be possible to rectify an unbalanced cluster caused by unfortunate token allocations, it is better for the token allocation to be set up correctly when the cluster is created. To set up a brand new cluster that takes advantage of the allocate_tokens_for_keyspace setting we need to use the following steps. The method below takes into account a cluster with nodes spread across multiple racks. The examples used in each step assume that our cluster will be configured as follows:

  • 4 vnodes (num_tokens = 4).
  • 3 racks with a single seed node in each rack.
  • A replication factor of 3, i.e. one replica per rack.

1. Calculate and set tokens for the seed node in each rack

We will need to set the tokens for the seed nodes in each rack manually. This is to prevent each node from randomly calculating its own token ranges. We can calculate the tokens that we will use for the initial_token setting using the following Python code:

$ python

Python 2.7.13 (default, Dec 18 2016, 07:03:39)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> num_tokens = 4
>>> num_racks = 3
>>> print "\n".join(['[Node {}] initial_token: {}'.format(r + 1, ','.join([str(((2**64 / (num_tokens * num_racks)) * (t * num_racks + r)) - 2**63) for t in range(num_tokens)])) for r in range(num_racks)])
[Node 1] initial_token: -9223372036854775808,-4611686018427387905,-2,4611686018427387901
[Node 2] initial_token: -7686143364045646507,-3074457345618258604,1537228672809129299,6148914691236517202
[Node 3] initial_token: -6148914691236517206,-1537228672809129303,3074457345618258600,7686143364045646503
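
The session above uses Python 2; if you only have Python 3 at hand, the same tokens can be computed with an explicit integer division (//), for example:

num_tokens = 4
num_racks = 3
for r in range(num_racks):
    # Same arithmetic as the one-liner above, just made Python 3 friendly
    tokens = [str(((2**64 // (num_tokens * num_racks)) * (t * num_racks + r)) - 2**63)
              for t in range(num_tokens)]
    print('[Node {}] initial_token: {}'.format(r + 1, ','.join(tokens)))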

We can then uncomment the initial_token setting in the cassandra.yaml file on each of the seed nodes, set it to the value generated by our Python command, and set the num_tokens setting to the number of vnodes. When a node first starts, the value of the initial_token setting will be used; subsequent restarts will use the num_tokens setting.

Note that we need to manually calculate and specify the initial tokens for only the seed node in each rack. All other nodes will be configured differently.

2. Start the seed node in each rack

We can start the seed nodes one at a time using the following command:

$ sudo service cassandra start

When we watch the logs, we should see messages similar to the following appear:

...
INFO  [main] ... - This node will not auto bootstrap because it is configured to be a seed node.
INFO  [main] ... - tokens manually specified as [-9223372036854775808,-4611686018427387905,-2,4611686018427387901]
...

After starting the first of the seed nodes, we can use nodetool status to verify that 4 tokens are being used:

$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.31.36.11  99 KiB     4            100.0%            5d7e200d-ba1a-4297-a423-33737302e4d5  rack1

We will wait for this message to appear in the logs, then start the next seed node in the cluster.

INFO  [main] ... - Starting listening for CQL clients on ...

Once all seed nodes in the cluster are up, we can use nodetool ring to verify the token assignments in the cluster. It should look something like this:

$ nodetool ring

Datacenter: dc1
==========
Address        Rack        Status State   Load            Owns                Token
                                                                              7686143364045646503
172.31.36.11   rack1       Up     Normal  65.26 KiB       66.67%              -9223372036854775808
172.31.36.118  rack2       Up     Normal  65.28 KiB       66.67%              -7686143364045646507
172.31.43.239  rack3       Up     Normal  99.03 KiB       66.67%              -6148914691236517206
172.31.36.11   rack1       Up     Normal  65.26 KiB       66.67%              -4611686018427387905
172.31.36.118  rack2       Up     Normal  65.28 KiB       66.67%              -3074457345618258604
172.31.43.239  rack3       Up     Normal  99.03 KiB       66.67%              -1537228672809129303
172.31.36.11   rack1       Up     Normal  65.26 KiB       66.67%              -2
172.31.36.118  rack2       Up     Normal  65.28 KiB       66.67%              1537228672809129299
172.31.43.239  rack3       Up     Normal  99.03 KiB       66.67%              3074457345618258600
172.31.36.11   rack1       Up     Normal  65.26 KiB       66.67%              4611686018427387901
172.31.36.118  rack2       Up     Normal  65.28 KiB       66.67%              6148914691236517202
172.31.43.239  rack3       Up     Normal  99.03 KiB       66.67%              7686143364045646503

We can then move to the next step.

3. Create only the keyspace for the cluster

On any one of the seed nodes we will use cqlsh to create the cluster keyspace using the following commands:

$ cqlsh NODE_IP_ADDRESS -u ***** -p *****

Connected to ...
[cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh>
cassandra@cqlsh> CREATE KEYSPACE keyspace_with_replication_factor_3
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
    AND durable_writes = true;

Note that this keyspace can have any name; it can even be the keyspace that contains the tables we will use for our data.

4. Set the number of tokens and the keyspace for all remaining nodes

We will set the num_tokens and allocate_tokens_for_keyspace settings in the cassandra.yaml file on all of the remaining nodes as follows:

num_tokens: 4
...
allocate_tokens_for_keyspace: keyspace_with_replication_factor_3

We have set the allocate_tokens_for_keyspace value to the name of the keyspace created in the previous step. Note that at this point the Cassandra service on all other nodes is still down.

5. Start the remaining nodes in the cluster, one at a time

We can start the remaining nodes in the cluster using the following command:

$ sudo service cassandra start

When we watch the logs we should see messages similar to the following appear to say that we are using the new token allocation algorithm:

INFO  [main] ... - JOINING: waiting for ring information
...
INFO  [main] ... - Using ReplicationAwareTokenAllocator.
WARN  [main] ... - Selected tokens [...]
...
INFO  ... - JOINING: Finish joining ring

As per step 2 when we started the seed nodes, we will wait for this message to appear in the logs before starting the next node in the cluster.

INFO  [main] ... - Starting listening for CQL clients on ...

Once all the nodes are up, our shiny, new, evenly-distributed-tokens cluster is ready to go!

Proof is in the token allocation

While we can learn a fair bit from talking about the theory behind the allocate_tokens_for_keyspace setting, it is still good to put it to the test and see what difference it makes when used in a cluster. I decided to create two clusters running Apache Cassandra 3.11.3 and compare the load distribution after inserting some data. For this test, I provisioned both clusters with 9 nodes using tlp-cluster and generated load using tlp-stress. Both clusters used 4 vnodes, but one of the clusters was set up using the even token distribution method described above.

Cluster using random token allocation

I started with a cluster that uses the traditional random token allocation system. For this cluster I set num_tokens: 4 and endpoint_snitch: GossipingPropertyFileSnitch in the cassandra.yaml on all the nodes. Nodes were split across three racks by specifying the rack in the cassandra-rackdc.properties file.

Once the cluster instances were up and Cassandra was installed, I started each node one at a time. After all nodes were started, the cluster looked like this:

ubuntu@ip-172-31-39-54:~$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.31.36.95   65.29 KiB  4            16.1%             4ada2c52-0d1b-45cd-93ed-185c92038b39  rack1
UN  172.31.39.79   65.29 KiB  4            20.4%             c282ef62-430e-4c40-a1d2-47e54c5c8685  rack2
UN  172.31.47.155  65.29 KiB  4            21.2%             48d865d7-0ad0-4272-b3c1-297dce306a34  rack1
UN  172.31.43.170  87.7 KiB   4            24.5%             27aa2c78-955c-4ea6-9ea0-3f70062655d9  rack1
UN  172.31.39.54   65.29 KiB  4            30.8%             bd2d745f-d170-4fbf-bf9c-be95259597e3  rack3
UN  172.31.35.165  70.36 KiB  4            25.5%             056e2472-c93d-4275-a334-e82f87c4b53a  rack3
UN  172.31.35.149  70.37 KiB  4            24.8%             06b0e1e4-5e73-46cb-bf13-626eb6ce73b3  rack2
UN  172.31.35.33   65.29 KiB  4            23.8%             137602f0-3248-459f-b07c-c0b3e647fa48  rack2
UN  172.31.37.129  99.03 KiB  4            12.9%             cd92c974-b32e-4181-9e14-fb52dd27b09e  rack3

I ran tlp-stress against the cluster using the command below. This generated a write-only load that randomly inserted 10 million unique key-value pairs into the cluster. tlp-stress inserted data into a newly created keyspace and table called tlp_stress.keyvalue.

tlp-stress run KeyValue --replication "{'class':'NetworkTopologyStrategy','dc1':3}" --cl LOCAL_QUORUM --partitions 10M --iterations 100M --reads 0 --host 172.31.43.170

After running tlp-stress the cluster load distribution for the tlp_stress keyspace looked like this:

ubuntu@ip-172-31-39-54:~$ nodetool status tlp_stress
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.31.36.95   1.29 GiB   4            20.8%             4ada2c52-0d1b-45cd-93ed-185c92038b39  rack1
UN  172.31.39.79   2.48 GiB   4            39.1%             c282ef62-430e-4c40-a1d2-47e54c5c8685  rack2
UN  172.31.47.155  1.82 GiB   4            35.1%             48d865d7-0ad0-4272-b3c1-297dce306a34  rack1
UN  172.31.43.170  3.45 GiB   4            44.1%             27aa2c78-955c-4ea6-9ea0-3f70062655d9  rack1
UN  172.31.39.54   2.16 GiB   4            54.3%             bd2d745f-d170-4fbf-bf9c-be95259597e3  rack3
UN  172.31.35.165  1.71 GiB   4            29.1%             056e2472-c93d-4275-a334-e82f87c4b53a  rack3
UN  172.31.35.149  1.14 GiB   4            26.2%             06b0e1e4-5e73-46cb-bf13-626eb6ce73b3  rack2
UN  172.31.35.33   2.61 GiB   4            34.7%             137602f0-3248-459f-b07c-c0b3e647fa48  rack2
UN  172.31.37.129  562.15 MiB  4            16.6%             cd92c974-b32e-4181-9e14-fb52dd27b09e  rack3

I verified the data load distribution by checking the disk usage on all nodes using pssh (parallel ssh).

ubuntu@ip-172-31-39-54:~$ pssh -ivl ... -h hosts.txt "du -sh /var/lib/cassandra/data"
[1] ... [SUCCESS] 172.31.35.149
1.2G    /var/lib/cassandra/data
[2] ... [SUCCESS] 172.31.43.170
3.5G    /var/lib/cassandra/data
[3] ... [SUCCESS] 172.31.36.95
1.3G    /var/lib/cassandra/data
[4] ... [SUCCESS] 172.31.39.79
2.5G    /var/lib/cassandra/data
[5] ... [SUCCESS] 172.31.35.33
2.7G    /var/lib/cassandra/data
[6] ... [SUCCESS] 172.31.35.165
1.8G    /var/lib/cassandra/data
[7] ... [SUCCESS] 172.31.37.129
564M    /var/lib/cassandra/data
[8] ... [SUCCESS] 172.31.39.54
2.2G    /var/lib/cassandra/data
[9] ... [SUCCESS] 172.31.47.155
1.9G    /var/lib/cassandra/data

As we can see from the above results, there was a large variation in load across the nodes. Node 172.31.37.129 held the smallest amount of data (roughly 560 MB), whilst node 172.31.43.170 held six times that amount (roughly 3.5 GB). Effectively, the difference between the smallest and largest data load is 3.0 GB!!

Cluster using predictive token allocation

I then moved on to setting up the cluster with predictive token allocation. Similar to the previous cluster, I set num_tokens: 4 and endpoint_snitch: GossipingPropertyFileSnitch in the cassandra.yaml on all the nodes. These settings were common to all nodes in this cluster. Nodes were again split across three racks by specifying the rack in the cassandra-rackdc.properties file.

I set the initial_token setting for each of the seed nodes and started the Cassandra process on them one at a time, with one seed node allocated to each rack in the cluster.

The initial keyspace that would be specified in the allocate_tokens_for_keyspace setting was created via cqlsh using the following command:

CREATE KEYSPACE keyspace_with_replication_factor_3 WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3'} AND durable_writes = true;

I then set allocate_tokens_for_keyspace: keyspace_with_replication_factor_3 in the cassandra.yaml file for the remaining non-seed nodes and started the Cassandra process on them one at a time. After all nodes were started, the cluster looked like this:

ubuntu@ip-172-31-36-11:~$ nodetool status keyspace_with_replication_factor_3
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.31.36.47   65.4 KiB   4            32.3%             5ece457c-a3af-4173-b9c8-e937f8b63d3b  rack2
UN  172.31.43.239  117.45 KiB  4            33.3%             55a94591-ad6a-48a5-b2d7-eee7ea06912b  rack3
UN  172.31.37.44   70.49 KiB  4            33.3%             93054390-bc83-487c-8940-b99e7b85e5c2  rack3
UN  172.31.36.11   104.3 KiB  4            35.4%             5d7e200d-ba1a-4297-a423-33737302e4d5  rack1
UN  172.31.39.186  65.41 KiB  4            31.2%             ecd00ff5-a90a-4d33-b7ab-bdd22e3e50b8  rack1
UN  172.31.38.137  65.39 KiB  4            33.3%             64802174-885a-4c04-b530-a9b4685b1b96  rack1
UN  172.31.40.56   65.39 KiB  4            33.3%             0846effa-e4ac-4a19-845e-2162cd2b7680  rack3
UN  172.31.36.118  104.32 KiB  4            35.4%             5ad47bc0-9bcc-4fc5-b5b0-0a15ad63345f  rack2
UN  172.31.41.196  65.4 KiB   4            32.3%             4128ca20-b4fa-4173-88b2-aac62539a6d8  rack2

I ran tlp-stress against the cluster using the same command that was used to test the cluster with random token allocation. After running tlp-stress the cluster load distribution for the tlp_stress keyspace looked like this:

ubuntu@ip-172-31-36-11:~$ nodetool status tlp_stress
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.31.36.47   2.16 GiB   4            32.3%             5ece457c-a3af-4173-b9c8-e937f8b63d3b  rack2
UN  172.31.43.239  2.32 GiB   4            33.3%             55a94591-ad6a-48a5-b2d7-eee7ea06912b  rack3
UN  172.31.37.44   2.32 GiB   4            33.3%             93054390-bc83-487c-8940-b99e7b85e5c2  rack3
UN  172.31.36.11   1.84 GiB   4            35.4%             5d7e200d-ba1a-4297-a423-33737302e4d5  rack1
UN  172.31.39.186  2.01 GiB   4            31.2%             ecd00ff5-a90a-4d33-b7ab-bdd22e3e50b8  rack1
UN  172.31.38.137  2.32 GiB   4            33.3%             64802174-885a-4c04-b530-a9b4685b1b96  rack1
UN  172.31.40.56   2.32 GiB   4            33.3%             0846effa-e4ac-4a19-845e-2162cd2b7680  rack3
UN  172.31.36.118  1.83 GiB   4            35.4%             5ad47bc0-9bcc-4fc5-b5b0-0a15ad63345f  rack2
UN  172.31.41.196  2.16 GiB   4            32.3%             4128ca20-b4fa-4173-88b2-aac62539a6d8  rack2

I again verified the data load distribution by checking the disk usage on all nodes using pssh.

ubuntu@ip-172-31-36-11:~$ pssh -ivl ... -h hosts.txt "du -sh /var/lib/cassandra/data"
[1] ... [SUCCESS] 172.31.36.11
1.9G    /var/lib/cassandra/data
[2] ... [SUCCESS] 172.31.43.239
2.4G    /var/lib/cassandra/data
[3] ... [SUCCESS] 172.31.36.118
1.9G    /var/lib/cassandra/data
[4] ... [SUCCESS] 172.31.37.44
2.4G    /var/lib/cassandra/data
[5] ... [SUCCESS] 172.31.38.137
2.4G    /var/lib/cassandra/data
[6] ... [SUCCESS] 172.31.36.47
2.2G    /var/lib/cassandra/data
[7] ... [SUCCESS] 172.31.39.186
2.1G    /var/lib/cassandra/data
[8] ... [SUCCESS] 172.31.40.56
2.4G    /var/lib/cassandra/data
[9] ... [SUCCESS] 172.31.41.196
2.2G    /var/lib/cassandra/data

As we can see from the above results, there was little variation in the load distribution across nodes compared to the cluster that used random token allocation. Node 172.31.36.118 held the smallest amount of data (roughly 1.83 GB) and nodes 172.31.43.239, 172.31.37.44, 172.31.38.137, and 172.31.40.56 held the largest amount (roughly 2.32 GB each). The difference between the smallest and largest data load was roughly 400 MB, which is significantly less than the difference in the cluster that used random token allocation.

Conclusion

Having a perfectly balanced cluster takes a bit of work and planning. While there are some steps to set up and caveats to using the allocate_tokens_for_keyspace setting, the predictive token allocation is a definite must use when setting up a new cluster. As we have seen from testing, it allows us to take advantage of num_tokens being set to a low value without having to worry about hot spots developing in the cluster.


Claus Ibsen: Camel in Action / Apache Camel is not for everyone

Writing books is really hard, and you may think that when the book is finally published, you are all done. Well, that is not the case for technical books, where our industry and the software stack the book covers keep evolving.



For the Camel in Action books, where we cover Apache Camel (no shit, Sherlock), it is the same. From time to time we update the source code examples to use a newer version of Camel. This is a courtesy to our readers, so that they have examples that run on the latest Camel version at the time, and the examples are also used as a test-bed for the Apache Camel project when we are testing/voting on new release candidates.

As an author you are also contacted from time to time by readers who have questions about the book, its examples, and Apache Camel in general. Thank you for the feedback; we love to get in touch with our readers, but sometimes we have to point people to the general upstream community for general Camel questions, when the subject is not specifically about our book. I would also like to thank the readers who take the time to say thank you to us, telling us that the book helped them get on board with Apache Camel, and sharing other stories about how it made a difference.

We also get reports from observant readers who have found small mistakes in the book. We are very grateful for having these brought to our attention so we can update the errata.

Then finally we have people doing book reviews (awesome, thank you for doing this, we appreciate all honest reviews). Most often we get positive feedback, and hear that the book has a lot to offer its readers. However, we occasionally get reviews from readers who very likely have the book for the wrong reason.

For our second edition, we have two such examples, from Amazon US and Germany. I have, to the best of my ability, responded to those reviews with our point of view, trying to explain and address their critique. All we want is for potential new readers to have the facts at hand, so they can make a better decision about whether to purchase the book.

And I want to end this blog with me officially saying

Apache Camel is not for everyone

(but it's still the most used open source integration framework in the universe)


Amazon.com review - Not very useful




Amazon.de review - Nahezu nutzlos..



Mark Miller: No public posts available.

Jay D. McHugh: 100 Days of Code...formalizing learning and practice

Over the years, I've tried to make sure that I dip my toes in the new technologies that come up pertaining to programming.

Of course - there are far too many new avenues to be able to keep up with everything. And, for anyone that's been in tech for a while you know that not every shiny new thing lasts. So, there is a bit of fear that I'll head down a road that everyone else shuns.

Those two things (too many things to learn and fear) have made my efforts to stay 'current' less than stellar. Add on top of that the fact that I have a full time job in IT that has its own demands on my brain and time and it has been difficult to make the kind of progress that I would like to.

Well, last week I stumbled onto the 100 Days of Code challenge while wandering around on Twitter.

And - I'm in...

Today is my third (should have been fourth but I missed a day) day and I have already worked on three separate topics:

  • Practicing Python by working through the exercises in the book 'Exercises for Programmers' by Brian P. Hogan
  • Going through the React course on Scrimba (about halfway through with this)
  • Digging into code that I wrote at least six years ago at work to refactor/improve it


Hopefully, I'll be able to finish these three things quickly and then make a more concerted and focused learning journey.

I've got many other topics that I want to study - so I have in the back of my head the plan to move on to round 2, round 3, round...

First things first though - this is day three.

Aaron Morton: Apache Cassandra - Data Center Switch


Did you ever wonder how to change the hardware efficiently on your Apache Cassandra cluster? Or did you maybe read this blog post from Anthony last week about how to set up a balanced cluster, find it as exciting as we do, and yet remain unsure how to change the number of vnodes?

A Data Center “Switch” might be what you need.

We will go through this process together hereafter. Additionally, if you're just looking at adding or removing a data center from an existing cluster, these operations are described as part of the data center switch. I hope the structure of the post will make it easy for you to find the part you need!

Warning/Note: if you are unable to get hardware easily, this won't work for you. The Data Center (DC) Switch is well adapted to cloud environments or to teams with spare/available hardware, as we will need to create a new data center before we can actually free up the previously used machines. Generally we use the same number of nodes in the new data center, but this is not mandatory; if the hardware changes, you'll want to use a proportional number of machines to keep performance unchanged.

Definition

First things first, what is a “Data Center Switch” in our Apache Cassandra context?

The idea is to transition to a new data center, freshly added for this operation, and then to remove the old one. In between, clients need to be switched to the new data center.

Logical isolation / topology between data centers in Cassandra helps keep this operation safe and allows you to roll back the operation at almost any stage with little effort. This technique does not generate any downtime when it is well executed.

Use Cases

A DC Switch can be used for changes that cannot be easily or safely performed within an existing data center, through JMX, or even with a rolling restart. It also allows you to make changes such as modifying the number of vnodes in use for a cluster, or changing the hardware without creating unwanted transitional states where some nodes would have distinct hardware (or operating systems).

It could also be used for a very critical upgrade. Note that with an upgrade it's important to keep in mind that streaming in a cluster running mixed versions of Cassandra is not recommended. In this case, it's better to add the new data center using the same Cassandra version, feed it with the data from the old data center, and only then upgrade the Cassandra version in the new data center. When the new data center is considered stable, the clients can be routed to the new data center and the old one can be removed.

It might even fit your needs in some cases I cannot think of right now. Once you know and understand this process, you might find yourself making original use of it.

Limitations and Risks

  • Again, this is a good fit for a cloud environment or when getting new servers. In some cases it might not be possible (or is not worth it) to double the hardware in use, even for a short period of time.
  • You cannot use the DC Switch to change global settings like the cluster_name, as this value is unique for the whole cluster.

How-to - DC Switch

This section provides a detailed runbook to perform a Data Center Switch.

We will cover the following aspects of a data center switch.

Phases described here are mostly independent, and at the end of each phase the cluster should be in a stable state. You can run only phase 1 (add a DC) or phase 3 (remove a DC), for example. You should always read about and pay attention to phase 0, though.

Rollback

Before we start, it is important to note that until the very last step, anything can easily be rolled back. You can always safely and quickly go back to the previous state during the procedure. Simply stated, you should be able to run the opposite commands to the ones shown here, in reverse order, until the cluster is in an acceptable state. It's good to check (and store!) the value of any configuration we are about to change, before each step.

Keep in mind that after each phase below the cluster is in a stable state and can remain like this for a long time, or forever, without problems.

Phase 0 - Prepare Configuration for Multiple Data Centers

First, the preparation phase. We want to make sure Cassandra and Clients will react as expected in a Multi-DC environment.

Server Side (Cassandra)

Step 1: All the keyspaces use NTS

  • Confirm that each user keyspace is using the NetworkTopologyStrategy:
$ cqlsh -e "DESCRIBE KEYSPACES;"
$ cqlsh -e "DESCRIBE KEYSPACE <my_ks>;" | grep replication
$ cqlsh -e "DESCRIBE KEYSPACE <my_other_ks>;" | grep replication
...

If not, change it to be NetworkTopologyStrategy.

  • Confirm that all the keyspaces (including system_*, possibly opscenter, …) also use either the NetworkTopologyStrategy or the LocalStrategy. To be clear:
    • keyspaces using SimpleStrategy (i.e. possibly system_auth, system_distributed, …) should be switched to NetworkTopologyStrategy.
    • system keyspace or any other keyspace, using LocalStrategy should not be changed.

Note: SimpleStrategy is not good in most cases and none of the keyspaces should use it in a Multi-DC context. This is because client operations would touch the distinct data centers to answer reads, breaking the expected isolation between the data centers.

$ cqlsh -e "DESCRIBE KEYSPACE system_auth;" | grep replication
...

In case you need to change this, do something like:

# Custom Keyspace
ALTER KEYSPACE tlp_labs WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'old-dc': '3'
};
[...]

# system_* keyspaces from SimpleStrategy to NetworkTopologyStrategy
ALTER KEYSPACE system_auth WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'old-dc': '3' # Or more... But it's not today's topic ;-)
};
ALTER KEYSPACE system_distributed WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'old-dc': '3'
};
ALTER KEYSPACE system_traces WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'old-dc': '3'
};
[...]

Warning:

This process might have consequences on data availability; be sure to understand the consequences for token ownership so as not to break service availability.

You can avoid this problem by mirroring the previous distribution that SimpleStrategy was producing. By using NetworkTopologyStrategy in combination with GossipingPropertyFileSnitch, and copying the previous SimpleStrategy/SimpleSnitch behaviour (data center and rack names), you should be able to make this change a 'non-event', where actually nothing happens from the Cassandra topology perspective.

In other words, if both topologies result in the same logical placement of the nodes, then there is no movement and no risk. If the operation results in a topology change (i.e. 2 clusters previously considered as one, for example), it's good to consider the consequences ahead of time and to run a full repair after the transition.
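
As an illustration, SimpleSnitch reports every node as being in data center "datacenter1" and rack "rack1", so mirroring that topology with GossipingPropertyFileSnitch means putting something like the following in cassandra-rackdc.properties on every node (and keeping "datacenter1" as the data center name in the replication settings):

# cassandra-rackdc.properties - mirror the topology SimpleSnitch was reporting
dc=datacenter1
rack=rack1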

Step 2:

Make sure the existing nodes are not using the SimpleSnitch.

Instead, the snitch must be one that considers the data center (and racks).

For example:

 endpoint_snitch: Ec2Snitch

or

 endpoint_snitch: GossipingPropertyFileSnitch

If your cluster is using SimpleSnitch at the moment, be careful changing this value, as you might induce a change in the topology or where the data belongs. It is worth reading in detail about this specific topic if that is the case.

That’s it for now on Cassandra, we are done configuring the server side.

Client Side (Cassandra)

Since there are many clients out there, you might have to adapt this to your specific case or driver. I take the DataStax Java driver as an example here.

All the clients using the cluster should go through the checks/changes below.

Step 4: Use a DCAware policy and disable remote connections

Cluster cluster = Cluster.builder()
    .addContactPoint(<ip_contact_list_from_old_dc>)
    .withLoadBalancingPolicy(
        DCAwareRoundRobinPolicy.builder()
            .withLocalDc("<old_dc_name>")
            .withUsedHostsPerRemoteDc(0)
            .build()
    ).build();

Step 5: Pin connections to the existing data center

It’s important to use a consistency level that aims at retrieving the data from the current data center and not across the whole cluster. In general, use a consistency of the form: LOCAL_*. If you were using QUORUM before, change it to LOCAL_QUORUM for example.

At this point, all the clients should be ready to receive the new data center. Clients should now ignore the new data center’s nodes and only contact the local data center as defined above.

Phase 1 - Add a New Data Center

Step 6: Create and configure new Cassandra nodes

Choose the right hardware and number of nodes for the new data center, then bring the machines up.

Configure the Cassandra nodes exactly like the old nodes, except for the configuration you intend to change with the new DC, along with the data center name. The data center name is defined depending on the snitch you picked; it can be determined by the IP address, a file, or the AWS region name.

To perform this change using GossipingPropertyFileSnitch, edit the cassandra-rackdc.properties file on all nodes:

dc=<new_dc>
...

To create a new data center in the same region in AWS, you have to set the dc_suffix option in the cassandra-rackdc.properties file on all nodes:

# to have us-east-1-awesome-cluster for example
dc_suffix=-awesome-cluster

Step 7: Add new nodes to the cluster

Nodes can now be added, one at a time. Just start Cassandra using the service start method that is specific to your operating system. For example, on my Linux systems we can run:

service cassandra start

Notes:

  • Start with the seed nodes for this new data center. Using two or three nodes as seeds per DC is a standard recommendation.
    -seeds: "<old_ip1>, <old_ip2>, <old_ip3>, <new_ip1>, <new_ip2>, <new_ip3>"

  • There should be no streaming; adding a node should be quick. Check the logs to make sure of it: tail -fn 100 /var/log/cassandra/system.log
  • Due to the previous point, the nodes should join quickly as part of the new data center. Check nodetool status and make sure a node appears as UN before moving to the next node.

Step 8: Start accepting writes on the new data center

The next step is to accept writes for this data center by changing the topology so the new DC is also part of the replication strategy:

ALTER KEYSPACE <my_ks> WITH replication = {
	'class': 'NetworkTopologyStrategy',
	'<old_dc_name>': '<replication_factor>',
	'<my_new_dc>': '<replication_factor>'
};
ALTER KEYSPACE <my_other_ks> WITH replication = {
	'class': 'NetworkTopologyStrategy',
	'<old_dc_name>': '<replication_factor>',
	'<my_new_dc>': '<replication_factor>'
};
[...]

Include system keyspaces that should now be using the NetworkTopologyStrategy for replication:

ALTER KEYSPACE system_auth WITH replication = {
  'class': 'NetworkTopologyStrategy',
  '<old_dc_name>': '<replication_factor>',
  '<new_dc_name>': '<replication_factor>'
};
ALTER KEYSPACE system_distributed WITH replication = {
  'class': 'NetworkTopologyStrategy',
  '<old_dc_name>': '<replication_factor>',
  '<new_dc_name>': '<replication_factor>'
};
ALTER KEYSPACE system_traces WITH replication = {
  'class': 'NetworkTopologyStrategy',
  '<old_dc_name>': '<replication_factor>',
  '<new_dc_name>': '<replication_factor>'
};
[...]

Note: Here again, do not modify keyspaces using LocalStrategy.

To make sure that the keyspaces were altered as expected, you can see the ownership with:

nodetool status <my_ks>

Note: this is a good moment to detect any imbalances in ownership and fix any issue there, before actually streaming the data to the new data center. We detailed this part in the post “How To Set Up A Cluster With Even Token Distribution” that Anthony wrote. You should really check this if you plan to use a low number of vnodes; otherwise you might go through an operational nightmare trying to handle imbalances.

Step 9: Stream historical data to the new data center

Stream the historical data to the new data center to fill the gap, as the new data center is now receiving writes but is still missing all the past data. This is done by running this on all the nodes of the new data center:

nodetool rebuild <old_dc_name>

The output (in the logs) should look like:

INFO  [RMI TCP Connection(8)-192.168.1.31] ... - rebuild from dc: ..., ..., (All tokens)
INFO  [RMI TCP Connection(8)-192.168.1.31] ... - [Stream ...] Executing streaming plan for Rebuild
INFO  [StreamConnectionEstablisher:1] ... - [Stream ...] Starting streaming to ...
...
INFO  [StreamConnectionEstablisher:2] ... - [Stream ...] Beginning stream session with ...

Note:

  • nodetool setstreamthroughput X can help reduce the burden that streaming puts on the nodes answering requests or, the other way around, make the transfer faster.
  • A good way to know when the rebuild has finished is to run the command above from a screen or tmux session, for example: screen -R rebuild.

Phase 2 - Switch Clients to the new DC

At this point the new data center can be tested and should be a mirror of the previous one, except for the things you changed of course.

Step 10: Client Switch

The clients can now be routed to the new data center. To do so, change the contact points and the data center name. Doing this one client at a time, while observing the impact, is probably the safest way when there are many clients plugged into a single cluster. Back to our Java driver example, it would now look like this:

Cluster cluster = Cluster.builder()
                  .addContactPoint(<ip_contact_list_from_new_dc>)
                  .withLoadBalancingPolicy(
                      DCAwareRoundRobinPolicy.builder()
                      .withLocalDc("<new_dc_name>")
                      .withUsedHostsPerRemoteDc(0)
                      .build()
                  ).build();

Note: Before going forward you can (and probably should) make sure that no client is connected to old nodes anymore. You can do this in a number of ways:

  • Look at netstats for opened (native or thrift) connections:
    netstat -tupawn | grep -e 9042 -e 9160
    
  • Check that the node does not receive local reads (i.e. ReadStage should not increase) and does not act as a coordinator (i.e. RequestResponseStage should not increase).
    watch -d "nodetool tpstats"
  • Monitoring system/Dashboards: Ensure there are no local reads heading to the old data center.

Phase 3 - Remove the old DC

Step 11: Stop replicating data in the old data center

Alter the keyspaces so they no longer reference the old data center:

ALTER KEYSPACE <my_ks> WITH replication = {
	'class': 'NetworkTopologyStrategy',
	'<my_new_dc>': '<replication_factor>'
};
ALTER KEYSPACE <my_other_ks> WITH replication = {
	'class': 'NetworkTopologyStrategy',
	'<my_new_dc>': '<replication_factor>'
};

[...]

ALTER KEYSPACE system_auth WITH replication = {
  'class': 'NetworkTopologyStrategy',
  '<new_dc_name>': '<replication_factor>'
};
ALTER KEYSPACE system_distributed WITH replication = {
  'class': 'NetworkTopologyStrategy',
  '<new_dc_name>': '<replication_factor>'
};
ALTER KEYSPACE system_traces WITH replication = {
  'class': 'NetworkTopologyStrategy',
  '<new_dc_name>': '<replication_factor>'
};

Step 12: Decommission old nodes

Finally we need to get rid of the old nodes. Stopping the nodes is not enough, as Cassandra will continue to expect them to come back to life at any time. We want them out, safely, but once and for all. This is what a decommission does. To cleanly remove the old data center we need to decommission all the nodes in this data center, one by one.

On the bright side, the operation should be almost instantaneous (or at least very quick), because this data center does not own any data anymore from a Cassandra perspective. Thus, there should be no data to stream to other nodes. If streaming happens, you probably forgot about a keyspace using SimpleStrategy, or a NetworkTopologyStrategy that still references the old data center.

Sequentially, on each node of the old data center, run:

$ nodetool decommission

This should be fast, if not immediate, as this command should trigger no streaming at all due to the changes we made to the keyspace replication configuration. This data center should not own any token ranges anymore, as we removed it from all the keyspaces in the previous step.

Step 13: Remove old nodes from the seeds

To remove any reference to the old data center, we need to update the cassandra.yaml file.

-seeds: "<new_ip1>, <new_ip2>, <new_ip3>"

And that’s it! You have now successfully switched over to a new Data Center. Of course during the process, ensure that the changes you just made are actually acknowledged in this new data center.

Carlos Sanchez: Progressive Delivery with Jenkins X: Automatic Canary Deployments


This is the third post in a Progressive Delivery series; see the previous ones:

Progressive Delivery is used by Netflix, Facebook and others to reduce the risk of deployments. But you can now adopt it when using Jenkins X.

Progressive Delivery is the next step after Continuous Delivery: new versions are deployed to a subset of users, evaluated in terms of correctness and performance, rolled out to the totality of the users if they pass, and rolled back if they do not match some key metrics.

In particular we focused on Canary releases and made it really easy to adopt them in your Jenkins X applications. Canary releases consist of sending a small percentage of traffic to the new version of your application and validating that there are no errors before rolling it out to the rest of the users. Facebook does it this way, delivering new versions first to internal employees, then a small percentage of the users, then everybody else, but you don’t need to be Facebook to take advantage of it!

[Image: Facebook canary rollout strategy]

You can read more on Canaries at Martin Fowler’s website.

Jenkins X

If you already have an application in Jenkins X you know that you can promote it to the “production” environment with jx promote myapp --version 1.0 --env production. But it can also be automatically and gradually rolled out to a percentage of users while checking that the new version is not failing. If it does fail, the application will be automatically rolled back. No human intervention at all during the process.

NOTE: this new functionality is very recent and a number of these steps will not be needed in the future as they will also be automated by Jenkins X.

As a first step, three Jenkins X addons need to be installed:

  • Istio: a service mesh that allows us to manage traffic to our services.
  • Prometheus: the most popular monitoring system in Kubernetes.
  • Flagger: a project that uses Istio to automate canarying and rollbacks using metrics from Prometheus.

The addons can be installed (using a recent version of the jx cli) with

jx create addon istio
jx create addon prometheus
jx create addon flagger

This will enable Istio in the jx-production namespace for metrics gathering.

Now get the IP of the Istio ingress and point a wildcard domain to it (e.g. *.example.com), so we can use it to route multiple services based on host names. The Istio ingress provides the routing capabilities needed for Canary releases (traffic shifting) that the traditional Kubernetes ingress objects do not support.

kubectl -n istio-system get service istio-ingressgateway \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}'

The cluster is configured, and it’s time to configure our application. Add a canary.yaml to your helm chart, under charts/myapp/templates.

{{- if eq .Release.Namespace "jx-production" }}
{{- if .Values.canary.enable }}
apiVersion: flagger.app/v1alpha2
kind: Canary
metadata:
  name: {{ template "fullname" . }}
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "fullname" . }}
  progressDeadlineSeconds: 60
  service:
    port: {{.Values.service.internalPort}}
{{- if .Values.canary.service.gateways }}
    gateways:
{{ toYaml .Values.canary.service.gateways | indent 4 }}
{{- end }}
{{- if .Values.canary.service.hosts }}
    hosts:
{{ toYaml .Values.canary.service.hosts | indent 4 }}
{{- end }}
  canaryAnalysis:
    interval: {{ .Values.canary.canaryAnalysis.interval }}
    threshold: {{ .Values.canary.canaryAnalysis.threshold }}
    maxWeight: {{ .Values.canary.canaryAnalysis.maxWeight }}
    stepWeight: {{ .Values.canary.canaryAnalysis.stepWeight }}
{{- if .Values.canary.canaryAnalysis.metrics }}
    metrics:
{{ toYaml .Values.canary.canaryAnalysis.metrics | indent 4 }}
{{- end }}
{{- end }}
{{- end }}

Then append to the charts/myapp/values.yaml the following, changing myapp.example.com to your host name or names:

canary:
  enable: true
  service:
    # Istio virtual service host names
    hosts:
    - myapp.example.com
    gateways:
    - jx-gateway.istio-system.svc.cluster.local
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 60s
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: istio_requests_total
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 60s
    - name: istio_request_duration_seconds_bucket
      # maximum req duration P99
      # milliseconds
      threshold: 500
      interval: 60s

Soon, both the canary.yaml and values.yaml changes won’t be needed when you create your app from one of the Jenkins X quickstarts, as they will be Canary enabled by default.

That’s it! Now when the app is promoted to the production environment with jx promote myapp --version 1.0 --env production it will do a Canary rollout. Note that the first time it is promoted it will not do a Canary, as it needs data from a previous version to compare to, but it will work from the second promotion on.

With the configuration in the values.yaml file above it would look like:

  • minute 1: send 10% of the traffic to the new version
  • minute 2: send 20% of the traffic to the new version
  • minute 3: send 30% of the traffic to the new version
  • minute 4: send 40% of the traffic to the new version
  • minute 5: send 50% of the traffic to the new version, then promote it so it receives 100% of the traffic

If the metrics we have configured fail (P99 request duration over 500 milliseconds, or more than 1% of responses returning 5xx errors), Flagger will note that failure, and if it is repeated 5 times it will roll back the release, sending 100% of the traffic to the old version.

To get the Canary events run

$ kubectl -n jx-production get events --watch \
  --field-selector involvedObject.kind=Canary
LAST SEEN   FIRST SEEN   COUNT   NAME                                                  KIND     SUBOBJECT   TYPE     REASON   SOURCE    MESSAGE
23m         10d          7       jx-production-myapp.1584d8fbf5c306ee   Canary               Normal   Synced   flagger   New revision detected! Scaling up jx-production-myapp.jx-production
22m         10d          8       jx-production-myapp.1584d89a36d2e2f2   Canary               Normal   Synced   flagger   Starting canary analysis for jx-production-myapp.jx-production
22m         10d          8       jx-production-myapp.1584d89a38592636   Canary               Normal   Synced   flagger   Advance jx-production-myapp.jx-production canary weight 10
21m         10d          7       jx-production-myapp.1584d917ed63f6ec   Canary               Normal   Synced   flagger   Advance jx-production-myapp.jx-production canary weight 20
20m         10d          7       jx-production-myapp.1584d925d801faa0   Canary               Normal   Synced   flagger   Advance jx-production-myapp.jx-production canary weight 30
19m         10d          7       jx-production-myapp.1584d933da5f218e   Canary               Normal   Synced   flagger   Advance jx-production-myapp.jx-production canary weight 40
18m         10d          6       jx-production-myapp.1584d941d4cb21e8   Canary               Normal   Synced   flagger   Advance jx-production-myapp.jx-production canary weight 50
18m         10d          6       jx-production-myapp.1584d941d4cbc55b   Canary               Normal   Synced   flagger   Copying jx-production-myapp.jx-production template spec to jx-production-myapp-primary.jx-production
17m         10d          6       jx-production-myapp.1584d94fd1218ebc   Canary               Normal   Synced   flagger   Promotion completed! Scaling down jx-production-myapp.jx-production

Dashboard

Flagger includes a Grafana dashboard for visualization purposes; it is not needed for the Canary releases themselves. It can be accessed locally using Kubernetes port forwarding:

kubectl --namespace istio-system port-forward deploy/flagger-grafana 3000

Then access http://localhost:3000 using admin/admin and select the canary-analysis dashboard with the following values:

  • namespace: jx-production
  • primary: jx-production-myapp-primary
  • canary: jx-production-myapp

This provides a view of different metrics (CPU, memory, request duration, response errors, …) of the incumbent and new versions side by side.

Caveats

Note that Istio by default will prevent access from your pods to the outside of the cluster (a behavior that is expected to change in Istio 1.1). Learn how to control the Istio egress traffic.

If a rollback happens automatically because the metrics fail, the Jenkins X GitOps repository for the production environment becomes out of date, still using the new version instead of the old one. This is something planned to be fixed in next releases.

Aaron Morton: Virtual tables are coming in Cassandra 4.0


One of the exciting features coming in Cassandra 4.0 is the addition of Virtual Tables. They will expose elements like configuration settings, metrics, or running compactions through the CQL interface instead of JMX for more convenient access. This post explains what Virtual Tables are and walks through the various types that will be available in version 4.0.

Virtual Tables

The term “Virtual Tables” can be confusing, as a quick Google search may leave one under the impression that they are views that can be created through a DDL statement. In the context of Cassandra, however, Virtual Tables will be created and managed by Cassandra itself, with no possibility of creating custom ones through CQL.

They are not to be confused with Materialized Views either, which persist data from a base table into another table with a different primary key.

For Cassandra 4.0, virtual tables will be read only, trivially exposing data as CQL rows. Such data was (and will still be) accessible through JMX, which can be cumbersome to interact with and secure.

Two new keyspaces were added in Cassandra 4.0 to support Virtual Tables: system_views and system_virtual_schema.
The latter will contain schema information on the Virtual Tables, while the former will contain the actual tables.

cqlsh> select * from system_virtual_schema.tables;

 keyspace_name         | table_name    | comment
-----------------------+---------------+------------------------------
          system_views |        caches |                system caches
          system_views |       clients |  currently connected clients
          system_views |      settings |             current settings
          system_views | sstable_tasks |        current sstable tasks
          system_views |  thread_pools |                             
 system_virtual_schema |       columns |   virtual column definitions
 system_virtual_schema |     keyspaces | virtual keyspace definitions
 system_virtual_schema |        tables |    virtual table definitions

Neither of these keyspaces can be described through the DESCRIBE KEYSPACE command, so listing the rows in system_virtual_schema.tables is the only way to discover the Virtual Tables.

The tables themselves can be described as shown here:

cqlsh> describe table system_views.caches

CREATE TABLE system_views.caches (
    capacity_bytes bigint PRIMARY KEY,
    entry_count int,
    hit_count bigint,
    hit_ratio double,
    name text,
    recent_hit_rate_per_second bigint,
    recent_request_rate_per_second bigint,
    request_count bigint,
    size_bytes bigint
) WITH compaction = {'class': 'None'}
    AND compression = {};

Available Tables in 4.0

Since Apache Cassandra 4.0 was feature frozen in September 2018, we already have the definitive list of Virtual Tables that will land in that release.

caches

The caches virtual table displays the list of caches involved in Cassandra’s read path. It contains all the necessary information to get an overview of their settings, usage, and efficiency:

cqlsh> select * from system_views.caches;

 name     | capacity_bytes | entry_count | hit_count | hit_ratio | recent_hit_rate_per_second | recent_request_rate_per_second | request_count | size_bytes
----------+----------------+-------------+-----------+-----------+----------------------------+--------------------------------+---------------+------------
   chunks |       95420416 |          16 |       134 |  0.864516 |                          0 |                              0 |           155 |    1048576
 counters |       12582912 |           0 |         0 |       NaN |                          0 |                              0 |             0 |          0
     keys |       25165824 |          18 |        84 |  0.792453 |                          0 |                              0 |           106 |       1632
     rows |              0 |           0 |         0 |       NaN |                          0 |                              0 |             0 |          0

This information is currently available through the nodetool info command.

clients

The clients virtual table will list all connected clients, with information such as the number of issued requests or the username each client is using:

cqlsh> select * from system_views.clients;

 address   | port  | connection_stage | driver_name | driver_version | hostname  | protocol_version | request_count | ssl_cipher_suite | ssl_enabled | ssl_protocol | username
-----------+-------+------------------+-------------+----------------+-----------+------------------+---------------+------------------+-------------+--------------+-----------
 127.0.0.1 | 61164 |            ready |        null |           null | localhost |                4 |           146 |             null |       False |         null | anonymous
 127.0.0.1 | 61165 |            ready |        null |           null | localhost |                4 |           155 |             null |       False |         null | anonymous 

settings

The settings virtual table will list all configuration settings that can be set in the cassandra.yaml config file:

cqlsh> select * from system_views.settings limit 100;

@ Row 1
-------+-----------------------------------------------------------------------------------------------------------------------------------------------
 name  | allocate_tokens_for_keyspace
 value | null

@ Row 2
-------+-----------------------------------------------------------------------------------------------------------------------------------------------
 name  | audit_logging_options_audit_logs_dir
 value | /Users/adejanovski/.ccm/trunk/node1/logs/audit/

@ Row 3
-------+-----------------------------------------------------------------------------------------------------------------------------------------------
 name  | audit_logging_options_enabled
 value | false

@ Row 4
-------+-----------------------------------------------------------------------------------------------------------------------------------------------
 name  | audit_logging_options_excluded_categories
 value | 

@ Row 5
-------+-----------------------------------------------------------------------------------------------------------------------------------------------
 name  | audit_logging_options_excluded_keyspaces
 value | 
...
...
...
@ Row 17
-------+-----------------------------------------------------------------------------------------------------------------------------------------------
 name  | back_pressure_strategy
 value | org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}

@ Row 18
-------+-----------------------------------------------------------------------------------------------------------------------------------------------
 name  | batch_size_fail_threshold_in_kb
 value | 50

@ Row 19
-------+-----------------------------------------------------------------------------------------------------------------------------------------------
 name  | batch_size_warn_threshold_in_kb
 value | 5

@ Row 20
-------+-----------------------------------------------------------------------------------------------------------------------------------------------
 name  | batchlog_replay_throttle_in_kb
 value | 1024
...
...

Here, I’ve truncated the output, as there are 209 settings exposed currently. There are plans to make this table writeable so that some settings can be changed at runtime, as can currently be done through JMX. Such changes, of course, would need to be persisted in cassandra.yaml to survive a restart of the Cassandra process.

sstable_tasks

The sstable_tasks virtual table will expose currently running operations on SSTables like compactions, upgradesstables, or cleanup. For example:

cqlsh> select * from system_views.sstable_tasks ;

 keyspace_name | table_name  | task_id                              | kind       | progress | total     | unit
---------------+-------------+--------------------------------------+------------+----------+-----------+-------
    tlp_stress | sensor_data | f6506ec0-3064-11e9-95e2-b3ac36f635bf | compaction | 17422218 | 127732310 | bytes

This information is currently available through the nodetool compactionstats command.

thread_pools

The thread_pools virtual table will display the metrics for each thread pool in Cassandra:

cqlsh> select * from system_views.thread_pools ;

 name                         | active_tasks | active_tasks_limit | blocked_tasks | blocked_tasks_all_time | completed_tasks | pending_tasks
------------------------------+--------------+--------------------+---------------+------------------------+-----------------+---------------
             AntiEntropyStage |            0 |                  1 |             0 |                      0 |               0 |             0
         CacheCleanupExecutor |            0 |                  1 |             0 |                      0 |               0 |             0
           CompactionExecutor |            0 |                  2 |             0 |                      0 |            3121 |             0
         CounterMutationStage |            0 |                 32 |             0 |                      0 |               0 |             0
                  GossipStage |            0 |                  1 |             0 |                      0 |           17040 |             0
              HintsDispatcher |            0 |                  2 |             0 |                      0 |               0 |             0
        InternalResponseStage |            0 |                  8 |             0 |                      0 |               0 |             0
          MemtableFlushWriter |            0 |                  2 |             0 |                      0 |              20 |             0
            MemtablePostFlush |            0 |                  1 |             0 |                      0 |              21 |             0
        MemtableReclaimMemory |            0 |                  1 |             0 |                      0 |              20 |             0
               MigrationStage |            0 |                  1 |             0 |                      0 |               0 |             0
                    MiscStage |            0 |                  1 |             0 |                      0 |               0 |             0
                MutationStage |            0 |                 32 |             0 |                      0 |               8 |             0
    Native-Transport-Requests |            1 |                128 |             0 |                      0 |             717 |             0
       PendingRangeCalculator |            0 |                  1 |             0 |                      0 |               6 |             0
 PerDiskMemtableFlushWriter_0 |            0 |                  2 |             0 |                      0 |              20 |             0
              ReadRepairStage |            0 |                  8 |             0 |                      0 |               0 |             0
                    ReadStage |            0 |                 32 |             0 |                      0 |              22 |             0
                  Repair-Task |            0 |         2147483647 |             0 |                      0 |               0 |             0
         RequestResponseStage |            0 |                  8 |             0 |                      0 |              22 |             0
                      Sampler |            0 |                  1 |             0 |                      0 |               0 |             0
     SecondaryIndexManagement |            0 |                  1 |             0 |                      0 |               0 |             0
           ValidationExecutor |            0 |         2147483647 |             0 |                      0 |               0 |             0
            ViewBuildExecutor |            0 |                  1 |             0 |                      0 |               0 |             0
            ViewMutationStage |            0 |                 32 |             0 |                      0 |               0 |             0

This information is currently available through the nodetool tpstats command.

Locality

Virtual Tables, regardless of the type, contain data that is specific to each node. They are not replicated, have no associated SSTables, and querying them will return the values of the coordinator (the node that the driver chooses to coordinate the request). They will also ignore the consistency level of the queries they are sent.

When interacting with Virtual Tables through cqlsh, results will come from the node that cqlsh connected to:

cqlsh> consistency ALL
Consistency level set to ALL.
cqlsh> select * from system_views.caches;

 name     | capacity_bytes | entry_count | hit_count | hit_ratio | recent_hit_rate_per_second | recent_request_rate_per_second | request_count | size_bytes
----------+----------------+-------------+-----------+-----------+----------------------------+--------------------------------+---------------+------------
   chunks |       95420416 |          16 |       134 |  0.864516 |                          0 |                              0 |           155 |    1048576
 counters |       12582912 |           0 |         0 |       NaN |                          0 |                              0 |             0 |          0
     keys |       25165824 |          18 |        84 |  0.792453 |                          0 |                              0 |           106 |       1632
     rows |              0 |           0 |         0 |       NaN |                          0 |                              0 |             0 |          0

(4 rows)

Tracing session: 06cb2100-3060-11e9-95e2-b3ac36f635bf

 activity                                                                 | timestamp                  | source    | source_elapsed | client
--------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                       Execute CQL3 query | 2019-02-14 14:54:20.048000 | 127.0.0.1 |              0 | 127.0.0.1
 Parsing select * from system_views.caches; [Native-Transport-Requests-1] | 2019-02-14 14:54:20.049000 | 127.0.0.1 |            390 | 127.0.0.1
                        Preparing statement [Native-Transport-Requests-1] | 2019-02-14 14:54:20.049000 | 127.0.0.1 |            663 | 127.0.0.1
                                                         Request complete | 2019-02-14 14:54:20.049424 | 127.0.0.1 |           1424 | 127.0.0.1

When interacting through the driver, there is no simple way of selecting a single node as coordinator. The load balancing policy is responsible for this, and it is set on the Cluster object, not on a per-query basis.
For the Datastax Java Driver, a new feature was introduced through JAVA-1917 to support selecting a specific node, which eases Virtual Table access. It adds a setNode(Node node) method to the Statement class in order to forcefully designate the node responsible for the query, and “voilà”.
For the record, the same feature was added to the Python driver.
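
As a minimal sketch with the Java driver 4.x (class names taken from the 4.x API; adjust to your driver version), pinning a query to one node could look like this:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import com.datastax.oss.driver.api.core.metadata.Node;

try (CqlSession session = CqlSession.builder().build()) {
    // Pick the node whose local (virtual) data we want to read
    Node node = session.getMetadata().getNodes().values().iterator().next();

    // Force this statement to be coordinated by that node
    SimpleStatement statement =
        SimpleStatement.newInstance("SELECT * FROM system_views.caches")
                       .setNode(node);

    for (Row row : session.execute(statement)) {
        System.out.println(row.getString("name") + ": " + row.getLong("hit_count"));
    }
}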

Beyond Apache Cassandra 4.0

The data that is currently missing from Virtual Tables is global and table-level metrics such as latencies and throughputs (Cassandra exposes A LOT of table-specific metrics beyond those two).
Rest assured that these are being worked on from two different approaches in CASSANDRA-14670 and CASSANDRA-14572, which were not ready in time for the feature freeze.

It will probably take some time for Virtual Tables to match the amount of data available through JMX but we are confident it will catch up eventually.
Convenient and secure CQL access to runtime metrics in Apache Cassandra will tremendously ease building tools like Reaper which currently rely on JMX.


Claus Ibsen: I am now a Java Champion

At the end of November last year I was officially welcomed as a new Java Champion on Twitter.

I was among 30 new champions in 2018 and I am very humbled to be among all these great champions. As you most likely know, my work with Java is heavily focused on the Apache Camel project, and it's good to know that influencers in the Java ecosystem who are not invested in general Java can also be recognized for their hard work and be nominated as champions.

You can find a list of the champions on the official list (updated yearly) and on github (updated frequently).

This weekend I received a welcome gift, which is the official Java Champion t-shirt and jacket.


Timothy Chen: The power of choice in data-aware cluster scheduling


In this post we’ll cover a scheduler called KMN that aims to solve scheduling of I/O intensive tasks in distributed compute frameworks like Spark or MapReduce. This scheduler is different from the ones we discussed previously, as it emphasizes data-aware scheduling, which we’ll cover in this post.

Background

Today’s batch computing frameworks like Hadoop and Spark run a number of stages and tasks for each job, which build into a DAG (directed acyclic graph) of dependencies. If we assume a large portion of these jobs are I/O intensive, then the scheduler’s job is to try to minimize the time it takes for tasks to read their data. However, in a large multi-tenant cluster, the perfect node with data locality can often be unavailable.

Data applications and algorithms today also have the option of choosing only a subset of the source data to approximate the answer, instead of requiring the full data set.

Spark & MapReduce frameworks typically has input tasks that reads source data and intermediate tasks that has data forwarded from the input tasks to further processing. For a task scheduler, what it can optimize for input tasks is to try to place tasks closer to the source data (locality). For intermediate tasks, the scheduler instead will optimize for minimizing the network transfer from the input tasks. One of the main bottlenecks for in-cluster network bandwidth is over-saturated cross rack links. The authors simulated if network contention and data locality is achieved using past Facebook traces and estimated a 87.6% performance increase.

KMN Scheduler

The KMN scheduler is implemented in Spark and provides an application interface that allows users to choose what ratio of the input data the query will select (1-100%).

Based on all N available inputs and the locality choices, the KMN scheduler launches input tasks (one-to-one transfers) on a random sample of K available blocks with memory locality.


For intermediate tasks that perform many-to-one transfers, the main insight the authors found is that the key to avoiding skew in cross-rack network bandwidth is to launch more than K input tasks (M tasks), since this gives the downstream tasks more choices of where to transfer data from, which helps avoid skew. While finding the optimal rack placement for tasks is an NP-hard problem, the authors suggest either a greedy search, which works best for small jobs, or a variant of round-robin for larger jobs, both of which work quite well in their setup.


One important decision here is certainly how many additional tasks to launch. Launching too many extra tasks will cause longer job wait times (also taking stragglers into account), but too few additional tasks can cause network imbalance problems. Finding the right number lets you balance the two. One strategy is that the scheduler can decide how long it will wait for upstream tasks to launch and complete before firing the downstream tasks, so when you do encounter stragglers you won’t be waiting for all of them to complete in your sample.

Thoughts

Cross-rack network congestion came up as a real problem when I chatted with several companies operating large on-prem clusters. While the importance of data locality is decreasing over time given the faster network speeds available in the cloud, I think cross-AZ traffic and network congestion are still problems that companies often run into in the cloud.

I can certainly see all distributed data frameworks starting to become more aware of cluster resource bottlenecks when making task placement and data distribution decisions.

FeatherCast: CHAOSSCon EU 2019, Why it’s Important for Open Source Metrics to Tell a Story, Brian Proffitt


CHAOSS stands for Community Health Analytics Open Source Software and recently at CHAOSSCon EU in Brussels, we spoke briefly to Brian Proffitt, one of the CHAOSS Board members and also Senior Principal Community Architect for Open Source and Standards team at Red Hat. He tells us why it’s important for metrics to tell a story, why previous metrics may not have been as impartial as people would want, and why increased mailing list traffic could indicate a potential community crisis!

Holden Karau: Contributing to Spark 3 @ Spark BCN Meetup

Sergey Beryozkin: Quarkus is all about Developer Joy

No doubt that you, being an open source developer, have already browsed through the QuarkusIO GitHub projects, been impressed by the cool Quarkus web site, and seen a lot of very positive feedback about this truly innovative project which will disrupt the current application runtime ecosystem.

So, I'm not going to go into the details of why Quarkus is such a brilliant project from the technical point of view. I'll only touch on why Developer Joy is what Quarkus is all about, IMHO.

I'm not a member of the Quarkus core team, but I was privileged to witness the launch of the project and observe the team doing the final preparations during the last few days before it. It was obvious that what was really driving this talented team was the desire to make the features and the documentation as simple and accessible as possible from the very start.

Back in the Apache CXF days, I tried very hard with my dear Apache CXF colleagues, to make things as simple as possible for the users. Those of you who followed our progress will hopefully agree.

So, please believe me, and I hope you don't mind me saying it: I can see when making developers happy is what really drives a team. I trust you will confirm it yourself after working with Quarkus.

Get engaged with this project, which will fly, and become part of its community now.

Finally, what about a link to some nice music? Back in May 2018, I blogged about Thorntail v4. As you know by now, Thorntail v4 has not materialized. But consider Thorntail v4 to be an early version of Quarkus, given that the core Thorntail v4 contributors helped to shape Quarkus. So that blog post of mine was not a complete flop :-). Besides, I had a link there to the most beautiful piece of music, albeit that one was a mix.

So here is a real one. Because it is real now, here comes another new day, here comes Quarkus !

Enjoy !

Colm O hEigeartaigh: Using authorization with JWT tokens in Apache CXF

JSON Web Tokens (JWT) have been covered extensively on this blog (for example here). In this post we will cover how JWT tokens can be used for authorization when sent to a JAX-RS web service in Apache CXF. In particular, we will show how Apache CXF 3.3.0 supports claims based access control with JWT tokens.

1) JWT with RBAC

JWT tokens can be used for the purpose of authentication in a web service context, by verifying the signature on the token and taking the "sub" claim as the authenticated principal. This assumes no proof of possession of the token, something we will revisit in a future blog post. Once this is done we have the option of performing an authorization check on the authenticated principal. This can be done easily via RBAC by using a claim in the token to represent a role.

Apache CXF has a SimpleAuthorizingInterceptor class, which can map web service operations to role names. If the authenticated principal is not associated with the role that is required to access the operation, then an exception is thrown. Here is an example of how to configure a JAX-RS web service in CXF with the SimpleAuthorizingInterceptor for JWT:
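
The original post embeds the configuration at this point; as an illustration only, a programmatic sketch might look like the following (DoubleItService and the address are hypothetical, and the JWT signature verification properties are omitted):

import java.util.HashMap;
import java.util.Map;

import org.apache.cxf.interceptor.security.SimpleAuthorizingInterceptor;
import org.apache.cxf.jaxrs.JAXRSServerFactoryBean;
import org.apache.cxf.rs.security.jose.jaxrs.JwtAuthenticationFilter;

JAXRSServerFactoryBean factory = new JAXRSServerFactoryBean();
factory.setAddress("http://localhost:8080/doubleit");   // hypothetical address
factory.setServiceBean(new DoubleItService());          // hypothetical service bean

// Authenticate the JWT token and read the principal's roles from the "role" claim
JwtAuthenticationFilter authenticationFilter = new JwtAuthenticationFilter();
authenticationFilter.setRoleClaim("role");
factory.setProvider(authenticationFilter);

// Only principals with the "boss" role may invoke the "doubleIt" operation
SimpleAuthorizingInterceptor authorizingInterceptor = new SimpleAuthorizingInterceptor();
Map<String, String> methodRoles = new HashMap<>();
methodRoles.put("doubleIt", "boss");
authorizingInterceptor.setMethodRolesMap(methodRoles);
factory.getInInterceptors().add(authorizingInterceptor);

factory.create();
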
Here the JwtAuthenticationFilter has been configured with a "roleClaim" property of "role". It then extracts the configured claim from the authenticated token and uses it for the RBAC authorization decision. To see this functionality in action, look at the corresponding test-case in my github repo.

2) JWT with CBAC

Since CXF 3.3.0, we can also use the Claims annotations in CXF (that previously only worked with SAML tokens) to perform authorization checks on requests that contain JWT tokens. This allows us to specify more fine-grained authorization requirements on the token, as opposed to the RBAC approach above. For example, we can annotate our service endpoint as follows:
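
The post shows the annotated endpoint here; as a rough sketch (the Claim/Claims annotations are assumed to come from CXF's claims authorization package, and the service class is hypothetical), it could look like:

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;

import org.apache.cxf.security.claims.authorization.Claim;
import org.apache.cxf.security.claims.authorization.Claims;

@Path("/secured")
public class SecuredService {

    @GET
    @Produces("text/plain")
    @Claims({
        // The "role" claim must match either "boss" or "ceo"
        @Claim(name = "role", value = {"boss", "ceo"})
    })
    public String getSecret() {
        return "top secret";
    }
}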

Here we can see a "role" claim is required which must match either the value "boss" or "ceo". We can enable claims based authorization by adding the ClaimsAuthorizingFilter as a provider of the endpoint, with the "securedObject" parameter being the service implementation:

We can specify multiple claims annotations and combine them in different ways, please see the CXF webpage for more information. To see this functionality in action, look at the corresponding test-case in my github repo.

Holden Karau: PySpark on Kubernetes @ Python Barcelona March Meetup


Community Over Code: Shane’s Director Position Statement 2019


The ASF is holding its annual Member’s Meeting next week to elect a new board and new Members to the ASF. I’m honored to have been nominated to stand for the board election, and I’m continuing my tradition of publicly posting my vision for Apache each year.

I’ll keep this short(er); if you want to know more, please read my past thoughts on how Apache works and where we’re going (see end of this post).


After 20 years of growth, the ASF is a successful open-source community providing software to the world and a community framework to dedicated volunteers. At this time in our community development, we need to focus on efficiently scaling our organization to keep up with growth in project communities who need services and mentoring. We also need to make it easier for Members (whose numbers are rapidly increasing!) to participate in ways that provide consistent and positive guidance to our projects and podlings.

As a director, I will (continue to) work on making it easier for our volunteers to make meaningful and personally fulfilling contributions to the ASF, both in their coding work and through participation in operations and project/podling governance. My priorities for the coming year include:

1) Revitalizing our internal mentoring programs, especially for newer ASF Members and for the IPMC and Incubator mentors.

A framework for personal mentoring and consistent guidelines will encourage potential mentors to get involved. These mentors will in turn be able to support other volunteers (and committers!) by fostering a welcoming and friendly environment for those looking to step up to do more of our work.

2) Updating our policies, best practices, and processes.

We’ve made great strides with ensuring our processes are documented in the past few years, but often in piecemeal and scattered pages. Mailing list discussions often address individuals’ immediate questions about a particular process, but don’t always provide the full background. A link to the supporting documentation for the answer (at a stable URL) will help the whole community learn for the future.

In our documentation, we also need to provide the rationale for our processes and policies. We need to clearly specify why a process is important, as well as what process to follow. Documentation that is clear, direct, and comprehensive will facilitate understanding and alignment across the ASF.

3) Maximizing volunteer enthusiasm and retention by facilitating the work they enjoy.

As the ASF grows, we need to encourage volunteers to work on as many organized improvements as we can, while allocating budget for work that volunteers can’t or won’t do. By building the needed frameworks with thoughtfully funded tools, we can help our volunteers succeed and keep them engaged for the long term.

The ASF is lucky: our finances are doing great, and we have money in the bank with a positive budget forecast. We also have officers and operations volunteers coming up with smart ways we can use our budget to provide better services to our communities. I want the board to ensure that well-documented funding requests benefiting our mission get a thoughtful and fact-based evaluation – something only the board (which approves budgets) can do.


Other useful things I’ve written about Apache governance and the board:


Community Over Code: How Apache Runs Annual Member Meetings


As a Delaware non-stock membership corporation, the ASF’s bylaws specify that we hold an annual meeting of the membership. Since the ASF is also a distributed organization of volunteers, we hold our meetings a little differently than most companies – meeting on IRC over three days, and voting securely online.

Since we’re in the middle of our meeting this week, here are some answers to common questions. If these are valuable, we can add them to the ASF’s official member’s meeting process. I’ve also written a timeline of pre-meeting setup, as well as about the work after the meeting.

How do you run a meeting with participants in every timezone?

We use IRC (not voice calls), we have proxies for attendance, and we make all the important decisions asynchronously. The meeting on IRC is held in two parts, with a 40+ hour recess in the middle, allowing part of the meeting to be realtime, and part handled over email.

What’s the agenda? What actually happens at the member meeting?

It’s probably a lot more boring than you imagine. Since we recess from the meeting to run balloting via email, the interactive IRC portions of the meeting are actually quite dry. Since most of our discussions happen asynchronously over mailing lists anyway, the interactive time is usually pretty limited. Typically the few questions we get serve mostly as prompts for the right people to have a longer discussion over email later.
More specifically: the Member Meeting happens in two parts on IRC: Tuesday we review the first half of the agenda and announce the special orders for a vote. We then recess the meeting and handle voting via email over the next 40+ hours. We reconvene on Thursday for the second half to hear the results of the vote… and that’s it!

How does the first half of the IRC meeting work?

The Chairman formally opens the meeting, and then essentially runs through the agenda interactively on IRC.
After a Call To Order, we hold a Roll Call, where members mark attendance at the meeting. This is important to comply with Delaware corporate law: we need to show that a quorum of members are ‘present’ at the meeting to know that we can conduct official business.
Our ASFBot code logs the IRC channel, and scans for everyone’s reply to the Roll Call. We have custom code that cross-checks the IRC lines with our membership roll. It also can cross-check both people who attend live, as well as people who are proxying attendance for another member.
Once we count that we’ve reached quorum, the meeting continues. The executive officers each provide a very brief State of the Foundation from their areas – a high-level note of major accomplishments in the past year. We post much more detailed Foundation overview reports separately.
The member’s meeting does offer a brief Q&A period, although since most conversations happen on our mailing lists, there are rarely questions. Most questions here get a simple answer, which is partly a prompt for a deeper discussion amongst the members later on our lists.
Then we announce the Special Orders, which are the three sets of ballots that Members will be asked to vote on during the recess. They include an “Omnibus resolution” about basic corporate affairs, and then two elections: one for the board, and one for new members. All data for the elections was available to the members ahead of time, so this is merely a formal announcement during the meeting.
Then… we take a recess!

What happens during the member meeting recess?

Once the Chairman formally calls the IRC meeting to a recess, our vote counting volunteers fire up the Apache STeVe code! STeVe is a complete election running system that sends an email to each member with a secure voting link. That way, members who can’t attend interactively on IRC (or, who live someplace where the meeting is in the middle of the night!) can simply read their email the next day to get their ballot. Members have about 40 hours to cast ballots during the recess.
All voting happens on the STeVe website with a simple UI, and the STeVe system also validates all ballots privately, with a small team of volunteer vote monitors.

So… there’s a second half of the meeting?

Yes – on Thursday, after the 40+ hour recess, the IRC meeting is formally reconvened. By the end of the recess our vote monitors have tallied the ballots, and they present the results to the Chairman to announce during the meeting.
Important: at this point, we can announce the new board: as soon as they’re elected, they become the new directors of the ASF. However, newly elected members are not announced, because they haven’t been told yet.
Member candidates are not told ahead of time, in case the vote doesn’t elect them in. Once they have been formally elected, the member who nominated them reaches out to welcome them and send them the membership application. Newly elected members have 30 days to sign and return the application, at which point they become voting members of the ASF.

How do you keep corporate records of your meetings?

As our official corporate meetings, we need to keep records to ensure we’re fully compliant with Delaware law. Since we’re also software people, we store all our records in a private Apache Subversion repository. This allows for strong access control and tracking of all changes to our records as well.
Foundation Members all have access to our /foundation repository, where the Meetings directory holds the agenda, README.txt, board and member ballots, a full raw IRC log, and more data about attendance and proxies. We save all data about membership meetings perpetually. Members can see the agenda and information about the March 2000 member’s meeting held at ApacheCon in Orlando!

Are membership meetings public?

No. While we publicly announce the newly elected board, and the list of any members who have been elected (and who accept the honor in the next weeks), the details of Member meetings are not public. Those interested in understanding ASF governance should instead see our published board meeting minutes, which is where the action is anyway.

How is attendance tracked?

Both to show compliance with corporate law, and for our internal records, we track attendance in a couple of ways.
During the first half of the IRC meeting, we take Roll Call. This lasts as long as it takes to show that we have a quorum (1/3) of members in the meeting – either who respond directly, or who have a proxy respond on their behalf. Note that members can announce their attendance at either half of the meeting (Tuesday or Thursday) to be ‘present’ during the meeting.
Separately, every member gets a ballot to vote in special orders, even if they don’t attend the meeting. Any member who actually submits their vote(s) is marked as voting at the meeting. Either attending, sending a proxy, or voting counts as participating in the meeting, which is what’s important for us internally.
Note that Emeritus members do not have a vote in our meetings, and don’t count for quorum. Some members – after realizing they haven’t participated in a meeting for years – will voluntarily choose to go Emeritus, and will send in the form to our corporate Secretary.


Community Over Code: The Monktoberfest conference today and in history


This week is the Monktoberfest, the most interesting conference I’ve ever attended, and one of my must-attend events each October.  Not only are the talks thought-provoking and the attendees awesome, but the location in Portland, ME, and the food and events are top-notch. The ideas I get each year are a big inspiration, and it’s a long wait until next year each time.

Monktoberfest is a different kind of event: a two-day, single-track conference.  Everyone is in the same room, watching the same great talks. Topics are on how technology shapes humanity, backed by some data. That means the speakers come from an incredible array of different backgrounds, industries, and perspectives.

Watching the first talk by a regular speaker here, I wondered how Monktoberfest’s history got us to where we are.  Run by RedMonk – a technology analyst firm – the talks historically focused on the effects technology has on the business world, since that’s where their interests lie.  But in the past two years, the focus on social effects and change from the pervasiveness of all sorts of technology is clear – and that makes the thought-provoking talks even more important for us.

Sessions are never announced ahead of time; the next speaker and their topic is always a surprise we learn as soon as they walk on stage, and no sooner.  But Stephen O’Grady and the RedMonk team write up their annual post-conference report, and a number of past videos are online, so you can follow along with the history.

Anyone else have great Monktoberfest history out there?


Community Over Code: Apache How-To: Communicating Effectively


How can you be effective at asking questions or proposing changes at Apache or in an Apache project?  Besides making sure you use the right mailing list, and asking smart questions or following the several email etiquette guidelines and tips at Apache, what should you do to be effective at communicating your ideas?

There are plenty of times you’ve been polite and formulated a good question, but you either don’t get useful responses – or you get too many responses and tangents and complaints and and and…  What are some of the other factors to consider in being effective at communicating with Apache’s many communities?

Is this a question about one Apache project?

If your question or idea is just about a single Apache project, that’s pretty easy: write your thoughtful email to the dev@projectname mailing list; that’s where the primary community for most projects does their work.  Sometimes you may want to email the user@ mailing list if it’s a question about functionality, events, or user-based questions instead.

Make sure to understand that project’s expectations for discussions: do they use [SUBJECT-TAGS], or perhaps use JIRAs for organizing technical work?  Do they expect all proposals to be spelled out in your email, or do they often write up a proposal in their wiki, and then point to it from the email?  Following the project’s normal way of working is important to have the best chance that other project volunteers will see and respond to your message.  Be sure to reply to any questions as well!

I can’t get a final decision on my question! What next?

The first thing is to be a little patient, and see if you can work out a good enough consensus.  That often takes time, because various other project participants may not see this as their urgent priority; you need to allow sufficient time for feedback.  You may need to adjust plans and make it clear that you’ve taken feedback and changed your proposal.

Sometimes, you need to call for a [VOTE].  The ASF has some very broad requirements around voting,  but really most of the details of votes are up to each individual PMC.  In most cases, a majority of +1 votes carries the day, unless -1 voters have a technically valid veto that can be shown to make the project worse (for a code modification).

Sometimes, if there is a larger issue at stake – where the dev@ list isn’t helping you at least get closure (even if it’s not necessarily agreement) – you may want to escalate the issue to the PMC.  Every Apache project has a private@projectname mailing list – privately archived – where the PMC discusses only issues that require privacy – typically security issues, voting in new committers, and rarely personnel or code of conduct issues.

What if I have a question about the ASF itself?

The Apache Community Development project – with its own PMC and everything – is here to help guide newcomers and direct unusual questions to the right place.  If you have a public question about where to find something, or one that crosses multiple projects, start on the dev@community.apache.org mailing list.

Apache has a handful of other cross-project mailing lists as well for conferences, infrastructure, and legal questions.  Note that those lists are also publicly archived unless they specifically state otherwise.  Any potential security vulnerabilities should go directly to the Security Team, which obviously uses private archives.

What if an Apache project is not responding to me? How do I escalate concerns?

The ASF is designed to give project PMCs maximum freedom in governing their own projects.  While the board does expect to see a certain set of behaviors – especially working by consensus and welcoming any newcomers with good work, regardless of employer – the board of the ASF rarely gets involved directly in project operations.

However, if there is a serious governance or community issue that a PMC is not addressing, you can work to contact the ASF Board of Directors.  PMCs report quarterly directly to the board, separately from other corporate operations (infra, legal, trademarks, accounting, etc.).

Vulnerability escalations go to the Security team; and legal issues or communications from counsel go to the Legal Affairs committee.

How do I ask about Foundation governance and corporate affairs?

Most corporate organization happens on privately archived mailing lists.  While project work is done in the open as much as possible, internal corporate work like paying the bills, signing legal documents, and the like are done either by volunteers or in a few cases by paid staff or contractors.  The Membership of the ASF has oversight and visibility to these processes, and it’s usually Members who are volunteering to help with the work.

For Members, the first thing to remember is to use the right list!  Each area of corporate operations has its own mailing list: legal, trademarks@, infrastructure, treasurer, fundraising, or secretary.  There’s also an overall operations mailing list that’s a great place to ask where you should take your question, since the operations volunteers there usually have helpful answers.

PMC Members and committers don’t have access to those list archives, but we certainly accept emails from them. Be sure to send in a clear question, so we know you’re waiting for an answer, and we’ll get back to you.  If your question is about a specific Apache project, it’s a best practice to always cc: private@projectname as well, so the whole PMC there is aware.

If your question doesn’t require privacy, then the best bet is to ask Community Development to point you in the right direction.

What else can I do to be effective at communicating at Apache?

Remember: Apache mailing lists often have hundreds of subscribers – so you’re sending email to a lot of people, all of whom are (for Apache purposes) part-time volunteers.  Knowing your audience is one of the key points when writing – and that’s doubly true when communicating with so many people from different backgrounds.

  • Write helpful and descriptive subject lines; make sure list readers understand if they need to read your email – or not.
  • Keep email threads on topic – preferably, a single topic per email thread.  If you need to add a new topic or feel the urge to hijack someone’s thread, please don’t – start a new thread, or at least change the Subject line.
  • Pause. Take your time. Most project decisions should not be made in a rush, and overwhelming the list with your posts in a short period often backfires when other community members get overwhelmed and stop participating.
  • Keep focused on the issues, and the value to the project or to the ASF as a whole. Even about code we can sometimes feel emotional; keep it based on facts and focused on the big picture.
  • For significant decisions, re-post a recap of the final consensus.  This is best done with a new email or at least a changed Subject.
  • For complex issues, lay out the big picture very clearly.  Sometimes it’s best to post “Hey, I/we are thinking of $big_change_like_X.  If anyone already knows they’ll want to veto it, let us know before we investigate!”  Then write up a detailed proposal, and send that to the list for discussion.  Similarly: start a [DISCUSS] thread with the expectation to gauge interest and possible consensus, before doing lots of coding or planning work and assuming the project will accept it.

Still wondering where to ask something?  See my FAQ of FAQs about everything Apache too!

The post Apache How-To: Communicating Effectively appeared first on Community Over Code.

Mukul Gandhi: XSLT 1.0 transformations for large input documents

I thought this topic could be of interest to the XML community.

I've discovered an interesting aspect of the JAXP API (Java API for XML Processing) that seems related to the streaming we talk about with XSLT 3.0. Please see the following document, and the example given in its section 4.12 (which explains JAXP's StAX API and how to use it with JAXP's transformation APIs)


Using the JAXP code cited in the above document, one can transform very large input XML documents using XSLT 1.0 (I tried an input XML document of about 700 MB, and that worked; the JDK's built-in JAXP implementation can do this – I tried with JDK 1.8, which works fine). It handles certain kinds of XSLT 1.0 transformations on very large input XML documents very well. When doing the same transformations with XSLT 2.0, or with XSLT 3.0 in a non-streaming way, we would usually get the following error: 'java.lang.OutOfMemoryError: Java heap space'.
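
As a minimal sketch (not the exact code from the cited document; the file names are placeholders), the idea is to drive the JDK's built-in XSLT 1.0 transformer from a StAXSource rather than a DOM:

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamReader;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stax.StAXSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class LargeDocTransform {
        public static void main(String[] args) throws Exception {
            // Read the (large) input document through StAX instead of building a DOM up front
            XMLInputFactory xif = XMLInputFactory.newInstance();
            XMLStreamReader reader =
                xif.createXMLStreamReader(new FileInputStream("large-input.xml"));

            // The JDK's built-in XSLT 1.0 processor accepts a StAXSource directly
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("stylesheet.xsl"));
            transformer.transform(new StAXSource(reader), new StreamResult("output.xml"));
        }
    }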

Colm O hEigeartaigh: HTTP Signature support in Apache CXF

Apache CXF provides support for the HTTP Signatures draft spec as of the 3.3.0 release. Up to this point, JAX-RS message payloads could be signed using either XML Security or JOSE. In particular, the JOSE functionality can also be used to sign HTTP headers. However, it doesn't allow signing the HTTP method and path, something that HTTP Signature supports. In this post we'll look at how to use HTTP Signatures with Apache CXF.

I uploaded a sample project to github to see how HTTP Signature can be used with CXF:
  • cxf-jaxrs-httpsig: This project contains a test that shows how to use the HTTP Signature functionality in Apache CXF to sign a message to/from a JAX-RS service.

1) Client configuration

The client is configured in the test code to both sign the outbound request and verify the service response.

Two JAX-RS providers are added - CreateSignatureClientFilter creates a signature on the outbound request, and VerifySignatureClientFilter verifies a signature on the response. The keys used to sign the request and verify the response are configured in properties files, which are referenced via the "rs.security.signature.out.properties" and "rs.security.signature.in.properties" configuration tags:
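
The original configuration snippet isn't reproduced here, so the following is only a rough sketch of that kind of client setup (the endpoint address and properties-file names are made up, and the filter classes are assumed to live in CXF's org.apache.cxf.rs.security.httpsignature.filters package; see the github project for the real code):

    // Register the two providers on the JAX-RS WebClient
    List<Object> providers = new ArrayList<>();
    providers.add(new CreateSignatureClientFilter());   // signs the outbound request
    providers.add(new VerifySignatureClientFilter());   // verifies the signature on the response

    WebClient client = WebClient.create("https://localhost:8443/doubleit/services", providers);

    // Point the filters at the signing/verification keys via properties files
    Map<String, Object> properties = new HashMap<>();
    properties.put("rs.security.signature.out.properties", "clientKeystore.properties");
    properties.put("rs.security.signature.in.properties", "serviceKeystore.properties");
    WebClient.getConfig(client).getRequestContext().putAll(properties);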

Here we can see that a keystore is being used to retrieve the private key for signing the outbound request. If you wish to retrieve keys from some other source, then instead of using configuration properties it's best to configure the MessageSigner class directly on the CreateSignatureClientFilter.

By default CXF will add all HTTP headers to the signature. In addition, a client will also include the HTTP method and path using the "(request-target)" header. The default signature algorithm is "rsa-sha256", though of course this can be configured. A secured request using HTTP signature looks like the following:
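
The captured request isn't reproduced here; purely as an illustration (made-up values, following the layout described in the draft spec), such a request carries a Signature header along these lines:

    GET /doubleit/services/hello HTTP/1.1
    Accept: text/xml
    Host: localhost:8443
    Signature: keyId="alice",algorithm="rsa-sha256",headers="(request-target) accept host",signature="Base64(RSA-SHA256(signing string))"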


2) Service configuration

The service configuration is defined in spring. Two different JAX-RS providers are used on the service side - VerifySignatureFilter is used to verify a signature on the client request, and CreateSignatureFilter is used to sign the response message.
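
The spring XML itself isn't reproduced here; purely as an illustration, the equivalent programmatic setup with CXF's JAXRSServerFactoryBean would look roughly like the sketch below (DoubleItService is a placeholder name, and the filter classes are again assumed to come from org.apache.cxf.rs.security.httpsignature.filters):

    // Illustrative programmatic equivalent of the spring configuration
    // (the sample project itself uses spring XML)
    JAXRSServerFactoryBean sf = new JAXRSServerFactoryBean();
    sf.setAddress("/doubleit");
    sf.setResourceClasses(DoubleItService.class);
    sf.setResourceProvider(DoubleItService.class,
        new SingletonResourceProvider(new DoubleItService()));
    sf.setProviders(Arrays.asList(
        new VerifySignatureFilter(),    // verifies the signature on the client request
        new CreateSignatureFilter()));  // signs the response message
    sf.create();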

For more information on how to use HTTP Signatures with Apache CXF, refer to the CXF documentation.

Shawn McKinney: On Becoming a Member


A couple of days ago, an unexpected message arrived in my inbox, inviting me to become a member of the Apache Software Foundation.

After the initial surprise wore off I began to process what it meant.  Obviously, it’s an honor.  But there’s more to it than that.

About five years ago we began having discussions with a colleague, Emmanuel Lécharny, about moving the OpenLDAP Fortress project into the ASF as a sub-project of Apache Directory; that topic is covered here.

Since that time, the typical path of escalating involvement within a particular project was followed.  Contributor->Committer->PMC, …

What I learned during this period of time can’t be catalogued into a single blog post.  Careers are made (and sometimes broken) on transitional paths such as these.  There were challenges, pressures, (personal) shortcomings to be addressed, highs, lows and everything between.

It would take another post to cover all of the people involved, including family, fellow project members (both at ASF and OpenLDAP), business partners, work colleagues and the many other shoulders upon which I stood.  Thankful doesn’t begin to cover the feelings, I’m still processing, trying to make sense of it all.

Now, after having satisfied those original technology goals, it’s time to broaden the perspective to a wider field.  The elements contained within this new field of vision have yet to come into a sharp focus.

What I do know, it will be societal rather than technological.

For example, having a daughter just now starting her career in technology, what will it be like when she enters the workplace?  Will organizations such as the Apache Software Foundation be inclusive to her (as they were to me), or will there be barriers put in place barring or slowing her entry?

What must change and what do we leave alone?  How do we ensure the essential characteristic of the ASF remains in place while making targeted changes (planting/pruning/weeding) to clear out space for new growth, allowing new opportunities for new segments of society?

These are the types of questions I’m asking myself.  An incredible opportunity to follow a new course alongside an unmistakable concern of not rising to the occasion.

 


Apache Velocity news: Velocity Engine 2.1 released


The Velocity developers are pleased to announce the release of Velocity Engine 2.1.

Main changes in this release:

+ New VTL syntax: alternate reference values: ${foo|'foo'} evaluates to 'foo' whenever boolean evaluation of $foo is false.

+ New VTL syntax: Default block for empty loops: #foreach($i in $collection) ... #else nothing to display #end

+ Two more Engine 1.7 backward compatibility flags, parser.allow_hyphen_in_identifier and velocimacro.arguments.literal

+ Velocity Engine 2.1 now requires Java 1.8+.

For a full list of changes, consult the Velocity Engine 2.1 Changes section and the JIRA changelog.

For notes on upgrading, see the Velocity Engine 2.1 Upgrading section.

Downloads of 2.1 are available here.

Nick Kew: Placing the Blame


When David Cameron resigned, I said here that his successor would come in for a lot of blame.  And indeed, it has come to pass: Mrs May is getting the greater part of the blame for the mess brexit inevitably became.  Much of her party wants her to resign, and she’s indicated she may do so – albeit as a form of bribe to her party.

But who would want her job now?  There’s still a lot of blame to come, and our next prime minister won’t be popular for long either, no matter what he or she may do.  There might be someone among the more swivel-eyed loons with delusions, but the Party Establishment can surely see them off.

There’s one obvious candidate.  He’s in a position somewhat akin to May in 2016: of an age where if he doesn’t get the job now, he’ll be too old to be considered for it.  And every party in parliament – including his own – would just love to see him fall flat on his face, and take the major share of the blame for brexit fallout.  He is of course opposition leader Jeremy Corbyn.

And he’s also in a corner.  Give him an election and, unlike the tories, he really can’t afford not to fight it to win.

So the question is, how to engineer it, and leave him (and the country) the most poisonous legacy possible.  Well, they’re doing that by demonstrating that the tory party is just too dysfunctional and cannot govern.  That’s three-birds-with-one-stone: it leads us by default to the worst possible brexit to poison the future; it helps precipitate an election, and it helps avoid winning that election.  Genius!

Bryan Pendleton: Alex Honnold breaks it down

Holden Karau: Powering Tensorflow with Big Data @ CERN Computing Seminar

Colm O hEigeartaigh: Performance gain for web service requests in Apache CXF

In this post I want to talk about a recent performance gain I made in Apache CXF for JAX-WS web service requests. It was prompted by a mail to the CXF users list. The scenario was a JAX-WS web service where certain requests are secured using WS-SecurityPolicy, and other requests are not. The problem was that the user observed that the security interceptors were always invoked in CXF, even for requests that had no security applied to the message, and that this resulted in a noticeable performance penalty for large requests.

The reason for the performance penalty is that CXF needs to convert the request into a Document Object Model to apply WS-Security (note there is also a streaming WS-Security implementation available, but the performance is roughly similar). CXF needs to perform this conversion as it requires access to the full Document to perform XML Signature verification, etc. on the request. So even for the insecure request, it would apply CXF's SAAJInInterceptor. Then it would iterate through the security headers of the request, find that there was none present, and skip security processing.

However when thinking about this problem, I realised that before invoking the SAAJInInterceptor, we could check to see whether a security header is actually present in the request (and whether it matches the configured "actor" if one is configured). CXF makes the message headers available in DOM form, but not the SOAP Body (unless SAAJInInterceptor is called). If no matching security header is available, then we can skip security processing, and instead just perform WS-SecurityPolicy assertion using a set of empty results.
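
To make the idea concrete, here is a simplified sketch of such a check (this is not the actual CXF code from CXF-8010; the method and handling are illustrative and only cover the SOAP 1.1 "actor" attribute):

    // Simplified illustration, not the actual CXF implementation.
    // Assumed imports: org.apache.cxf.binding.soap.SoapMessage,
    // org.apache.cxf.headers.Header, org.w3c.dom.Element.
    private static final String WSSE_NS =
        "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd";

    private boolean hasMatchingSecurityHeader(SoapMessage message, String configuredActor) {
        // Look through the DOM form of the SOAP headers for a wsse:Security element
        // before triggering SAAJ conversion and full security processing
        for (Header header : message.getHeaders()) {
            Object headerObject = header.getObject();
            if (!(headerObject instanceof Element)) {
                continue;
            }
            Element el = (Element) headerObject;
            if (!WSSE_NS.equals(el.getNamespaceURI()) || !"Security".equals(el.getLocalName())) {
                continue;
            }
            // SOAP 1.1 "actor"; an empty configured actor matches a header with no actor attribute
            String actor = el.getAttributeNS("http://schemas.xmlsoap.org/soap/envelope/", "actor");
            boolean matches = (configuredActor == null || configuredActor.isEmpty())
                    ? actor.isEmpty() : configuredActor.equals(actor);
            if (matches) {
                return true;
            }
        }
        return false;
    }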

This idea is implemented in CXF for the 3.3.2 release via the task CXF-8010. To test what happens, I added a test-case to github here. This creates a war file containing a service with two operations: one that is not secured, and one that has a WS-SecurityPolicy asymmetric binding applied. Both operations take two parameters, an integer and a String description.

To test it, I added a JMeter test-case here. It uses 10 threads to call the insecure operation 30,000 times. The description String in each request contains the URL encoded version of the WS-Security specification to test what happens with a somewhat large request.

Comparing the JMeter results: using CXF 3.3.1 the throughput is 1604.25 requests per second, whereas with the CXF 3.3.2-SNAPSHOT code (with the fix for CXF-8010 applied) the throughput is 1795.26 requests per second, a gain of roughly 9%. For a more complex SOAP Body I would expect the gain to be a lot greater.

Bryan Pendleton: A sense of fullness


Here's one take: Trump, following border trip, says country is full: 'We can't do it anymore'

President Trump, fresh off a trip to the U.S. southern border, doubled-down on his message that “the country is full”

...

“The country is full. We have ... our system is full. We can't do it anymore,” Trump said

...

The president shared the same message earlier in the day at a roundtable with law enforcement and immigration officials, telling any potential migrants to “turn around” because the U.S. “can’t take you anymore.”

And here's another: Heartland Visas Report

  • U.S. population growth has fallen to 80-year lows. The country now adds approximately 900,000 fewer people each year than it did in the early 2000s.
  • The last decade marks the first time in the past century that the United States has experienced low population growth and low prime working age growth on a sustained basis at the same time.
  • Uneven population growth is leaving more places behind. 86% of counties now grow more slowly than the nation as a whole, up from 64% in the 1990s.
  • In total, 61 million Americans live in counties with stagnant or shrinking populations and 38 million live in the 41% of U.S. counties experiencing rates of demographic decline similar to Japan’s.
  • 80% of U.S. counties, home to 149 million Americans, lost prime working age adults from 2007 to 2017, and 65% will again over the next decade.
  • By 2037, two-thirds of U.S. counties will contain fewer prime working age adults than they did in 1997, even though the country will add 24.1 million prime working age adults and 98.8 million people in total over that same period.
  • Population decline affects communities in every state. Half of U.S. states lost prime working age adults from 2007-2017. 43% of counties in the average state lost population in that same time period, and 76% lost prime working age adults.
  • Shrinking places are also aging the most rapidly. By 2027, 26% of the population in the fastest shrinking counties will be 65 and older compared to 20% nationwide.
  • Population loss is hitting many places with already weak socioeconomic foundations. The share of the adult population with at least a bachelor’s degree in the bottom decile of population loss is half that in the top decile of population growth. Educational attainment in the fastest shrinking counties is on average equivalent to that of Mexico today or the United States in 1978.
  • Population loss itself perpetuates economic decline. Its deleterious effects on housing markets, local government finances, productivity, and dynamism make it harder for communities to bounce back. For example, this analysis found that a 1 percentage point decline in a county’s population growth rate is associated with a 2-3 percentage point decline in its startup rate over the past decade.

Happily for me, I live in one of those areas where immigrants are welcomed; nearly everyone that I spend my waking hours with is either an immigrant or a child of an immigrant, and my part of the country is experiencing the most breathtaking growth, both cultural growth and economic growth, since the pre-Civil-War "Gold Rush" years of 1849-1850.

But I understand that other areas of the country are different.

Nick Kew: Passion


Time to mention our next concert: one of the greatest of all Easter works.  Bach’s St Matthew Passion, at the Guildhall, Plymouth, a week today (Sunday April 14th).

This work should need no introduction, and I have no hesitation recommending it to readers within evening-out distance of Plymouth.  I’m looking forward to it.

Just one downside.  As with our performance of the St John’s Passion three years ago, this is a “new” Novello translation.  I think if I’d come to these (translations) in reverse order my criticisms might have been a little different, but the underlying point remains: these are about money.  A rentier publisher contemptuously saying screw the art.  And I can now answer the question I posed then: with ISIS no longer having the earthly power to destroy more great heritage, Novello score a clear victory in the cultural vandalism stakes.


Bryan Pendleton: We are stardust, we are golden; we are billion year old carbon


Every year, as we approach Earth Day, it's good to remember, and good to consider, this magical aggregate of dust upon which we all survive:

  • This Woman Paddled 730 miles up the Green River - to save our water systems.
    I’m paddling the length of the river, to try and understand that risk, my own and other people’s, and to see, from river level, what we could stand to lose if we don’t change how we use and allocate water. “Throughout the whole last century, if you needed more water it always worked out somehow, but it doesn’t work when you get to the point where you’re storing every last drop,” Doug Kenney, Director of the Western Water Policy program at the University of Colorado, tells me before I set out on the river. “You have to talk people through it, and explain that for every new reservoir you try and fill you’re putting more stress on the other parts of the system. Things are changing and we should behave in a way that limits our risk.”

    (See also: Heather Hansman: The Dam Problem in the West)

  • Letter From a Drowned Canyon
    On a map, Glen Canyon before its submersion looks like a centipede: a 200-mile-long central canyon bending and twisting with a host of little canyons like legs on either side. Those side canyons were sometimes hundreds of feet deep; some were so cramped you could touch both walls with your outstretched hands; some had year-round running water in them or potholes that explorers had to swim across. Sometimes in the cool shade of side-canyon ledges and crevices, ferns and other moisture-loving plants made hanging gardens. Even the names of these places are beautiful: Forbidding Canyon, Face Canyon, Dove Canyon, Red Canyon, Twilight Canyon, Balanced Rock Canyon, Ribbon Canyon. Like Dungeon Canyon, they are now mostly underwater.

    When the Sierra Club pronounced Glen Canyon dead in 1963, the organization’s leaders expected it to stay dead under Lake Powell. But this old world is re-emerging, and its fate is being debated again. The future we foresee is often not the one we get, and Lake Powell is shriveling, thanks to more water consumption and less water supply than anyone anticipated. Beneath it lies not just canyons but spires, crests, labyrinths of sandstone, Anasazi ruins, petroglyphs, and burial sites, an intricate complexity hidden by water, depth lost in surface. The uninvited guest, the unanticipated disaster, reducing rainfall and snowmelt and increasing drought and evaporation in the Southwest, is climate change.

  • How the Flint River got so toxic
    Why did Flint’s river pose so many problems? Before processing, the water itself is polluted from four sources: natural biological waste; treated industrial and human waste; untreated waste intentionally or accidentally dumped into the river; and contaminants washed into the river by rain or snow. The river is also warmer than Lake Huron, and its flow is less constant, particularly in the summer. All of this raises levels of bacteria and organic matter and introduces a wide range of other potential contaminants, whether natural or human-made.

    In fact, while the Flint River had been improving thanks to the new regulations, the departure of heavy industry, and local cleanup efforts, it had long been known as an exceptionally polluted river. Until very recently, it had been repeatedly ruled out as a primary source for the city’s drinking water. It is hard to imagine why anyone familiar with the river’s history would ever decide to use it even as a temporary water source. But they did.

  • Looking Again at the Chernobyl Disaster
    A neglected step caused the reactor’s power to plunge, and frantic attempts to revive it created an unexpected power surge. Poorly trained operators panicked. An explosion of hydrogen and oxygen sent Elena into the air “like a flipped coin” and destroyed the reactor. Operators vainly tried to stop a meltdown by planning to shove control rods in by hand. Escaping radiation shot a pillar of blue-white phosphorescent light into the air.

    The explosion occurs less than 100 pages into this 366-page book (plus more than 100 pages of notes, glossary, cast of characters and explanation of radiation units). But what follows is equally gripping. Radio-controlled repair bulldozers became stuck in the rubble. Exposure to radiation made voices grow high and squeaky. A dying man whispered to his nurse to step back because he was too radioactive. A workman’s radioactive shoe was the first sign in Sweden of a nuclear accident 1,000 miles upwind. Soviet bigwigs entered the area with high-tech dosimeters they didn’t know how to turn on. Investigations blamed the accident on six tweakers, portrayed them as “hooligans” and convicted them.

Ortwin Glück: [Code] On Gentoo sshd crashes after gcc update

I have updated from gcc-7.3 to gcc-8.2. On most of my Gentoo boxes this led to continuous crashing of sshd. The crashes don't actually look like crashes (no segfault or anything) but rather like a normal process exit or sigkill. Sshd would crash at connection attempts and also when I run grub-install (which is really freaking weird). The problem persists across reboots.

After I rebuilt openssh, openssl and pam with the new compiler, the problem went away:
emerge -1av openssh openssl pam

Justin Mason: Links for 2019-04-08


Justin Mason: Links for 2019-04-09


Stefan Bodewig: XMLUnit.NET 2.7.0-beta-01 Released


This is the very first release of XMLUnit.NET that supports .NET Standard 2.0 in addition to the traditional .NET Framework 3.5. Only the build process and the nuget packages have changed, there is no other difference between this release and XMLUnit.NET 2.6.0. If you don't need support for .NET Standard there is no reason to upgrade.

This release has been labeled beta as the packaging needs more thorough testing by the community.

Many thanks to @shatl who performed most of the heavy lifting which has made this release possible.

James Duncan


With the completion of our event in New York yesterday, the Create tour is well and truly underway.

We’ve got 11 more stops planned before the end of June around the world, but it feels really good to get this first one done, to connect with the attendees in New York, and to get their feedback on how we did. We’ll be taking that feedback and rolling it in, adjusting as we go. Iteration is the key, right?

Next stop: tomorrow in Toronto.

James Duncan


Justin Mason: Links for 2019-04-10

  • At wit’s end with my preschooler : Parenting

    This /r/parenting thread has some good advice on dealing with kids’ meltdowns. I wish I had this a few years ago

    (tags: parenting kids tantrums anger reddit advice)

  • Spark memory tuning on EMR

    ‘Best practices for successfully managing memory for Apache Spark applications on Amazon EMR’, on the AWS Big Data blog. ‘In this blog post, I detailed the possible out-of-memory errors, their causes, and a list of best practices to prevent these errors when submitting a Spark application on Amazon EMR. My colleagues and I formed these best practices after thorough research and understanding of various Spark configuration properties and testing multiple Spark applications. These best practices apply to most of out-of-memory scenarios, though there might be some rare scenarios where they don’t apply. However, we believe that this blog post provides all the details needed so you can tweak parameters and successfully run a Spark application.’

    (tags: spark emr aws tuning memory ooms java)

Claus Ibsen: Short Apache Camel K video

You may have seen the work we are doing in the Apache Camel community around Camel K.
Nicola introduced Camel K on his blog half a year ago, with the words:
Just few months ago, we were discussing about a new project that we could start as part of Apache Camel. A project with the potential to change the way people deal with integration. That project is now here and it’s called “Apache Camel K”.
Apache Camel K is in active development and it's progressing nicely. Yesterday I gave a talk at the KMD Steam conference in Copenhagen, Denmark, about Serverless Integration with Knative and Camel K on Kubernetes. As the talk was only 30 minutes, I decided not to do any live demos and instead quickly recorded a 45-second video of a quick Camel K demo.


In the top left corner you have a Camel route in a single Sample.java source file. In the top right corner we have an openshift web console, as I am running a local minishift cluster (Camel K also runs nicely on vanilla Kubernetes, but its web console is not as great as the one from openshift).
At the bottom we have the terminal where I run the Camel K integration with the Camel K CLI tool, and the output of the integration is logged in the console. Notice how quick the rolling upgrade is when I edit and save the Java source code.


Justin Mason: Links for 2019-04-11

  • Amazon workers call for zero carbon emissions and cancellation of an AWS fossil-fuel friendly program

    nice one.

    Then the activists saw an article in Gizmodo, a technology news site, that outlined how Amazon’s cloud computing division was building special offerings for oil and gas companies. On its website, Amazon says its customers include BP and Royal Dutch Shell, and its products can “find oil faster,” “recover more oil” and “reduce the cost per barrel.” In a second meeting with Amazon, the workers raised the oil industry connections with the company’s sustainability team; its members did not seem to be aware of the business, according to several employees at the meeting. “That really showed us Amazon is not taking climate change seriously if the highest levels of the sustainability team are not even aware that we have an oil and gas business,” said Ms. Cunningham, who was at the meeting.

    (tags: amazon aws fossil-fuels zero-carbon emissions climate-change sustainability)

  • Using 6 Page and 2 Page Documents To Make Organizational Decisions

    Ian Nowland has written up the Amazon 6-pager strategy:

    A challenge of organizations is the aggregation of local information to a point where a globally optimal decision can be made in a way all stakeholders have seen their feedback heard and so can “disagree and commit” on the result. This document describes the “6 pager” and “2 pager” document and review meeting process, as a mechanism to address this challenge, as practiced by the document’s author in his time in the EC2 team at Amazon, and then at Two Sigma. […] The major variant I have also seen is 2 pages with 30 minute review; when the decision is smaller in terms of stakeholders, options or impact. That being said, there is nothing magical about 2 pages, i.e., a 3 page document is fine, it just should be expected to take more than 30 minutes to review.

    (tags: amazon business decisions teams documents planning)

  • Europol Tells Internet Archive That Much Of Its Site Is ‘Terrorist Content’ | Techdirt

    ‘The Internet Archive has a few staff members that process takedown notices from law enforcement who operate in the Pacific time zone. Most of the falsely identified URLs mentioned here (including the report from the French government) were sent to us in the middle of the night – between midnight and 3am Pacific – and all of the reports were sent outside of the business hours of the Internet Archive. The one-hour requirement essentially means that we would need to take reported URLs down automatically and do our best to review them after the fact. It would be bad enough if the mistaken URLs in these examples were for a set of relatively obscure items on our site, but the EU IRU’s lists include some of the most visited pages on archive.org and materials that obviously have high scholarly and research value.’

    (tags: eu europol policing france archive.org archival web freedom censorship fail)

Bryan Pendleton: The (un)happy Medium


Generally speaking, I feel like I ought to like Medium.

So why is it that I seem to be actively avoiding every Medium link that shows up in my various feeds nowadays?

I can't clearly express the unease I have about the platform.

What do you think? Is Medium to be avoided, embraced, or is it just "meh"?

Claus Ibsen: Long 2h Apache Camel video (sorry it's in danish)

A couple of days ago I was back in Copenhagen, at the capital region IT division for health care, where my Apache Camel journey started in 2008. So it was great being back at that magical place ;)


The event was hosted by Javagruppen and they had video equipment so they streamed the event live on youtube.

The agenda of my 2 hour session was:

  • What is Apache Camel?
  • Apache Camel v3
  • Apache Camel K
  • Knative & Camel
  • Quarkus & Camel

The main topic of the session was the new Apache Camel K project, but I also gave extended coverage of what's coming in the upcoming Apache Camel v3.

For anyone curious about what is coming in Apache Camel v3, you can take a look at the slides, as they are in English.

The slides of the talk are here:



... and the video is online on YouTube (in Danish):




James Duncan

James Duncan


View from the twenty-second floor of the Sheraton Hotel in Toronto

Traveling west means waking up early and enjoying sunrises, as long as there’s a good view and you have time to enjoy it. I wanted to watch this sunrise unfold and watch people start their day, but I had to start my own day, get a move on, and go to the venue for Create in Toronto.





