Fluentsee: Fluentd Log Parser

I wrote previously about using fluentd to collect logs as a quick solution until the “real” solution happened. Well, like many “temporary” solutions, it settled in and took root. I was happy with it, but grew progressively more bored with coming up with elaborate command pipelines to parse the logs.

Fluentsee

So in the best DevOps tradition, rather than solve the initial strategic problem, I came up with another layer of paint to slap on as a tactical fix, and fluentsee was born. Fluentsee is written in Java, and lets you filter the logs and print the entries in different output formats:

$ java -jar fluentsee-1.0.jar --help
Option (* = required)          Description
---------------------          -----------
--help                         Get command line help.
* --log <String: filename>     Log file to use.
--match <String: field=regex>   Define a match for filtering output. May pass in
                                 multiple matches.
--tail                         Tail the log.
--verbose                      Print verbose format entries.

So, for example, to see all the log entries from the nginx container that contain a POST, you would run:

$ java -jar fluentsee-1.0.jar --log /fluentd/data.log \
--match 'json.container_name=.*nginx.*' --match 'json.log=.*POST.*'

The matching uses Java regular expressions. The parsing isn’t wildly efficient, but it generally keeps up.
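
To make the matching idea concrete, here is a minimal sketch of how a field=regex filter could be applied to an entry. It is not fluentsee's actual source: the MatchFilter class, its method names, and the flat map of field names to values are all hypothetical, and I'm assuming that every match must succeed and that each regex has to cover the whole field value (which is what the `.*nginx.*` style in the example suggests).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of field=regex filtering; not fluentsee's actual source.
final class MatchFilter {
    private final List<String> fields = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    // Takes one argument shaped like the --match option, e.g. json.log=.*POST.*
    MatchFilter add(String spec) {
        int eq = spec.indexOf('=');
        if (eq < 1) {
            throw new IllegalArgumentException("expected field=regex, got: " + spec);
        }
        fields.add(spec.substring(0, eq));
        // DOTALL is my choice: it lets '.' match a trailing newline in a log field.
        patterns.add(Pattern.compile(spec.substring(eq + 1), Pattern.DOTALL));
        return this;
    }

    // Assumption: every match must succeed, and each regex must cover the whole value.
    boolean accepts(Map<String, String> entry) {
        for (int i = 0; i < fields.size(); i++) {
            String value = entry.get(fields.get(i));
            if (value == null || !patterns.get(i).matcher(value).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MatchFilter filter = new MatchFilter()
                .add("json.container_name=.*nginx.*")
                .add("json.log=.*POST.*");

        Map<String, String> entry = new HashMap<>();
        entry.put("json.container_name", "/web_nginx_1");
        entry.put("json.log", "POST /login HTTP/1.1\n");
        System.out.println(filter.accepts(entry));
    }
}
```

Splitting on the first `=` only means the regex itself may contain an equals sign.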

Grab it on GitHub

There’s a functional version on GitHub now, and you can expect enhancements as I continue to ignore the original problem and focus on the tactical patch.

Collecting Docker Logs With Fluentd

I’m working on a project involving a bunch of services running in docker containers. We are working on the design and implementation of our full-blown log gathering and analysis solution, but what was I to do until then? Bouncing around all the hosts to look at the logs in place was getting tiresome, but I didn’t want to expend much energy on a stopgap measure either.

Enter Fluentd

Docker supports a variety of logging drivers, so I ran down the list and gave each choice about ten minutes of attention, and sure enough, one choice only needed ten minutes to get up and running: fluentd.

What it Took

  1. Pick a machine to host logs
  2. Run a docker image of fluentd on that host
  3. Add a couple of additional options on my docker invocations.

What That Got Me

With the above done, all my docker containers’ logs were aggregated on the designated host in an orderly format, with log rolling etc.

But…

The orderly format in the aggregated log was well structured, but maybe not friendly. Its format is:

TIMESTAMP HOST_ID JSON_BLOB

So an example might look like:

20170804T140852+0000 9c501a9baf61 {"container_id":"...","container_name":"...","source":"stdout","log":"..."}
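
For readers who would rather pull such a line apart in code, here is a rough sketch that splits it into its three parts. The FluentdLine class is hypothetical, and I'm assuming the fields are separated by whitespace and that the timestamp always has the shape shown above. The JSON blob is left as a plain string to hand to whatever JSON library you prefer.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical holder for one aggregated log line: TIMESTAMP HOST_ID JSON_BLOB.
final class FluentdLine {
    // Matches timestamps shaped like 20170804T140852+0000.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmssZ");

    final OffsetDateTime timestamp;
    final String hostId;
    final String json;

    private FluentdLine(OffsetDateTime timestamp, String hostId, String json) {
        this.timestamp = timestamp;
        this.hostId = hostId;
        this.json = json;
    }

    static FluentdLine parse(String line) {
        // Limit of 3 keeps any whitespace inside the JSON blob untouched.
        String[] parts = line.split("\\s+", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected TIMESTAMP HOST_ID JSON_BLOB: " + line);
        }
        return new FluentdLine(OffsetDateTime.parse(parts[0], STAMP), parts[1], parts[2]);
    }

    public static void main(String[] args) {
        FluentdLine entry = parse(
                "20170804T140852+0000 9c501a9baf61 {\"source\":\"stdout\",\"log\":\"hello world\"}");
        System.out.println(entry.timestamp);
        System.out.println(entry.hostId);
        System.out.println(entry.json);
    }
}
```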

Everything in its place but…

How To Deal

So with everything going into one file, and a mix of text and JSON, I settled on the following approach.   First I installed jq to help format the JSON.  Then I just employed tried and true command line tools.

For example, let’s say you just want to look at the log entries for an nginx container:

grep /nginx /fluentd/data.20170804.log | cut -c 35- | jq -C . | less -r

That’s all it takes!  Use grep to pull the lines with the container name, cut out the JSON, have jq format it, and view it.

Maybe you just want the log field, rather than the entire entry:

grep /nginx /fluentd/data.20170804.log | cut -c 35- | jq -C .log | less -r

Just have jq pull out the single field.

It’s Low Tech But…

For about ten minutes setup work, and a little command line magic, I’ve got a good solution until the real answer arrives.

Tech Notes

There were a couple of specifics worth noting in the process here. First, there are at least two ways to direct docker to use a specific log driver. One is via the command line on a run. The other is to configure the docker daemon via its /etc/docker/daemon.json file. The command line is more granular: you can pick and choose which containers log to which driver. That’s flexible and nice, but unfortunately docker “compose” and “cloud” don’t support setting the driver for a container. Setting it at the docker daemon level as a default solves the compose/cloud issue, but creates a circular dependency if you’re running fluentd in docker: the fluentd container won’t start unless fluentd is already running, but fluentd lives in that container. I went with setting it at the daemon level, and I made sure to run the fluentd container first thing, with a command line option indicating the traditional log driver.

The second noteworthy point was that the fluentd container provides a data.log link that was supposed to always point to the newest log… for me it doesn’t.  I have to look into the log area and find the newest log myself because data.log doesn’t update correctly through some log rotations.

Information Graveyard

I’m trying to learn how to write a skill for Amazon’s Alexa, taking the tried and true approach of searching for tutorials on the internet. At this point it’s been only frustration. I’ve found both Amazon-written tutorials and third-party ones. Not a single one yet has provided instructions that correspond to the current Amazon tools. Some are relatively recent, or at least claim to have been recently updated, but not a one has actually provided a working example. It’s not a matter of slight differences that can be worked around; each one has had at least one step that didn’t seem to correspond to anything in the AWS console as it is today.

Keeping posts up to date is work, I realize. I’m guilty too, of leaving out of date documentation out in the wild, but I make an effort to be responsible, and I’m not expecting revenue from my posts.  How is it that even Amazon’s own tutorials are completely borked?  I tried this about two months back and it was the same story. Since then both the tutorials and AWS tools have been updated, but the new combination is no more workable than the prior.

Some products are notably bad on this point. Amazon’s SDKs and tools are a consistent pain point. The Spring ecosystem is bad too. JBoss is a mess. The problem is made worse by how the developers refactor code and APIs. Making changes and improvements in a way that facilitates migrations is a skill. I wish Amazon would acquire that skill.

Compromises

I hit on a really good article on the Law of Demeter. If you’re not familiar with it, read that article, and if you do, you may find my discussion with the author in the comments. The discussion was about how rigidly to take the term Law.

Why Quibble The Word Law?

I code mostly in Java, classified as an object-oriented language, and I’ve coded in the OO paradigm in C++, Objective-C and Smalltalk too. But I started in procedural (Pascal and C) and I’ve worked with functional (Lisp, SML), and smatterings of other languages with their styles too. They all have their strong points. I’ve learned tactics and patterns from them all, and when I encounter a situation where one applies, I use it if the current tools can implement the tactic. I’m not saying anything astonishing here; modern tools are rarely purist in their approaches anymore.

The Law of Demeter is a good OO maxim, but if you’re writing code that handles serialized data, whether it be a RESTful service, data store persistence, etc., you’ll likely be dealing with composite data (addresses inside accounts, inside portfolios, etc.). Accessing portfolio.account.address.state violates the Law of Demeter. There are patterns to mitigate some of the issues here, like Data Transfer Objects, the Visitor Pattern, or the Facade Pattern, but depending on the situation some of these cures are worse than the problem.
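
To make the trade-off concrete, here is a small sketch of the portfolio example. The classes and the getAccountState accessor are hypothetical names I made up for illustration. It shows both the chained access that breaks the maxim and the delegating accessors that satisfy it, at the price of one extra method per level.

```java
// Hypothetical composite data: an address inside an account inside a portfolio.
final class Address {
    private final String state;
    Address(String state) { this.state = state; }
    String getState() { return state; }
}

final class Account {
    private final Address address;
    Account(Address address) { this.address = address; }
    Address getAddress() { return address; }
    // Delegating accessor: Account only talks to its own field.
    String getState() { return address.getState(); }
}

final class Portfolio {
    private final Account account;
    Portfolio(Account account) { this.account = account; }
    Account getAccount() { return account; }
    // Delegating accessor: callers no longer need to know how the state is nested.
    String getAccountState() { return account.getState(); }
}

final class DemeterDemo {
    public static void main(String[] args) {
        Portfolio portfolio = new Portfolio(new Account(new Address("VT")));

        // Chained access: reaches through two intermediate objects.
        System.out.println(portfolio.getAccount().getAddress().getState());

        // Demeter-friendly access: only talks to the immediate collaborator.
        System.out.println(portfolio.getAccountState());
    }
}
```

With three levels and one field the delegation is cheap. With dozens of fields on serialized data it turns into a lot of pass-through methods, which is where the pragmatic reading of the "law" comes in.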

In Summary

Keep the Law of Demeter in mind as you write/review your code. If it’s been rampantly ignored that certainly is a code smell.  But paradigm “laws” are for purists, and writing software is a pragmatic process… so… yeah… it’s a maxim.

Revisiting Immutables

A while back I looked into the immutables project for Java. It lets you define immutable objects in a very simple way.  Depending on your programming background you might not even see the role of an immutable object, but as more languages and patterns concern themselves with more rigorous and safer practices you may find the call for immutability in your Java.

When I looked into the project I was impressed, but at the time, the current build tools and IDEs struggled with the preprocessing requirements, and Java 7 didn’t complement it either.  All that has changed, and this project is now one I’m adding to my short list of favorites.

What You Get

I’m not going to go into a lot of detail here, there are plenty of good sources for that just a google away, but here’s the concept in brief.  The project lets you define an immutable object by simply adding annotations to an interface or abstract class, for example:

@Value.Immutable
public interface Item {
  String getName();
  Set<String> getTags();
  Optional<String> getDescription();
}

With that definition in place, a preprocessor will create for you:

  • An immutable class that implements your interface
  • A very clean and powerful builder implementation with slick support for collections, defaults, optionals and more.

With the example above you can start using the immutable easily:

Item item = ImmutableItem.builder()
                         .name("Veggie Pizza")
                         .addTag("mushrooms")
                         .addTag("peppers")
                         .description("A pizza with vegetables.")
                         .build();

System.out.println("Order " + item.getName() + " it's a " +
                       item.getDescription().orElse("nice choice"));

The Mechanics

The mechanics of using the project, at least with gradle, are now as simple as:

buildscript {
    dependencies {
        classpath 'gradle.plugin.org.inferred:gradle-processors:1.2.11'
    }
}

apply plugin: 'org.inferred.processors'

dependencies {
    processor 'org.immutables:value:2.4.6'
}

And the generated code appeared in IntelliJ seamlessly!

With Java 8

Java 8 complements the project in at least a couple of ways. Java now has its own Optional class, so you don’t need to pull in another project for that. But what I found nicer was using default methods. The project has long (always?) supported abstract classes as well as interfaces, so there was already a way to add pure implementation code to the immutables. But interfaces provide more inheritance flexibility, and now with Java 8’s default methods you can get the best of both: the benefits of interfaces, with accompanying method implementations!
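
As a rough illustration of the default method idea, here is the Item interface from earlier with one behavior method added. To keep the sketch compilable without the library, I've left off the @Value.Immutable annotation and written a small hand-rolled implementation instead of the generated one. The hasTag method and the SimpleItem class are hypothetical additions of mine.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// The interface from the earlier example, minus the annotation, plus a default method.
interface Item {
    String getName();
    Set<String> getTags();
    Optional<String> getDescription();

    // Java 8 default method: implementation code that lives on the interface.
    default boolean hasTag(String tag) {
        return getTags().contains(tag);
    }
}

// Hand-written stand-in for the class the annotation processor would generate.
final class SimpleItem implements Item {
    private final String name;
    private final Set<String> tags;
    private final Optional<String> description;

    SimpleItem(String name, Set<String> tags, Optional<String> description) {
        this.name = name;
        // Defensive copy so later changes to the caller's set do not leak in.
        this.tags = Collections.unmodifiableSet(new HashSet<>(tags));
        this.description = description;
    }

    @Override public String getName() { return name; }
    @Override public Set<String> getTags() { return tags; }
    @Override public Optional<String> getDescription() { return description; }
}

final class DefaultMethodDemo {
    public static void main(String[] args) {
        Item item = new SimpleItem("Veggie Pizza",
                new HashSet<>(Arrays.asList("mushrooms", "peppers")),
                Optional.empty());
        System.out.println(item.hasTag("peppers"));
    }
}
```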

So…

I highly recommend giving the project a test run and incorporating immutables in your toolbox.

GraphQL: A Java Server in Ratpack

I previously wrote about an implementation of a GraphQL server in Java. That post is showing its age because the code is part of a kata project and constantly evolves.

A Concise Example

So, I’ve created a new concise example in GitHub that exemplifies using:

The example is just the code needed to manage a list of companies in memory. It implements basic CRUD operations but with an extensible pattern.

Grab The Code

It’s all covered in ~300 lines of code:

  1. A package with a GraphQL Schema, using a Query and a Mutation class
  2. A GraphQLHandler that dispatches POST requests to GraphQL
  3. An Application that creates the Ratpack server for the GraphQLHandler

The README covers how to run it, and there is a series of sample requests included.

First Project With Ratpack

One of my trade skills is server side Java, which means writing services in Java. Recently they’ve mostly been servlet based microservices. I’ve used the Spark Framework a lot, but as clean as that framework is, there’s no denying that servlets are the man behind the curtain, and you can’t avoid paying him attention any time you do anything of substance. Servlets work well enough, but they are showing their age, and so I always keep an eye open for other lightweight service architectures.

Ratpack

When I saw Ratpack I decided to give it a go. It bills itself as “Simple, lean & powerful HTTP apps”. It’s built on a carefully curated selection of newer technologies: Java 8, Netty, Guice, asynchronicity, event driven … it looked promising.

Giving it a Try

I took my standard approach to any existing evaluation, and migrated one of my kata projects over to it. The obvious choice was my snippets service. I created a feature branch, and damned if just a few hours later I didn’t have a version of the service that I felt was cleaner and faster, and the branch was merged to master.

Likes

What I liked about Ratpack:

  • Appeared to live up to its credo, simple, lean and powerful.
  • Seemed to produce a quick app.
  • Clearly not a servlets facade. The APIs are largely consistent and you don’t immediately hit the “and here’s where it becomes servlets” boundaries over and over.
  • Documented, both an online manual and Javadocs.

The Imperfections

Here are the rough edges from my perspective:

  • The documentation. Yup, it’s a like and a dislike. The manual is useful, but it leans towards “here’s a cool way to do this cool thing.” When I wanted to do the pedestrian “serve static content from inside a fat jar”,  I had to hit Google and hunt around various boards.
  • The Gradle syntactic sugar that magically pulls in dependencies.  I wish they just listed the dependencies needed and left it to you to include them.  I really don’t want magic in my gradle build, where some dependencies are implied, but most you need to list. I prefer less magic, a little more work, but consistency.

To Keep in Mind

I was migrating Java 8 code which had lambdas sprinkled throughout. In at least one place things started happening out of order. I had to keep in mind that Ratpack leans towards asynchronous/reactive paradigms, that some of my lambdas were now off in other threads happening in parallel, and that I had to make sure they had completed before using their results.
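
The ordering issue is easy to show in plain Java. This sketch deliberately uses the JDK's CompletableFuture as a stand-in rather than Ratpack's own asynchronous types, and the loadSnippet and render names are hypothetical. The point is only that code depending on an asynchronous result should be chained onto it, rather than written on the next line as if the result were already there.

```java
import java.util.concurrent.CompletableFuture;

// Illustration with JDK futures; Ratpack has its own async types, not shown here.
final class AsyncOrdering {
    // Stand-in for a slow lookup whose lambda runs on another thread.
    static CompletableFuture<String> loadSnippet(String id) {
        return CompletableFuture.supplyAsync(() -> "snippet-" + id);
    }

    // The transformation is chained on the future, so it runs only once the value exists.
    static CompletableFuture<String> render(String id) {
        return loadSnippet(id).thenApply(body -> "<pre>" + body + "</pre>");
    }

    public static void main(String[] args) {
        // join() blocks until the chained work has completed; fine for a demo,
        // but inside an async framework you would keep composing instead.
        System.out.println(render("42").join());
    }
}
```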

The Bits

If you’re curious to see what the Ratpack based version of my snippets server looks like, it’s on my GitHub.