Java


The key here is using Location (instead of Directory) for your Deny directive (otherwise your JkUnmount’d calls will not be blocked)

  SetEnvIfNoCase User-Agent "Firefox" bad_bot

  <Location "/">
    Deny from env=bad_bot
  </Location>

Amusing use of the ternary from our codebase..

log.debug( ( randomize ? "R" : "Not r" ) + "andomizing content" );

:)

I haven’t been using “traditional” singletons for a couple of years now (Spring craze and all), so the Initialization On Demand Holder idiom, which allows for lazy instantiation of singletons, has escaped me until now.

private static class LazySomethingHolder {
    public static Something something = new Something();
}

public static Something getInstance() {
    return LazySomethingHolder.something;
}

Here’s the explanation, courtesy of Bill Pugh, Brian Goetz and friends.

Found via TheServerSide.

I was writing a multi-line batch script with multiple call-outs to Ant and I was confused when it kept exiting after the first call-out.

I’m proud to be rusty at batch scripting, so I’m not ashamed to share the (obvious) fix…

To run a batch program from within the current batch program, use the call command followed by the name of the batch program you wish to run. After the second program is finished, it will return to the command which follows the call command.

If you invoke a batch file without using the call command (from within a batch file that is), control passes over to the new batch program and does not return.

If you would like Spring to do the heavy-lifting for your Factory class, you can simply inject ApplicationContext into your factory and delegate your create() methods to Spring’s getBean() methods.

public class YourServiceFactory implements ApplicationContextAware {
    private ApplicationContext context;

    public void setApplicationContext(ApplicationContext context) {
        this.context= context;
    }

    public YourService createYourService() {
        return (YourService ) context.getBean(serviceName);
    }
}

Here’s a simple recipe I’ve used over and over (wherever auto-wiring constructors was inappropriate). Validating at container startup time is certainly better than relying on NPEs at runtime.

import org.springframework.beans.factory.InitializingBean;
import org.springframework.util.Assert;

public class FooService implements InitializingBean {
    private Dependency dependency;

    public void afterPropertiesSet() throws IllegalArgumentException {
        Assert.notNull(dependency, "Dependency cannot be null");
    }

    public void setDependency(Dependency dependency) {
        this.dependency = dependency;
    }
}

As I’m working on an API that returns matrices of solutions in its responses, I’m finding Wikipedia’s definitions for some of the concepts I’m trying to express quite handy.
Methods for ordering & storing multidimensional arrays fall into the bucket, as I’ve struggled to figure out how to express a matrix in JDK 1.5 (1-D Array? 2-D Array? List of Lists? Map?).

It turns out that there are two fundamental ways to store multidimensional matrices in linear memory - Row-major & Column-major. They’re not that different from each other (in fact, they’re transpositions of each other), but it’s nice to attach a proper name to the concept.

1  2  3
4  5  6

In row-major storage, a multidimensional array in linear memory is accessed such that rows are stored one after the other. It is the approach used by the C programming language as well as many other languages, with the notable exception of Fortran.

1  2  3  4  5  6

Column-major order is a similar method of flattening arrays onto linear memory, but the columns are listed in sequence. The programming language Fortran uses column-major ordering.

1  4  2  5  3  6

While studying Lisp documentation on S-expressions and their use of prefix notation, I accidentaly picked up the proper name for the notation I have used all these years (in imperative language land) - infix notation. As always, Wikipedia definition comes in handy…

Infix notation is the common arithmetic and logical formula notation, in which operators are written infix-style between the operands they act on (e.g. 2 + 2). It is not as simple to parse by computer as prefix notation ( e.g. + 2 2 ) or postfix notation ( e.g. 2 2 + ), but many programming languages use it to take advantage of its familiarity.

In infix notation, unlike in prefix or postfix notations, parentheses surrounding groups of operands and operators are necessary to indicate the intended order in which operations are to be performed. In the absence of parentheses, certain precedence rules determine the order of operations.

One of my co-workers recently called TreeMaps “hashes”, which is imprecise at best, but can be easily excused as a by-product of his Pythonista upbringing.

java.util.TreeMap, in fact, is implemented as a Red-Black tree - a data structure that has not had a definition in my brain for years. As a refresher course, I decided to Wikipedia the B-Tree family and blog the highlights.

First, the BINARY TREE:

A binary tree is a tree data structure in which each node has at most two children.

  • A directed edge connects the parent to the child.
  • A node that has no children is called a leaf.
  • The depth of a node n is the length of the path from the root to the node. The set of all nodes at a given depth is sometimes called a level of the tree.
  • The height of a node n is the length of the path from the node n to its furthest leaf.
  • Nodes that share parents are called siblings.
  • If a path exists from node p to node q, then p is an ancestor of q and q is a descendant of p.
  • The size of a node is the number of descendants it has including itself.

Second, a BINARY SEARCH TREE:

In computer science, a binary search tree (BST) is a binary tree which has the following properties:

  • Each node has a value.
  • A total order is defined on these values.
  • The left subtree of a node contains only values less than or equal to the node’s value.
  • The right subtree of a node contains only values greater than or equal to the node’s value.

The major advantage of binary search trees is that the related sorting algorithms and search algorithms such as in-order traversal can be very efficient.

Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.

If a BST allows duplicate values, then it represents a multiset. This kind of tree uses non-strict inequalities, so everything in the left subtree of a node is less than or equal to the value of the node, and everything in the right subtree is greater than or equal to the value of the node.

If a BST doesn’t allow duplicate values, then the tree represents a set with unique values, like the mathematical set. Trees without duplicate values use strict inequalities, meaning that the left subtree of a node only contains nodes with values that are less than the value of the node, and the right subtree only contains values that are greater.

Third, a SELF-BALANCING BINARY SEARCH TREE:

In computer science, a self-balancing binary search tree or height-balanced binary search tree is a binary search tree that attempts to keep its height, or the number of levels of nodes beneath the root, as small as possible at all times, automatically. They are one of the most efficient ways of implementing associative arrays, sets, and other data structures.

Most operations on a binary search tree take time directly proportional to the tree’s height, so it is desirable to keep the height small. Ordinary binary search trees have the primary disadvantage that they can attain very large heights in rather ordinary situations, such as when the keys are inserted in order. The result is a data structure similar to a linked list, making all operations on the tree expensive. If we know all the data ahead of time, we can keep the height small on average by adding values in a random order, but we don’t always have this luxury, particularly in on-line algorithms.

Self-balancing binary trees instead solve this problem by performing transformations on the tree, such as tree rotations, at key times, in order to reduce the height. Although a certain overhead is involved, it is justified in the long run by drastically decreasing the time of later operations. Note that the height must always be at least the ceiling of log n, since there are at most 2k nodes on the kth level; a complete or full binary tree has exactly this many levels. Balanced BSTs are not always so precisely balanced, since it can be expensive to keep a tree at minimum height at all times; instead, they keep the height within a constant factor of this lower bound.

Finally, a RED-BLACK TREE:

A red-black tree is a binary search tree where each node has a color attribute, the value of which is either red or black. In addition to the ordinary requirements imposed on binary search trees, we make the following additional requirements of any valid red-black tree:

  1. A node is either red or black.
  2. The root is black.
  3. All leaves are black.
  4. Both children of every red node are black. (i.e. Every red node must have a black parent.)
  5. All paths from any given node to its leaf nodes contain the same number of black nodes.

These constraints enforce a critical property of red-black trees: that the longest possible path from the root to a leaf is no more than twice as long as the shortest possible path. The result is that the tree is roughly balanced. Since operations such as inserting, deleting, and finding values requires worst-case time proportional to the height of the tree, this theoretical upper bound on the height allows red-black trees to be efficient in the worst-case, unlike ordinary binary search trees.

To see why these properties guarantee this, it suffices to note that no path can have two red nodes in a row, due to property 4. The shortest possible path has all black nodes, and the longest possible path alternates between red and black nodes. Since all maximal paths have the same number of black nodes, by property 5, this shows that no path is more than twice as long as any other path.

My two primary languages, Java and Ruby, don’t have explicit support for HOFs, but they both have some related functionality.

Let’s first look at Wikipedia’s definition of Higher Order Functions:

In mathematics and computer science, higher-order functions are functions which do at least one of the following:

  • take one or more functions as an input
  • output a function

The map function found in many functional programming languages is one example of a higher-order function. It takes a function f as an argument, and returns a new function which takes a list and applies the f to each element.

Ruby supports a HOF-like mechanism through the yield keyword. Here’s a snippet from the Bill Venners/Matz interview on Artima:

Bill Venners: Ruby supports blocks and closures. What are blocks and closures, and how are they used?

Yukihiro Matsumoto
: Blocks are basically nameless functions. You may be familiar with the lambda from other languages like Lisp or Python. Basically, you can pass a nameless function to another function, and then that function can invoke the passed-in nameless function. For example, a function could perform iteration by passing one item at a time to the nameless function. This is a common style, called higher order function style, among languages that can handle functions as first class objects. Lisp does it. Python does it .Even C does it with function pointers. Many other languages do this style of programming. In Ruby, the difference is mainly a different kind of syntax for higher order functions. In other languages, you have to specify explicitly that a function can accept another function as an argument. But in Ruby, any method can be called with a block as an implicit argument. Inside the method, you can call the block using the yield keyword with a value.

Apache has an effort under way to bring partial HOF support to the Java world, the Commons Functor project. The project looks pretty interesting, but might end up being pretty verbose to use. If its usability is similar to that of the CollectionUtils/Predicate functionality in the Apache Commons, I’m willing to give it a try. Either way, I hope Commons Functor comes of the Sandbox soon and starts playing with the big boys.

Keeping tabs on the latest continuations debate:

I have a bunch of disorganized thoughts on the subject, roughly along these lines:

  • whether we use them for the web or not, continuations make for a great feature in any language
  • AJAX is great for any interaction that doesn’t need to be bookmarked
  • REST is great for any interaction that does need to be bookmarked
  • carrying state in hidden parameters (Meggison’s idea) would be an error-prone way to solve a lot of use cases I’m familiar with
  • continuations can have a major impact on an application’s memory footprint
  • deserializing continuation state can be tricky when dealing with versioned objects

I was curious why “disabling nagle’s” to improve performance is such a popular fix these days…

Nagle’s algorithm is a means of improving the efficiency of TCP/IP networks by reducing the number of packets that need to be sent over the network. It is named after John Nagle, then at Ford Aerospace and lately at Animats.

Nagle’s document, Congestion Control in IP/TCP Internetworks (RFC896) describes what he called the ’small packet problem’, where an application repeatedly emits data in small chunks, frequently only 1 byte in size. Since TCP packets have a 40 byte header, this results in a 41 byte packet for 1 byte of useful information, a huge overhead. This situation occurs in Telnet sessions, where keypresses generate a single byte of data which is transmitted immediately. Worse, over slow links, many such packets can be in transit at the same time, potentially leading to congestion collapse.

The Nagle algorithm works by coalescing a number of small outgoing messages, and sending them all at once. Specifically, as long as there is a sent packet for which the sender has received no acknowledgement, the sender should keep buffering its output until it has a full packet’s worth of output, so that output can be sent all at once.

In the Java world, java.net.Socket.setTcpNoDelay() is used to enable/disable TCP_NODELAY which disables/enables Nagle’s algorithm.

Next Page »