org.hbase.async
Class HBaseClient

java.lang.Object
  extended by org.hbase.async.HBaseClient

public final class HBaseClient
extends Object

A fully asynchronous, thread-safe, modern HBase client.

Unlike the traditional HBase client (HTable), this client should be instantiated only once. You can use it with any number of tables at the same time. The only case where you should have multiple instances is when you want to use multiple different clusters at the same time.

If you play by the rules, this client is (in theory :D) completely thread-safe. Read the documentation carefully to know what the requirements are for this guarantee to apply.

This client is fully non-blocking, any blocking operation will return a Deferred instance to which you can attach a Callback chain that will execute when the asynchronous operation completes.

Note regarding HBaseRpc instances passed to this class

Every HBaseRpc passed to a method of this class should not be changed or re-used until the Deferred returned by that method calls you back. Changing or re-using any HBaseRpc for an RPC in flight will lead to unpredictable results and voids your warranty.

Data Durability

Some methods or RPC types take a durable argument. When an edit requests to be durable, the success of the RPC guarantees that the edit is safely and durably stored by HBase and won't be lost. In case of server failures, the edit won't be lost although it may become momentarily unavailable. Setting the durable argument to false makes the operation complete faster (and puts a lot less strain on HBase), but removes this durability guarantee. In case of a server failure, the edit may (or may not) be lost forever. When in doubt, leave it to true (or use the corresponding method that doesn't accept a durable argument as it will default to true). Setting it to false is useful in cases where data-loss is acceptable, e.g. during batch imports (where you can re-run the whole import in case of a failure), or when you intend to do statistical analysis on the data (in which case some missing data won't affect the results as long as the data loss caused by machine failures preserves the distribution of your data, which depends on how you're building your row keys and how you're using HBase, so be careful).

Bear in mind that this durability guarantee holds only once the RPC has completed successfully. Any edit temporarily buffered on the client side or in-flight will be lost if the client itself crashes. You can control how much buffering is done by the client by using setFlushInterval(short) and you can force-flush the buffered edits by calling flush(). When you're done using HBase, you must not just give up your reference to your HBaseClient, you must shut it down gracefully by calling shutdown(). If you fail to do this, then all edits still buffered by the client will be lost.

NOTE: This entire section assumes that you use a distributed file system that provides HBase with the required durability semantics. If you use HDFS, make sure you have a version of HDFS that provides HBase the necessary API and semantics to durability store its data.

throws clauses

None of the asynchronous methods in this API are expected to throw an exception. But the Deferred object they return to you can carry an exception that you should handle (using "errbacks", see the javadoc of Deferred). In order to be able to do proper asynchronous error handling, you need to know what types of exceptions you're expected to face in your errbacks. In order to document that, the methods of this API use javadoc's @throws to spell out the exception types you should handle in your errback. Asynchronous exceptions will be indicated as such in the javadoc with "(deferred)".

For instance, if a method foo pretends to throw an UnknownScannerException and returns a Deferred<Whatever>, then you should use the method like so:

   HBaseClient client = ...;
   Deferred<Whatever> d = client.foo();
   d.addCallbacks(new Callback<Whatever, SomethingElse>() {
     SomethingElse call(Whatever arg) {
       LOG.info("Yay, RPC completed successfully!");
       return new SomethingElse(arg.getWhateverResult());
     }
     String toString() {
       return "handle foo response";
     }
   },
   new Callback<Exception, Object>() {
     Object call(Exception arg) {
       if (arg instanceof UnknownScannerException) {
         LOG.error("Oops, we used the wrong scanner?", arg);
         return otherAsyncOperation();  // returns a Deferred<Blah>
       }
       LOG.error("Sigh, the RPC failed and we don't know what to do", arg);
       return arg;  // Pass on the error to the next errback (if any);
     }
     String toString() {
       return "foo errback";
     }
   });
 
This code calls foo, and upon successful completion transforms the result from a Whatever to a SomethingElse (which will then be given to the next callback in the chain, if any). When there's a failure, the errback is called instead and it attempts to handle a particular type of exception by retrying the operation differently.


Field Summary
static byte[] EMPTY_ARRAY
          An empty byte array you can use.
 
Constructor Summary
HBaseClient(String quorum_spec)
          Constructor.
HBaseClient(String quorum_spec, String base_path)
          Constructor.
HBaseClient(String quorum_spec, String base_path, ClientSocketChannelFactory channel_factory)
          Constructor for advanced users with special needs.
HBaseClient(String quorum_spec, String base_path, Executor executor)
          Constructor for advanced users with special needs.
 
Method Summary
 Deferred<Boolean> atomicCreate(PutRequest edit)
          Atomically insert a new cell in HBase.
 Deferred<Long> atomicIncrement(AtomicIncrementRequest request)
          Atomically and durably increments a value in HBase.
 Deferred<Long> atomicIncrement(AtomicIncrementRequest request, boolean durable)
          Atomically increments a value in HBase.
 Deferred<Long> bufferAtomicIncrement(AtomicIncrementRequest request)
          Buffers a durable atomic increment for coalescing.
 Deferred<Boolean> compareAndSet(PutRequest edit, byte[] expected)
          Atomic Compare-And-Set (CAS) on a single cell.
 Deferred<Boolean> compareAndSet(PutRequest edit, String expected)
          Atomic Compare-And-Set (CAS) on a single cell.
 long contendedMetaLookupCount()
          Deprecated. This method will be removed in release 2.0. Use stats().contendedMetaLookups() instead.
 Deferred<Object> delete(DeleteRequest request)
          Deletes data from HBase.
 Deferred<Object> ensureTableExists(byte[] table)
          Ensures that a given table really exists.
 Deferred<Object> ensureTableExists(String table)
          Ensures that a given table really exists.
 Deferred<Object> ensureTableFamilyExists(byte[] table, byte[] family)
          Ensures that a given table/family pair really exists.
 Deferred<Object> ensureTableFamilyExists(String table, String family)
          Ensures that a given table/family pair really exists.
 Deferred<Object> flush()
          Flushes to HBase any buffered client-side write operation.
 Deferred<ArrayList<KeyValue>> get(GetRequest request)
          Retrieves data from HBase.
 short getFlushInterval()
          Returns the maximum time (in milliseconds) for which edits can be buffered.
 int getIncrementBufferSize()
          Returns the capacity of the increment buffer.
 Timer getTimer()
          Returns the timer used by this client.
 Deferred<RowLock> lockRow(RowLockRequest request)
          Acquires an explicit row lock.
 Scanner newScanner(byte[] table)
          Creates a new Scanner for a particular table.
 Scanner newScanner(String table)
          Creates a new Scanner for a particular table.
 Deferred<Object> put(PutRequest request)
          Stores data in HBase.
 long rootLookupCount()
          Deprecated. This method will be removed in release 2.0. Use stats().rootLookups() instead.
 short setFlushInterval(short flush_interval)
          Sets the maximum time (in milliseconds) for which edits can be buffered.
 int setIncrementBufferSize(int increment_buffer_size)
          Changes the size of the increment buffer.
 Deferred<Object> shutdown()
          Performs a graceful shutdown of this instance.
 ClientStats stats()
          Returns a snapshot of usage statistics for this client.
 long uncontendedMetaLookupCount()
          Deprecated. This method will be removed in release 2.0. Use stats().uncontendedMetaLookups() instead.
 Deferred<Object> unlockRow(RowLock lock)
          Releases an explicit row lock.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EMPTY_ARRAY

public static final byte[] EMPTY_ARRAY
An empty byte array you can use. This can be useful for instance with Scanner.setStartKey(byte[]) and Scanner.setStopKey(byte[]).

Constructor Detail

HBaseClient

public HBaseClient(String quorum_spec)
Constructor.

Parameters:
quorum_spec - The specification of the quorum, e.g. "host1,host2,host3".

HBaseClient

public HBaseClient(String quorum_spec,
                   String base_path)
Constructor.

Parameters:
quorum_spec - The specification of the quorum, e.g. "host1,host2,host3".
base_path - The base path under which is the znode for the -ROOT- region.

HBaseClient

public HBaseClient(String quorum_spec,
                   String base_path,
                   Executor executor)
Constructor for advanced users with special needs.

NOTE: Only advanced users who really know what they're doing should use this constructor. Passing an inappropriate thread pool, or blocking its threads will prevent this HBaseClient from working properly or lead to poor performance.

Parameters:
quorum_spec - The specification of the quorum, e.g. "host1,host2,host3".
base_path - The base path under which is the znode for the -ROOT- region.
executor - The executor from which to obtain threads for NIO operations. It is strongly encouraged to use a Executors.newCachedThreadPool() or something equivalent unless you're sure to understand how Netty creates and uses threads. Using a fixed-size thread pool will not work the way you expect.

Note that calling shutdown() on this client will NOT shut down the executor.

Since:
1.2
See Also:
NioClientSocketChannelFactory

HBaseClient

public HBaseClient(String quorum_spec,
                   String base_path,
                   ClientSocketChannelFactory channel_factory)
Constructor for advanced users with special needs.

Most users don't need to use this constructor.

Parameters:
quorum_spec - The specification of the quorum, e.g. "host1,host2,host3".
base_path - The base path under which is the znode for the -ROOT- region.
channel_factory - A custom factory to use to create sockets.

Note that calling shutdown() on this client will also cause the shutdown and release of the factory and its underlying thread pool.

Since:
1.2
Method Detail

stats

public ClientStats stats()
Returns a snapshot of usage statistics for this client.

Since:
1.3

flush

public Deferred<Object> flush()
Flushes to HBase any buffered client-side write operation.

Returns:
A Deferred, whose callback chain will be invoked when everything that was buffered at the time of the call has been flushed.

Note that this doesn't guarantee that ALL outstanding RPCs have completed. This doesn't introduce any sort of global sync point. All it does really is it sends any buffered RPCs to HBase.


setFlushInterval

public short setFlushInterval(short flush_interval)
Sets the maximum time (in milliseconds) for which edits can be buffered.

This interval will be honored on a "best-effort" basis. Edits can be buffered for longer than that due to GC pauses, the resolution of the underlying timer, thread scheduling at the OS level (particularly if the OS is overloaded with concurrent requests for CPU time), any low-level buffering in the TCP/IP stack of the OS, etc.

Setting a longer interval allows the code to batch requests more efficiently but puts you at risk of greater data loss if the JVM or machine was to fail. It also entails that some edits will not reach HBase until a longer period of time, which can be troublesome if you have other applications that need to read the "latest" changes.

Setting this interval to 0 disables this feature.

The change is guaranteed to take effect at most after a full interval has elapsed, using the previous interval (which is returned).

Parameters:
flush_interval - A positive time interval in milliseconds.
Returns:
The previous flush interval.
Throws:
IllegalArgumentException - if flush_interval < 0.

setIncrementBufferSize

public int setIncrementBufferSize(int increment_buffer_size)
Changes the size of the increment buffer.

NOTE: because there is no way to resize the existing buffer, this method will flush the existing buffer and create a new one. This side effect might be unexpected but is unfortunately required.

This determines the maximum number of counters this client will keep in-memory to allow increment coalescing through bufferAtomicIncrement(org.hbase.async.AtomicIncrementRequest).

The greater this number, the more memory will be used to buffer increments, and the more efficient increment coalescing can be if you have a high-throughput application with a large working set of counters.

If your application has excessively large keys or qualifiers, you might consider using a lower number in order to reduce memory usage.

Parameters:
increment_buffer_size - The new size of the buffer.
Returns:
The previous size of the buffer.
Throws:
IllegalArgumentException - if increment_buffer_size < 0.
Since:
1.3

getTimer

public Timer getTimer()
Returns the timer used by this client.

All timeouts, retries and other things that need to "sleep asynchronously" use this timer. This method is provided so that you can also schedule your own timeouts using this timer, if you wish to share this client's timer instead of creating your own.

The precision of this timer is implementation-defined but is guaranteed to be no greater than 20ms.

Since:
1.2

getFlushInterval

public short getFlushInterval()
Returns the maximum time (in milliseconds) for which edits can be buffered.

The default value is an unspecified and implementation dependant, but is guaranteed to be non-zero.

A return value of 0 indicates that edits are sent directly to HBase without being buffered.

See Also:
setFlushInterval(short)

getIncrementBufferSize

public int getIncrementBufferSize()
Returns the capacity of the increment buffer.

Note this returns the capacity of the buffer, not the number of items currently in it. There is currently no API to get the current number of items in it.

Since:
1.3

shutdown

public Deferred<Object> shutdown()
Performs a graceful shutdown of this instance.

Not calling this method before losing the last reference to this instance may result in data loss and other unwanted side effects

Returns:
A Deferred, whose callback chain will be invoked once all of the above have been done. If this callback chain doesn't fail, then the clean shutdown will be successful, and all the data will be safe on the HBase side (provided that you use durable edits). In case of a failure (the "errback" is invoked) you may want to retry the shutdown to avoid losing data, depending on the nature of the failure. TODO(tsuna): Document possible / common failure scenarios.

ensureTableFamilyExists

public Deferred<Object> ensureTableFamilyExists(String table,
                                                String family)
Ensures that a given table/family pair really exists.

It's recommended to call this method in the startup code of your application if you know ahead of time which tables / families you're going to need, because it'll allow you to "fail fast" if they're missing.

Both strings are assumed to use the platform's default charset.

Parameters:
table - The name of the table you intend to use.
family - The column family you intend to use in that table.
Returns:
A deferred object that indicates the completion of the request. The Object has not special meaning and can be null (think of it as Deferred<Void>). But you probably want to attach at least an errback to this Deferred to handle failures.
Throws:
TableNotFoundException - (deferred) if the table doesn't exist.
NoSuchColumnFamilyException - (deferred) if the family doesn't exist.

ensureTableFamilyExists

public Deferred<Object> ensureTableFamilyExists(byte[] table,
                                                byte[] family)
Ensures that a given table/family pair really exists.

It's recommended to call this method in the startup code of your application if you know ahead of time which tables / families you're going to need, because it'll allow you to "fail fast" if they're missing.

Parameters:
table - The name of the table you intend to use.
family - The column family you intend to use in that table.
Returns:
A deferred object that indicates the completion of the request. The Object has not special meaning and can be null (think of it as Deferred<Void>). But you probably want to attach at least an errback to this Deferred to handle failures.
Throws:
TableNotFoundException - (deferred) if the table doesn't exist.
NoSuchColumnFamilyException - (deferred) if the family doesn't exist.

ensureTableExists

public Deferred<Object> ensureTableExists(String table)
Ensures that a given table really exists.

It's recommended to call this method in the startup code of your application if you know ahead of time which tables / families you're going to need, because it'll allow you to "fail fast" if they're missing.

Parameters:
table - The name of the table you intend to use. The string is assumed to use the platform's default charset.
Returns:
A deferred object that indicates the completion of the request. The Object has not special meaning and can be null (think of it as Deferred<Void>). But you probably want to attach at least an errback to this Deferred to handle failures.
Throws:
TableNotFoundException - (deferred) if the table doesn't exist.

ensureTableExists

public Deferred<Object> ensureTableExists(byte[] table)
Ensures that a given table really exists.

It's recommended to call this method in the startup code of your application if you know ahead of time which tables / families you're going to need, because it'll allow you to "fail fast" if they're missing.

Parameters:
table - The name of the table you intend to use.
Returns:
A deferred object that indicates the completion of the request. The Object has not special meaning and can be null (think of it as Deferred<Void>). But you probably want to attach at least an errback to this Deferred to handle failures.
Throws:
TableNotFoundException - (deferred) if the table doesn't exist.

get

public Deferred<ArrayList<KeyValue>> get(GetRequest request)
Retrieves data from HBase.

Parameters:
request - The get request.
Returns:
A deferred list of key-values that matched the get request.

newScanner

public Scanner newScanner(byte[] table)
Creates a new Scanner for a particular table.

Parameters:
table - The name of the table you intend to scan.
Returns:
A new scanner for this table.

newScanner

public Scanner newScanner(String table)
Creates a new Scanner for a particular table.

Parameters:
table - The name of the table you intend to scan. The string is assumed to use the platform's default charset.
Returns:
A new scanner for this table.

atomicIncrement

public Deferred<Long> atomicIncrement(AtomicIncrementRequest request)
Atomically and durably increments a value in HBase.

This is equivalent to atomicIncrement (request, true)

Parameters:
request - The increment request.
Returns:
The deferred long value that results from the increment.

bufferAtomicIncrement

public Deferred<Long> bufferAtomicIncrement(AtomicIncrementRequest request)
Buffers a durable atomic increment for coalescing.

This increment will be held in memory up to the amount of time allowed by getFlushInterval() in order to allow the client to coalesce increments.

Increment coalescing can dramatically reduce the number of RPCs and write load on HBase if you tend to increment multiple times the same working set of counters. This is very common in user-facing serving systems that use HBase counters to keep track of user actions.

If client-side buffering is disabled (getFlushInterval() returns 0) then this function has the same effect as calling atomicIncrement(AtomicIncrementRequest) directly.

Parameters:
request - The increment request.
Returns:
The deferred long value that results from the increment.
Since:
1.3, 1.4 This method works with negative increment values.

atomicIncrement

public Deferred<Long> atomicIncrement(AtomicIncrementRequest request,
                                      boolean durable)
Atomically increments a value in HBase.

Parameters:
request - The increment request.
durable - If true, the success of this RPC guarantees that HBase has stored the edit in a durable fashion. When in doubt, use atomicIncrement(AtomicIncrementRequest).
Returns:
The deferred long value that results from the increment.

put

public Deferred<Object> put(PutRequest request)
Stores data in HBase.

Note that this provides no guarantee as to the order in which subsequent put requests are going to be applied to the backend. If you need ordering, you must enforce it manually yourself by starting the next put once the Deferred of this one completes successfully.

Parameters:
request - The put request.
Returns:
A deferred object that indicates the completion of the request. The Object has not special meaning and can be null (think of it as Deferred<Void>). But you probably want to attach at least an errback to this Deferred to handle failures. TODO(tsuna): Document failures clients are expected to handle themselves.

compareAndSet

public Deferred<Boolean> compareAndSet(PutRequest edit,
                                       byte[] expected)
Atomic Compare-And-Set (CAS) on a single cell.

Note that edits sent through this method cannot be batched, and won't be subject to the flush interval. This entails that write throughput will be lower with this method as edits have to be sent out to the wire one by one.

This request enables you to atomically update the value of an existing cell in HBase using a CAS operation. It's like a PutRequest except that you also pass an expected value. If the last version of the cell identified by your PutRequest matches the expected value, HBase will atomically update it to the new value.

If the expected value is the empty byte array, HBase will atomically create the cell provided that it doesn't exist already. This can be used to ensure that your RPC doesn't overwrite an existing value. Note however that this trick cannot be used the other way around to delete an expected value atomically.

Parameters:
edit - The new value to write.
expected - The expected value of the cell to compare against. This byte array will NOT be copied.
Returns:
A deferred boolean, if true the CAS succeeded, otherwise the CAS failed because the value in HBase didn't match the expected value of the CAS request.
Since:
1.3

compareAndSet

public Deferred<Boolean> compareAndSet(PutRequest edit,
                                       String expected)
Atomic Compare-And-Set (CAS) on a single cell.

Note that edits sent through this method cannot be batched.

Parameters:
edit - The new value to write.
expected - The expected value of the cell to compare against. This string is assumed to use the platform's default charset.
Returns:
A deferred boolean, if true the CAS succeeded, otherwise the CAS failed because the value in HBase didn't match the expected value of the CAS request.
Since:
1.3
See Also:
compareAndSet(PutRequest, byte[])

atomicCreate

public Deferred<Boolean> atomicCreate(PutRequest edit)
Atomically insert a new cell in HBase.

Note that edits sent through this method cannot be batched.

This is equivalent to calling compareAndSet(edit, EMPTY_ARRAY)

Parameters:
edit - The new value to insert.
Returns:
A deferred boolean, true if the edit got atomically inserted in HBase, false if there was already a value in the given cell.
Since:
1.3
See Also:
compareAndSet(PutRequest, byte[])

lockRow

public Deferred<RowLock> lockRow(RowLockRequest request)
Acquires an explicit row lock.

For a description of what row locks are, see RowLock.

Parameters:
request - The request specify which row to lock.
Returns:
a deferred RowLock.
See Also:
unlockRow(org.hbase.async.RowLock)

unlockRow

public Deferred<Object> unlockRow(RowLock lock)
Releases an explicit row lock.

For a description of what row locks are, see RowLock.

Parameters:
lock - The lock to release.
Returns:
A deferred object that indicates the completion of the request. The Object has not special meaning and can be null (think of it as Deferred<Void>).

delete

public Deferred<Object> delete(DeleteRequest request)
Deletes data from HBase.

Parameters:
request - The delete request.
Returns:
A deferred object that indicates the completion of the request. The Object has not special meaning and can be null (think of it as Deferred<Void>). But you probably want to attach at least an errback to this Deferred to handle failures.

rootLookupCount

@Deprecated
public long rootLookupCount()
Deprecated. This method will be removed in release 2.0. Use stats().rootLookups() instead.

Returns how many lookups in -ROOT- were performed.

This number should remain low. It will be 1 after the first access to HBase, and will increase by 1 each time the .META. region moves to another server, which should seldom happen.

This isn't to be confused with the number of times we looked up where the -ROOT- region itself is located. This happens even more rarely and a message is logged at the INFO whenever it does.

Since:
1.1

uncontendedMetaLookupCount

@Deprecated
public long uncontendedMetaLookupCount()
Deprecated. This method will be removed in release 2.0. Use stats().uncontendedMetaLookups() instead.

Returns how many lookups in .META. were performed (uncontended).

This number indicates how many times we had to lookup in .META. where a key was located. This only counts "uncontended" lookups, where the thread was able to acquire a "permit" to do a .META. lookup. The majority of the .META. lookups should fall in this category.

Since:
1.1

contendedMetaLookupCount

@Deprecated
public long contendedMetaLookupCount()
Deprecated. This method will be removed in release 2.0. Use stats().contendedMetaLookups() instead.

Returns how many lookups in .META. were performed (contended).

This number indicates how many times we had to lookup in .META. where a key was located. This only counts "contended" lookups, where the thread was unable to acquire a "permit" to do a .META. lookup, because there were already too many .META. lookups in flight. In this case, the thread was delayed a bit in order to apply a bit of back-pressure on the caller, to avoid creating .META. storms. The minority of the .META. lookups should fall in this category.

Since:
1.1