Since Hydra tasks can be arbitrary programs, there is no single architecture they have to follow. However, most treebuilder and split jobs behave similarly regardless of their sources or sinks.

A single SourceReader thread is the keystone. It polls a queue for new bundles and puts them on an outbound queue for the processor threads. There may also be several threads pre-fetching, decompressing, or doing other work to make data available to the SourceReader.

Processor threads do the actual “work” of running filters before passing the bundles to an outbound sink queue. This queue will also likely have multiple threads draining from it (for example, to compress and write out bundles to files).
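
The flow described above can be sketched with plain java.util.concurrent classes. This is only an illustration of the shape, not Hydra's actual implementation: the class name, queue sizes, thread names, the end-of-input marker, and the use of strings in place of bundles are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names are Hydra classes.
class PipelineSketch {
    // Hypothetical end-of-input marker, one per processor thread.
    private static final String END = "__END__";

    /** Pushes fake bundles through reader -> processors -> sink and returns how many the sink wrote. */
    static int run(int bundleCount, int processorThreads) {
        BlockingQueue<String> toProcessors = new ArrayBlockingQueue<>(16);
        AtomicInteger written = new AtomicInteger();

        // Sink: a bounded queue drained by two threads. With "caller runs",
        // a full sink queue makes the submitting processor thread do the write itself.
        ThreadPoolExecutor sink = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(8), new ThreadPoolExecutor.CallerRunsPolicy());

        // The single source reader thread feeding the processors.
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < bundleCount; i++) toProcessors.put("bundle-" + i);
                for (int i = 0; i < processorThreads; i++) toProcessors.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "SourceReader");

        List<Thread> processors = new ArrayList<>();
        for (int i = 0; i < processorThreads; i++) {
            processors.add(new Thread(() -> {
                try {
                    while (true) {
                        String bundle = toProcessors.take();        // blocks if the source is slow
                        if (END.equals(bundle)) break;
                        String filtered = bundle.toUpperCase();     // stand-in for running filters
                        sink.execute(() -> written.addAndGet(filtered.isEmpty() ? 0 : 1));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "Processor-" + i));
        }

        try {
            reader.start();
            for (Thread t : processors) t.start();
            reader.join();
            for (Thread t : processors) t.join();
            sink.shutdown();
            sink.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return written.get();
    }
}
```

Both queues are bounded on purpose: a slow stage then pushes back on the stage before it instead of letting bundles pile up in memory.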

When profiling, the processor threads are usually a good place to start:
  • If they are blocked waiting for data, the source is not pulling new data in fast enough.
  • If they are blocked on the sink (or doing work for the sink; a “caller runs” policy is typically used), the sink is not draining fast enough.
  • If they are running all the time applying filters, the job may benefit from using more threads.
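
One rough way to make the checks above from inside the JVM is to count thread states for the processor threads. The sketch below uses only standard JDK calls (Thread.getAllStackTraces and Thread.getState); the class name and the thread-name prefix passed in are assumptions, since the names Hydra actually gives its processor threads are not stated here. A thread dump from a tool such as jstack shows the same information along with the stack frames.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of Hydra.
class ThreadStateSummary {
    /** Counts live threads whose name starts with namePrefix, grouped by thread state. */
    static Map<Thread.State, Integer> summarize(String namePrefix) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(namePrefix)) {
                counts.merge(t.getState(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Mostly WAITING or TIMED_WAITING threads suggest the processors are idle and waiting for data, while mostly RUNNABLE threads suggest they are busy. With a "caller runs" policy a processor doing sink work also shows up as RUNNABLE, so the stack frames are still needed to tell filter work from sink work.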
