Hydra is a distributed data processing and storage system originally developed at AddThis. It ingests streams of data (think log files) and builds trees that are aggregates, summaries, or transformations of the data. These trees can be used by humans to explore (tiny queries), as part of a machine learning pipeline (big queries), or to support live consoles on websites (lots of queries).
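
To make the "stream in, tree out" idea concrete, here is a minimal conceptual sketch in Python. It is not Hydra's API or job configuration format: the sample log lines, the `build_tree` and `query` helpers, and the day-then-path tree layout are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical Apache-style access log lines, made up for this example.
LOG_LINES = [
    '10.0.0.1 - - [12/Mar/2014:10:01:22 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [12/Mar/2014:10:05:40 +0000] "GET /about.html HTTP/1.1" 200 128',
    '10.0.0.1 - - [13/Mar/2014:09:15:03 +0000] "GET /index.html HTTP/1.1" 404 0',
]

# Capture the day portion of the timestamp and the requested path.
LINE_RE = re.compile(r'\[(?P<day>[^:]+):[^\]]*\] "(?P<method>\S+) (?P<path>\S+)')


def build_tree(lines):
    """Aggregate a stream of lines into a nested dict: day -> path -> hit count."""
    tree = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not parse
        tree[match.group("day")][match.group("path")] += 1
    return tree


def query(tree, day):
    """A 'tiny query': total hits recorded under a single day node."""
    return sum(tree.get(day, {}).values())


tree = build_tree(LOG_LINES)
```

With the sample data above, `query(tree, "12/Mar/2014")` sums the counts stored under that day's node. The point of the sketch is only that the aggregation happens once at ingest time, so later queries read a small summary instead of rescanning the raw log.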

You can run Hydra from the command line to slice and dice that Apache access log you have sitting around (or that mysterious gargantuan spreadsheet). Or, if terabytes per day is your cup of tea, run a Hydra Cluster that supports your job with resource sharing, job management, distributed backups, data partitioning, and efficient bulk file transfer.

