CloudPelican
CloudPelican is an operational and business intelligence software package that analyzes large amounts of data (commonly referred to as big data), typically server log files. The main goal of the software is to reduce the total cost of ownership of running cloud environments.
Development
The software is still under development. The CloudPelican website states that some users are running alpha versions, but these are not publicly available.
Milestones
# Release of a public beta version containing all core features (index, search, alert) and a minimal web interface;
# Release of a web interface supporting all core features;
# Release of a centralized management console in the interface, used for changing configurations on all nodes from a single interface;
# Release of a Chrome app in the Chrome Web Store;
# Release of a mobile web application that supports modern smartphones;
# Release of CloudPelican v1.0.
Logical modules
Each CloudPelican module handles a separate part of the processing logic.
Readers
Many files are monitored in parallel and split into raw events. These raw events are nothing more than plain text lines placed in a queue for further processing.
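A minimal sketch of this idea in Python, assuming one thread tails each monitored file and every new line becomes one raw event on a shared queue; the file paths, event shape, and threading model are illustrative assumptions, not CloudPelican's actual implementation:

```python
import threading
import time
from queue import Queue

raw_events: Queue = Queue()

def tail_file(path: str) -> None:
    """Follow one log file and push every new line onto the shared queue."""
    with open(path, "r") as f:
        f.seek(0, 2)  # jump to the end of the file, like `tail -f`
        while True:
            line = f.readline()
            if line:
                # A raw event is nothing more than the plain text line plus its source.
                raw_events.put({"source": path, "raw": line.rstrip("\n")})
            else:
                time.sleep(0.1)  # no new data yet; poll again shortly

# One thread per monitored file, so many files are read in parallel.
for path in ("/var/log/nginx/access.log", "/var/log/syslog"):
    threading.Thread(target=tail_file, args=(path,), daemon=True).start()
```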
Processors
The raw output of the readers is fed into the processors, which apply logic to it based on configuration settings. Events that do not match any whitelist, or that match a blacklist, are dropped. From the remaining events the date and other parameters for identifying similar events are extracted, based on the source of the raw event (e.g. a web server log).
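A sketch of this filtering and extraction step, assuming the whitelist and blacklist are regular expressions taken from configuration and that the per-source parsing (here, pulling a date out of a web server log line) is also regex-based; the patterns and field names are assumptions:

```python
import re
from typing import Optional

# Hypothetical configuration: whitelist/blacklist patterns and a per-source parser.
WHITELIST = [re.compile(r"HTTP/1\.[01]")]           # keep web server request lines
BLACKLIST = [re.compile(r"healthcheck")]            # drop monitoring noise
DATE_PATTERN = re.compile(r"\[(?P<date>[^\]]+)\]")  # e.g. [10/Oct/2023:13:55:36 +0000]

def process(raw_event: dict) -> Optional[dict]:
    """Filter one raw event and enrich it with identifying parameters."""
    text = raw_event["raw"]
    if not any(p.search(text) for p in WHITELIST):
        return None  # matches no whitelist: drop
    if any(p.search(text) for p in BLACKLIST):
        return None  # matches a blacklist: drop
    event = dict(raw_event)
    m = DATE_PATTERN.search(text)
    if m:
        event["date"] = m.group("date")  # parameter used to group similar events
    return event
```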
Indexers
The events from the processors are ready for storage and are put into an Elasticsearch index. Depending on the configuration, each bucket (a selected time span) gets its own index.
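A sketch of the bucketing scheme, assuming daily or hourly buckets and the official Elasticsearch Python client (v8-style API); the index naming convention `events-YYYY.MM.DD` is an assumption for illustration:

```python
from datetime import datetime, timezone

from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch("http://localhost:9200")

def index_event(event: dict, bucket: str = "daily") -> None:
    """Write one processed event into the index for its time bucket."""
    ts = datetime.now(timezone.utc)
    # One index per bucket, e.g. events-2023.10.10 for a daily bucket.
    suffix = ts.strftime("%Y.%m.%d") if bucket == "daily" else ts.strftime("%Y.%m.%d.%H")
    es.index(index=f"events-{suffix}", document=event)
```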
Search
This module is a gateway to the data indices created in Elasticsearch. It applies the actual logic of finding events, aggregating them, and preparing the results for visualization in the web application.
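The kind of query such a gateway might issue, again sketched with the Elasticsearch Python client: a full-text match across all bucket indices plus a date-histogram aggregation whose hourly counts could feed a chart in the web application. The field names are assumptions carried over from the processor sketch above:

```python
from elasticsearch import Elasticsearch  # assumed v8-style client

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="events-*",  # search all time-bucket indices at once
    query={"match": {"raw": "GET /index.html"}},
    aggs={"per_hour": {"date_histogram": {"field": "date", "calendar_interval": "hour"}}},
    size=0,  # only the aggregation buckets are needed, not the raw hits
)
for bucket in response["aggregations"]["per_hour"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])
```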
Alert service
Based on the configuration, alerts are sent directly to end users on a certain event, a chain of events, or other criteria; an example would be 100 events with a 404 error.
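A sketch of that 404 example: periodically count recent 404 events and notify when the count crosses a threshold. The status field, the five-minute window, the threshold, and the notification hook are all illustrative assumptions:

```python
from elasticsearch import Elasticsearch  # assumed v8-style client

es = Elasticsearch("http://localhost:9200")

def send_alert(message: str) -> None:
    print("ALERT:", message)  # placeholder for mail/SMS/webhook delivery

def check_404_alert(threshold: int = 100) -> None:
    """Fire an alert when too many 404 events arrived in the last 5 minutes."""
    response = es.count(
        index="events-*",
        query={
            "bool": {
                "must": [{"term": {"status": 404}}],
                "filter": [{"range": {"date": {"gte": "now-5m"}}}],
            }
        },
    )
    if response["count"] >= threshold:
        send_alert(f"{response['count']} events with a 404 error in 5 minutes")
```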
Guard
The guard takes care of statistics gathering, metadata storage (and recovery), and auto-scaling of the thread pools.
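A sketch of the auto-scaling idea, assuming the guard periodically inspects the raw event queue and grows a worker pool while the backlog exceeds a per-worker budget; the thresholds and names are illustrative, not taken from CloudPelican:

```python
import threading
from queue import Queue

def autoscale(raw_events: Queue, workers: list, worker_main,
              max_workers: int = 16, backlog_per_worker: int = 1000) -> None:
    """Called periodically by the guard: add a worker while the backlog is too deep."""
    backlog = raw_events.qsize()  # also a statistic worth recording
    if workers and backlog <= backlog_per_worker * len(workers):
        return  # the pool is keeping up; nothing to do
    if len(workers) < max_workers:
        t = threading.Thread(target=worker_main, args=(raw_events,), daemon=True)
        t.start()
        workers.append(t)
```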