Big Data refers to massive volumes of data that come in a variety of formats and arrive at high velocity.
- What is Big Data?
Data has been growing for many years, but it is now exploding as social media platforms such as Facebook and Twitter, mobile phones, and sensors such as RFID pour in data.
- Why Do You Need to Care about Big Data?
Big Data includes crucial information for many customer-centric applications. For example, customers’ social data can affect their purchasing behavior and loyalty. Data collected on machine status is needed to improve operational efficiency. Data on the Internet, structured or unstructured, is now more useful for security analysis and fraud detection, risk management, and data warehouse analytics. To get a complete view of the data, organizations need to discover and access Big Data, and the data has to be correct, up to date, and of high quality.
- What is Hadoop?
Hadoop is a Java-based open source framework, developed as an Apache project, for distributed storage and processing. It is built around a distributed file system (HDFS) and the map-reduce programming model. The key benefits of Hadoop are the parallel processing of massive data in a variety of formats (structured or unstructured) and the use of commodity hardware.
- Where to Start with Big Data?
The work starts with evaluating the business opportunity and researching the technology. It is important to make sure Big Data is combined with traditional transactional data to bring value. Because of the volume, visualization and reports will play an important role.
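As a starting point for researching the technology, the map-reduce model that Hadoop is based on can be sketched in a few lines of plain Python. This is only an illustration of the idea under simplifying assumptions: it runs in a single process, it does not use any Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    # In a real cluster, each line (or block of lines) could be mapped on a
    # different machine; here everything runs locally, one line at a time.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big value", "data in many formats"]
    print(word_count(sample))
```

The map and reduce steps only ever look at one line or one key at a time, which is what lets a system like Hadoop spread the same work across many commodity machines.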
- Storage: Because of the high volume and the variety of data formats, the popular Big Data storage options are either Hadoop-based distributed storage systems (Cloudera, MapR, Hortonworks, Hive) or NoSQL databases (MongoDB, Cassandra, Couchbase).
- Processing: To handle a high volume of data arriving at high velocity, data processing needs to be distributed and real-time. The technologies include Storm, Flume, and Kafka.
- Analytics: Splunk, R-Project
- Big Data University (IBM)