Blog Archives

What is Big Data?

11/17/2015

Big Data refers to a massive volume of data that comes in a variety of formats and arrives at high velocity.

What is Big Data?
Data has been growing for many years, but it started to explode once social media platforms such as Facebook and Twitter, mobile phones, and sensors such as RFID began to pour in data.

Well, there is a lot of data, but why do I need to care about it?

Why Do You Need to Care about Big Data?
Big Data includes crucial information for many customer-centric applications. For example, customers’ social data affects their purchasing behavior and loyalty. Data collected on machine status is needed to improve operational efficiency. Data on the Internet, structured or unstructured, is now more useful for security analysis and fraud detection, risk management, and data warehouse analytics. To get a complete view of the data, you need to discover and access Big Data, and that data has to be correct, up to date, and of high quality.

Because of its volume, variety of formats, and velocity, Big Data is difficult for traditional databases and software to handle. This is where new technology comes in to help build a new information infrastructure that embraces Big Data. It includes distributed file systems, Big Data analytic tools, and many others.

What is Hadoop?
Hadoop is a Java-based open source framework from the Apache project. It combines a distributed file system (HDFS) with the MapReduce processing model. The key benefits of Hadoop are parallel processing of massive data in a variety of formats (structured and unstructured) and the use of commodity hardware.
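To make the map-reduce idea more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. It only shows the three steps a MapReduce job goes through: map each input split to key/value pairs, group the pairs by key, then reduce each group.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = []
    for doc in documents:  # on a cluster, each split would be mapped in parallel
        mapped.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data is big", "data keeps growing"])
print(counts)
```

On a real cluster the map and reduce calls would run on different nodes, close to where the data blocks are stored, which is where the parallelism comes from.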

Where to Start with Big Data?
Start by evaluating the business opportunity and researching the technology. It is important to make sure Big Data is combined with traditional transactional data to bring value. Because of the volume, visualization and reports will play an important role.

The key technologies in Big Data include the following:

Storage: Because of the high volume and variety of data formats, the popular Big Data storage options are either Hadoop-based distributed storage systems (Cloudera, MapR, Hortonworks, Hive) or NoSQL databases (MongoDB, Cassandra, Couchbase).
Processing: To handle a high volume of data arriving at high velocity, data processing needs to be distributed and run in real time. The technologies include Storm, Flume, and Kafka.
Analytics: Splunk, R-Project

Resources

Big Data University (IBM)


GoldenGate for Big Data

Oracle GoldenGate for Big Data streams transactional data from relational databases into Big Data systems in real time. It integrates critical OLTP data with Big Data so that you can gain complete business insights.
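The general idea of streaming transactional changes into a downstream system can be sketched as follows. This is a toy illustration in plain Python, not GoldenGate's API, trail format, or handler configuration; the record layout (op, key, row) and the apply_change function are assumptions made up for this example. It shows how a stream of insert, update, and delete records keeps a target copy in step with the source.

```python
def apply_change(target, change):
    """Apply one change record to an in-memory target keyed by primary key."""
    op, key = change["op"], change["key"]
    if op in ("insert", "update"):
        # Merge the changed columns into whatever the target already holds.
        target[key] = {**target.get(key, {}), **change["row"]}
    elif op == "delete":
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return target


# Hypothetical change records, in the order they were committed at the source.
changes = [
    {"op": "insert", "key": 1, "row": {"customer": "Ann", "total": 40}},
    {"op": "insert", "key": 2, "row": {"customer": "Bob", "total": 15}},
    {"op": "update", "key": 1, "row": {"total": 55}},
    {"op": "delete", "key": 2},
]

target = {}
for change in changes:
    apply_change(target, change)
print(target)
```

Applying the records in commit order matters: replaying the update before the insert would leave the target with an incomplete row.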

This blog discusses features and best practices of Oracle GoldenGate for Big Data.

Links
- What is GoldenGate for Big Data?
- Documentation (12.3.1.1)
- Oracle University Training

Buzzwords

Lambda DataFlow
HBase Hive Flume Kafka
HDFS Real-Time Native Kerberos Spark Kappa
