Machine Learning is a critical part of extracting value from Big Data.
Choosing the proper model, preparing the data, and getting usable results on
large-scale data is a non-trivial exercise. Typically the process consists of
prototyping a model in a higher-level, (mostly) single-machine tool like R,
Matlab, or Weka, then coding it in Java or some other language for large-scale
deployment. This process is fairly involved, error prone, and slow.
Existing tools aiming to automate and improve this process are still
somewhat immature, and wide-scale enterprise adoption of Machine Learning is
still low. Efforts are under way to address this gap, i.e. to make enterprise-
class Machine Learning more accessible and easier.
Spark is a new, purpose-built, distributed, in-memory engine that makes it
possible to perform compute-intensive jobs on commodity hardware clusters.
One of ap...
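Spark's core idea - keep the working dataset in memory across cluster nodes and express the job as map and reduce steps over partitions - can be sketched in plain Python. This is an illustrative map/reduce-style word count, not the actual Spark API; the "partitions" here are just in-process lists standing in for data distributed across nodes.

```python
from functools import reduce
from collections import Counter

# Hypothetical mini "cluster": each partition is a list of lines that
# would live on a different node in a real Spark deployment.
partitions = [
    ["big data needs machine learning", "spark keeps data in memory"],
    ["machine learning at scale", "spark runs on commodity clusters"],
]

def map_partition(lines):
    """Map step: count words within one partition (runs per node in Spark)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def merge(a, b):
    """Reduce step: combine per-partition counts (the shuffle/reduce phase)."""
    a.update(b)
    return a

word_counts = reduce(merge, (map_partition(p) for p in partitions), Counter())
print(word_counts["spark"])  # -> 2 (one mention in each partition)
```

Because intermediate results stay in memory rather than being spilled to disk between steps, iterative workloads - exactly the shape of most Machine Learning algorithms - benefit the most.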
All major relational database vendors are developing or already shipping
in-memory, columnar databases.
The next release of Oracle 12c - an in-memory, columnar database - will be
available next year. It will feature simultaneous transaction-level updates
to both row and column stores, i.e. data will be stored in both formats at
the same time, in the same transaction. This is quite an improvement over SAP
Hana's awfully clumsy delta merge process (data changes in SAP Hana are
first accumulated in a delta store, then periodically merged into the column
store - a process which locks targe...
(For the latest information on the Oracle 12c database update, please refer
to the following article: Oracle 12c Database and How It Relates to SAP Hana)
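The two designs described above can be sketched in a few lines of Python. This is an illustrative model only - real engines use compressed, vectorized storage - but it shows the structural difference: one design updates both representations in the same operation, the other parks inserts in a delta store that is only folded into the column store during a periodic merge.

```python
class DualFormatTable:
    """Oracle 12c-style: every insert updates row and column stores together."""
    def __init__(self, columns):
        self.rows = []                        # row store: list of tuples
        self.cols = {c: [] for c in columns}  # column store: one list per column

    def insert(self, row):
        # Both representations are maintained in the same "transaction".
        self.rows.append(tuple(row[c] for c in self.cols))
        for name in self.cols:
            self.cols[name].append(row[name])

class DeltaMergeTable:
    """Hana-style: inserts land in a delta store, merged into columns later."""
    def __init__(self, columns):
        self.cols = {c: [] for c in columns}
        self.delta = []                       # row-oriented delta store

    def insert(self, row):
        self.delta.append(row)                # cheap insert; column store is stale

    def merge(self):
        # The periodic delta merge; in Hana this locks the target table.
        for row in self.delta:
            for name in self.cols:
                self.cols[name].append(row[name])
        self.delta.clear()

t = DualFormatTable(["id", "amount"])
t.insert({"id": 1, "amount": 10})
d = DeltaMergeTable(["id", "amount"])
d.insert({"id": 1, "amount": 10})
print(len(t.cols["amount"]), len(d.cols["amount"]))  # 1 0 - delta not merged yet
d.merge()
print(len(d.cols["amount"]))                         # 1
```

The trade-off is visible even in this toy: the delta-merge design makes inserts cheap but leaves queries reading a stale column store (or both stores) until the merge runs.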
RDBMSs are stable and mature products. While there is nothing radically new
on the horizon that would challenge Codd's relational theory and related
advances in data processing, there are some developments that force
established vendors like Oracle to come up with new features and products.
Column Stores and Oracle
The column store concept has been around for quite a while. Vendors like HP
Vertica grabbed some market share in the data warehousing seg...
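The appeal of the column layout for warehousing workloads is easy to demonstrate: an aggregate over one column touches only that column's contiguous values, instead of dragging every field of every row through the CPU. A simplified sketch (hypothetical sales data; real column stores add compression and vectorized execution on top):

```python
# The same table in the two layouts.
row_store = [(1, "books", 12.0), (2, "music", 7.5), (3, "books", 3.25)]
col_store = {
    "id":       [1, 2, 3],
    "category": ["books", "music", "books"],
    "price":    [12.0, 7.5, 3.25],
}

# Row layout: the scan reads every column of every row.
total_rows = sum(r[2] for r in row_store)

# Column layout: only the "price" column is read - less data moved, and the
# contiguous homogeneous values compress and vectorize well.
total_cols = sum(col_store["price"])

print(total_rows == total_cols == 22.75)  # True - same answer, less I/O
```

The flip side, of course, is that inserting or updating a single row now touches every column list, which is why columnar engines historically targeted read-mostly analytics rather than OLTP.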
Big Data and its most prominent technical ingredient, Machine Learning, are
all the rage these days, as the IT industry tries to convince companies that a
technology revolution is underway ("If you are not doing it, your
competitors sure are, and by the time you realize it, it will be too late").
Data fracking, i.e. Big Data, is the new oil of the 21st century that will
power and grease stalled industries and reignite growth, or so the story goes.
While advanced analytics (which goes under various names - predictive
analytics, data mining, and, more recently, data science) is great and in use
(Please refer to the following article: Oracle 12c In-Memory, Columnar
Database & How It Relates to SAP Hana for an update on IMDB/columnar databases)
Contemporary large servers are routinely configured with 2TB of RAM. It is
thus possible to fit an entire average-size OLTP database in memory directly
accessible by the CPU. There is a long history of academic research on how
best to utilize relatively abundant computer memory, and this research is
becoming increasingly relevant as databases serving business applications
head toward memory-centric design and implementation.
If you si...
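The sizing argument above is simple arithmetic. The figures below are illustrative assumptions (the 2TB server is from the text; the database size and headroom are made up for the example), not survey data:

```python
ram_gb = 2048      # contemporary large server: 2TB of RAM (from the text)
db_gb = 500        # an assumed average-size OLTP database
overhead = 0.25    # assumed headroom for OS, buffers, and indexes

usable_gb = ram_gb * (1 - overhead)   # 1536 GB left for data
print(db_gb <= usable_gb)             # True - the whole database fits in memory
```

Even with generous headroom, a typical OLTP working set fits comfortably, which is precisely what makes memory-centric engine designs attractive.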