Machine Learning is a critical part of extracting value from Big Data.
Choosing a proper model, preparing the data, and getting usable results at
large scale is a non-trivial exercise. Typically, the process consists of
prototyping a model in a higher-level, (mostly) single-machine tool like R,
Matlab, or Weka, then recoding it in Java or some other language for
large-scale deployment. This process is fairly involved, error prone, and
slow. Existing tools aiming to automate and improve it are still somewhat
immature, and wide-scale enterprise adoption of Machine Learning is still
low. Efforts are under way to close this gap, i.e., to make enterprise-class
Machine Learning more accessible and easier to use.
Spark is a new, purpose-built, distributed, in-memory engine that makes it
possible to perform compute-intensive jobs on clusters of commodity hardware.
One of ap... (more)
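As a rough, single-machine sketch of the map/reduce style of computation that Spark parallelizes across a cluster (the names below are illustrative plain Python, not Spark's actual API):

```python
from functools import reduce

# Toy word count in the map/reduce style that Spark's RDD API
# distributes across a cluster; here everything runs in one process.
lines = ["spark runs on clusters", "spark keeps data in memory"]

# "Map" phase: split each line into (word, 1) pairs.
pairs = [(word, 1) for line in lines for word in line.split()]

# "Reduce" phase: sum the counts per word.
def merge(counts, pair):
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

word_counts = reduce(merge, pairs, {})
print(word_counts["spark"])  # 2
```

The point of the sketch is only the programming model: in Spark the `lines` collection would be partitioned across machines and the map and reduce phases would run in parallel, with intermediate data kept in memory.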
( For the latest information on the Oracle 12c database, please refer to the
following article: Oracle 12c Database and How It Relates to SAP Hana )
RDBMSs are stable and mature products. While there is nothing radically new
on the horizon that would challenge Codd's relational theory and the related
advances in data processing, there are some developments that force
established vendors like Oracle to come up with new features and products.
Column Stores and Oracle
The column store concept has been around for quite a while. Vendors like HP
Vertica grabbed some market share in the data warehousing seg... (more)
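As a toy illustration of why column stores suit analytic workloads (the layout below is simplified plain Python, not how Vertica actually stores data): a per-column layout lets an aggregate read only the column it needs, while a row layout drags every field of every row along.

```python
# Row-oriented: one tuple per row (id, name, salary).
rows = [(1, "ann", 100), (2, "bob", 200), (3, "cal", 300)]

# Column-oriented: one list per column.
columns = {
    "id": [1, 2, 3],
    "name": ["ann", "bob", "cal"],
    "salary": [100, 200, 300],
}

# SELECT SUM(salary): the row layout scans whole tuples...
row_sum = sum(r[2] for r in rows)

# ...while the column layout touches only the salary column,
# which is why analytic scans on column stores read far less data.
col_sum = sum(columns["salary"])

print(row_sum, col_sum)  # 600 600
```

A real column store adds per-column compression and encoding on top of this access-pattern advantage, which is what this sketch leaves out.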
Vertica is a high-performing, advanced RDBMS that is very simple to install
and administer, thanks to its modern design and purpose-built architecture.
Once we have executed all preparatory steps on the database servers and
downloaded the Vertica software as per the Installation Guide, we start the
installation process on a two-node cluster (host01, host02):
/opt/vertica/sbin/install_vertica -s host01,host02 -r
We initiate the database creation process using the dbadmin tool. We will
pick option 6 (Configuration Menu), then option 1 to... (more)
(Please refer to the following article: Oracle 12c In-Memory Database is Out
- Hardly Anybody Notices for an update on Oracle 12c databases)
Contemporary large servers are routinely configured with 2TB of RAM. It is
thus possible to fit an entire average-sized OLTP database in memory directly
accessible by the CPU. There is a long history of academic research on how
best to utilize relatively abundant computer memory. This research is
becoming increasingly relevant as databases serving business applications
move toward memory-centric design and implementation.
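The memory-centric idea can be seen in miniature with Python's bundled sqlite3 module (purely as a generic illustration, nothing Oracle-specific): an entire database can live in RAM and be created, populated, and queried without ever touching disk.

```python
import sqlite3

# SQLite's ":memory:" database is held entirely in RAM, so every
# read and write is a memory access rather than disk I/O.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(19.99,), (5.00,), (12.50,)],
)

(total,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()
print(total)  # 37.49
conn.close()
```

The obvious trade-off, and the reason in-memory engines invest in logging and replication, is that RAM contents vanish on power loss, which disk-backed databases do not.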
If you simply place... (more)
The Oracle database is a relational database management system that mostly
complies with the ACID transaction requirements (atomicity, consistency,
isolation, durability). This means that each database transaction will be
executed in a reliable, safe, and integral manner. In order to comply with
ACID, the Oracle database software implements a fairly complex and expensive
(in terms of computing resources, i.e., CPU, disk, memory) set of mechanisms,
such as redo and undo logging, memory latching, metadata maintenance, etc.,
that make concurrent work possible while maintaining data integrity. Any
databa... (more)
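Atomicity, the "A" in ACID, can be shown with a small generic sketch (again using Python's sqlite3 rather than Oracle, and a simulated failure): if any statement in a transaction fails, the whole transaction rolls back and the database looks as if it never started.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100.0), ('bob', 50.0)")
conn.commit()

# Transfer 30 from alice to bob, but fail mid-transaction:
try:
    conn.execute(
        "UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'"
    )
    raise RuntimeError("simulated crash before the matching credit")
    # The crediting UPDATE below is never reached.
    conn.execute(
        "UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'"
    )
except RuntimeError:
    conn.rollback()  # atomicity: the partial debit is undone

(alice,) = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'"
).fetchone()
print(alice)  # 100.0 -- the half-finished transfer left no trace
conn.close()
```

The undo logging mentioned above is what lets a real engine perform exactly this kind of rollback: it records the before-images needed to erase a partially executed transaction.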