Machine Learning is a critical part of extracting value from Big Data.
Choosing a proper model, preparing the data, and getting usable results on
large-scale data is a non-trivial exercise. The process typically consists of
prototyping a model in a higher-level, (mostly) single-machine tool such as R,
MATLAB, or Weka, and then recoding it in Java or some other language for
large-scale deployment. This process is fairly involved, error prone, and slow.
Existing tools that aim to automate and improve this process are still
somewhat immature, and wide-scale enterprise adoption of Machine Learning
remains low. Efforts are under way to close this gap, i.e., to make
enterprise-class Machine Learning more accessible and easier to use.
Spark is a new, purpose-built, distributed, in-memory engine that makes it
possible to perform compute-intensive jobs on commodity hardware clusters.
One of ap...
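Spark's appeal for this kind of workload is that a dataset, once loaded, can be cached in cluster memory and transformed in parallel, so iterative jobs avoid re-reading data from disk on every pass. As a minimal sketch in Spark's native Scala API (the master URL and HDFS paths are illustrative placeholders), a distributed word count looks roughly like this:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: distributed word count. Master URL and paths are placeholders.
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount").setMaster("spark://master:7077")
    val sc   = new SparkContext(conf)

    val counts = sc.textFile("hdfs:///data/input.txt")  // load the file from HDFS
      .flatMap(_.split("\\s+"))                         // split each line into words
      .map(word => (word, 1))                           // pair every word with a count of 1
      .reduceByKey(_ + _)                               // sum counts per word across the cluster

    counts.saveAsTextFile("hdfs:///data/output")        // write the results back to HDFS
    sc.stop()
  }
}
```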
Hadoop is designed to store extremely large volumes of data. HBase, an open
source NoSQL data store, makes it possible to randomly access such large data
sets. HBase is included in Cloudera's Hadoop distribution.
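To make "random access" concrete, the sketch below writes one row and reads it back by key, using the classic (pre-1.0) HBase Java client from Scala; the table name, column family and row key are hypothetical.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Get, HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

// Sketch: single-row write and read by key with the classic HBase client.
object HBaseRandomAccess {
  def main(args: Array[String]): Unit = {
    val conf  = HBaseConfiguration.create()   // picks up hbase-site.xml from the classpath
    val table = new HTable(conf, "users")     // hypothetical table

    // Write one cell: row key "row-42", column family "info", qualifier "email".
    val put = new Put(Bytes.toBytes("row-42"))
    put.add(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes("jane@example.com"))
    table.put(put)

    // Random read: fetch exactly that row by key, however large the table is.
    val result = table.get(new Get(Bytes.toBytes("row-42")))
    val email  = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email")))
    println("email = " + email)

    table.close()
  }
}
```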
One of the major obstacles to wider adoption of NoSQL databases is the lack
of query languages, i.e., the lack of comprehensive non-programmatic interfaces
to the data inside a NoSQL data store. We expect NoSQL databases to come up
with such query languages in the near future. In the meantime, Quest's Toad
for Cloud fills this gap and makes it easy to seamlessly access NoSQL, Cloud and ...
Data processing power is likely to continue growing. Are contemporary IT
development methods, processes and procurement practices properly positioned
to take advantage of increasing capabilities?
CPU, memory and storage can today be provisioned in a few clicks. Limitations
imposed by processing power and physical infrastructure will matter less and
less, in contrast to the past, when they dominated.
We are gradually approaching a situation where the constraints that shaped
present corporate IT standards are no longer an issue. For example, it is
current practice that OLTP and analyti...
The traditional way of performing backups involves using Oracle RMAN in
combination with media management layer software (typically NetBackup,
Tivoli or similar), which writes backup data to a remote robotic tape unit.
Tapes are then moved offsite to a secure location. It is a well-known fact
that tape media poses certain challenges in reliability and physical handling.
The main attraction of cloud-based backups is that they are inherently disk
based, always accessible and offsite, and that they require no capital
expenditure. All tape-related costs are thus eliminated. On the other hand ...
DR sites are typically built as an exact replica of the primary site.
Application and database software is installed at the DR site and sits there
mostly unused, waiting for a disaster to happen. A DR site is a very expensive
proposition that only large companies are able to afford. Amazon AWS is an
interesting alternative to having your own DR site.
Oracle databases on the DR side run in a Data Guard configuration with the
primary site and actively apply the archived log files shipped from there.
The pay-per-use, scalable Amazon Cloud model makes it an attractive
alternative to creating and maintai...
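As a hedged illustration of watching over such a standby, the sketch below queries v$dataguard_stats on the DR database over JDBC to report how far it lags behind the primary. The host, SID and credentials are placeholders, and the Oracle JDBC driver is assumed to be on the classpath.

```scala
import java.sql.DriverManager

// Sketch: report transport and apply lag on a Data Guard standby.
// Connection details below are illustrative placeholders.
object StandbyLagCheck {
  def main(args: Array[String]): Unit = {
    Class.forName("oracle.jdbc.OracleDriver")   // load the Oracle JDBC driver
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@dr-host:1521:DRDB", "monitor", "secret")
    try {
      val rs = conn.createStatement().executeQuery(
        "SELECT name, value FROM v$dataguard_stats " +
        "WHERE name IN ('transport lag', 'apply lag')")
      while (rs.next())                         // e.g. "apply lag = +00 00:00:07"
        println(rs.getString("name") + " = " + rs.getString("value"))
    } finally {
      conn.close()
    }
  }
}
```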