A typical Oracle VLDB is a multi-terabyte monolith running on big, expensive
hardware. It is hard or impossible to back up, adding or modifying columns
can take days, and query optimization is very difficult. Database sharding is
a well-known method of breaking up a large database into smaller, manageable
pieces (database shards). It is data warehouses, i.e. VLDBs, that can best
take advantage of AWS database sharding capabilities. The basic premise is:
manage a huge volume of data by splitting it across multiple databases instead
of creating table partitions.
Database sharding provides a method for scaling across independent
servers, each with its own CPU, memory, and disk. A database shard is a
horizontal partition of a database. AWS's quick instance
creation/decommissioning capabilities make it quite easy to implement
database sharding in a very flexible fashion.
Fact tab... (more)
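The routing idea behind sharding can be sketched in a few lines, assuming a simple hash-based shard map (the shard names and the `shard_for` helper below are hypothetical illustrations, not part of any AWS API):

```python
import hashlib

# Hypothetical shard map: each entry stands for an independent database
# instance (e.g. its own AWS instance) with its own CPU, memory and disk.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]

def shard_for(customer_id: str) -> str:
    """Route a row to a shard by hashing its key (horizontal partitioning)."""
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every lookup for the same key is routed to the same shard,
# so each shard holds a disjoint horizontal slice of the data.
print(shard_for("cust-42"))
```

Because shards are independent databases, adding capacity means adding a shard (and rebalancing keys), which is where AWS's fast instance creation pays off.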
Machine Learning is a critical part of extracting value from Big Data.
Choosing the proper model, preparing data, and getting usable results on
large-scale data is a non-trivial exercise. Typically the process consists of
prototyping a model using a higher-level, (mostly) single-machine tool such as
R, Matlab, or Weka, then coding in Java or some other language for large-scale
deployment. This process is fairly involved, error prone, and slow.
Existing tools aiming at automating and improving this process are still
somewhat immature, and wide-scale Machine Learning enterprise adopti... (more)
Oracle database upgrades are a stressful exercise. Normally you need to
back up the production database, upgrade the database software, then run
database scripts that upgrade the data dictionary. Once it is all done and
success is confirmed, you can start using the upgraded database. If something
goes wrong, you rely on backups to restore the previous state. In other words,
there is complete dependency on backups and on restore success if things go wrong.
AWS gives us the possibility to quickly and cheaply create new database instances
where an upgrade can be tested. We can also instantly create com... (more)
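Assuming the database runs on Amazon RDS, the test-first upgrade cycle described above can be sketched with the AWS CLI (the instance, snapshot names, and engine version here are hypothetical placeholders):

```shell
# Snapshot production before touching anything
aws rds create-db-snapshot \
    --db-instance-identifier prod-oracle \
    --db-snapshot-identifier pre-upgrade-snap

# Clone a disposable test instance from the snapshot
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier upgrade-test \
    --db-snapshot-identifier pre-upgrade-snap

# Try the engine upgrade on the clone first
aws rds modify-db-instance \
    --db-instance-identifier upgrade-test \
    --engine-version 12.1.0.2.v1 \
    --apply-immediately

# If the test succeeds, repeat on production; either way, clean up
aws rds delete-db-instance \
    --db-instance-identifier upgrade-test \
    --skip-final-snapshot
```

The production database stays untouched until the upgrade has already succeeded once on an identical copy, so backups become a safety net rather than the plan.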
Hadoop and AWS are enterprise-ready cloud computing and distributed
processing platforms.
It is straightforward to add more DataNodes, i.e. storage, to a Hadoop cluster
on AWS. You just need to create another AWS instance and add the new node to
the Hadoop cluster. Hadoop will take care of rebalancing storage to keep the
level of file system utilization across DataNodes as even as possible.
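Done by hand (assuming shell access to the cluster; Hadoop 2.x-era commands shown, as a sketch rather than a complete procedure), the steps look roughly like:

```shell
# On the new instance: start the DataNode daemon so it registers
# with the NameNode
hadoop-daemon.sh start datanode

# Verify the node joined and check per-DataNode utilization
hdfs dfsadmin -report

# Run the balancer until no DataNode deviates more than 10%
# from the cluster-wide average utilization
hdfs balancer -threshold 10
```

Cloudera Manager, described below, wraps these steps in a GUI, but the underlying mechanism is the same.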
Cloudera's distribution of Hadoop includes Cloudera Manager, which makes it
simple to install Hadoop and add new nodes to it. The screenshot below shows
an existing HDFS service with two DataNodes. We will expand HDFS by a... (more)
(Please refer to the article Oracle 12c In-Memory Database is Out -
Hardly Anybody Notices for an update on Oracle 12c databases.)
Contemporary large servers are routinely configured with 2TB of RAM. It is
thus possible to fit an entire average-size OLTP database in memory, directly
accessible by the CPU. There is a long history of academic research on how
best to utilize relatively abundant computer memory. This research is becoming
increasingly relevant as databases serving business applications move toward
memory-centric design and implementation.
If you simply place... (more)