Hadoop and AWS are enterprise-ready cloud computing, distributed
It is straightforward to add more DataNodes, i.e. storage, to a Hadoop cluster
on AWS. You just need to create another AWS instance and add it to the
Hadoop cluster as a new node. Hadoop can then balance storage to keep the
level of file system utilization across DataNodes as even as possible.
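
To make "as even as possible" concrete, here is a minimal Python sketch of the kind of check a balancer performs. It is not Hadoop's implementation. It assumes the commonly documented rule that a DataNode counts as balanced when its utilization is within a threshold (10 percent by default) of the overall cluster utilization. The capacities and usage figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    capacity_gb: float  # hypothetical configured capacity
    used_gb: float      # hypothetical space currently used

    @property
    def utilization(self) -> float:
        """Used space as a percentage of this node's capacity."""
        return 100.0 * self.used_gb / self.capacity_gb


def cluster_utilization(nodes):
    """Overall used space as a percentage of total cluster capacity."""
    total_used = sum(n.used_gb for n in nodes)
    total_capacity = sum(n.capacity_gb for n in nodes)
    return 100.0 * total_used / total_capacity


def classify(nodes, threshold=10.0):
    """Label each node relative to the cluster average.

    threshold is in percentage points; 10.0 is assumed here as the default.
    """
    average = cluster_utilization(nodes)
    report = {}
    for node in nodes:
        difference = node.utilization - average
        if difference > threshold:
            report[node.name] = "over-utilized"
        elif difference < -threshold:
            report[node.name] = "under-utilized"
        else:
            report[node.name] = "within threshold"
    return report


# Two existing DataNodes holding data, plus a freshly added empty one.
nodes = [
    DataNode("DataNode-1", capacity_gb=1000, used_gb=600),
    DataNode("DataNode-2", capacity_gb=1000, used_gb=600),
    DataNode("DataNode-3", capacity_gb=1000, used_gb=0),
]
report = classify(nodes)
for name, status in report.items():
    print(name, status)
```

With these made-up numbers the cluster average is 40 percent, so the two original nodes sit above the threshold and the new node sits below it. That is the situation in which running the balancer moves blocks onto the new node.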
Cloudera's distribution of Hadoop includes Cloudera Manager, which makes it
simple to install Hadoop and add new nodes to it. The screenshot below shows
an existing HDFS service with two DataNodes. We will expand HDFS by adding a
third DataNode to it:
Once we click the Add button, a new host can be picked from the list of
available servers (in this case, server ip-10-0-0-40):
The new server is now DataNode-3, i.e. it is part of our Hadoop cluster.
The new DataNode-3 does not contain any data yet. Hadoop Balancer ... (more)
A big shift towards the cloud has started. It is now clear that this
change is similar in magnitude to the shift from mainframe to client-server
computing two decades ago.
Amazon Web Services is the pioneer and market leader in the cloud computing
space. Other vendors are playing catch-up and do not come close to the
breadth and scale of AWS offerings. The services and features Amazon provides
are extensive and cover many enterprise-class computing needs. APIs and
command line interfaces are available for each service, which makes
scripting and automation achievable. ... (more)
Sqoop makes it very easy to transfer data between Oracle and Hadoop using a
single command. The reason to import data from an Oracle database into
Hadoop/Hive is that we might want to join Hive tables with Oracle lookup
tables, or with other data residing in an Oracle database.
Data originating from an Oracle database can help us better understand and
analyze the raw, more granular data contained in Hive/HDFS.
Sqoop uses a JDBC driver to connect to an Oracle database. If you have a
table results in your Oracle database and want data from it to be imported
to Hadoop HDFS ( Hadoop... (more)
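
As an illustration of the "single command" idea, the Python sketch below assembles a Sqoop import command line. It is not the command from the original article, which is cut off above. The host name, service name, user, password file path and target directory are all hypothetical placeholders. The flags are standard Sqoop 1 import options: --connect, --username, --password-file, --table, --target-dir and --num-mappers.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table,
                       target_dir, num_mappers=1):
    """Return the argument list for a Sqoop import of one Oracle table.

    All values are supplied by the caller; nothing is executed here.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the Oracle database
        "--username", username,
        "--password-file", password_file,  # file that holds the password
        "--table", table,                  # source table in Oracle
        "--target-dir", target_dir,        # destination directory in HDFS
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]


# Hypothetical connection details, for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCL",
    username="SCOTT",
    password_file="/user/hadoop/.oracle-password",
    table="RESULTS",
    target_dir="/user/hadoop/results",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each value as one argument, so it can be passed to subprocess.run on a machine where Sqoop and the Oracle JDBC driver are installed.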
(For the latest information on the Oracle 12c database update, please refer
to the following article: Oracle 12c Database and How It Relates to SAP Hana)
RDBMSs are stable and mature products. While there is nothing radically new
on the horizon that would challenge Codd's relational theory and related
advances in data processing, there are some developments that force
established vendors like Oracle to come up with new features and products.
Column Stores and Oracle
The column store concept has been around for quite a while. Vendors like HP
Vertica grabbed some market share in data warehousing seg... (more)
(Please refer to the following article for an update on Oracle 12c databases:
Oracle 12c In-Memory Database is Out - Hardly Anybody Notices)
Contemporary large servers are routinely configured with 2 TB of RAM. It is
thus possible to fit an entire average-size OLTP database in memory directly
accessible by the CPU. There is a long history of academic research on how to
best utilize relatively abundant computer memory. This research is becoming
increasingly relevant as databases serving business applications are heading
towards memory-centric design and implementation.
If you simply place... (more)