Big Data Consultant

Ranko Mosic


Top Stories by Ranko Mosic

A typical Oracle VLDB is a multi-terabyte megalith running on big, expensive hardware. It is hard or impossible to back up, adding or modifying columns can take days, and query optimization is very difficult. Database sharding is a well-known method of breaking up a large database into smaller, manageable pieces (database shards). It is data warehouses, i.e. VLDBs, that can best take advantage of AWS database sharding capabilities. The basic premise is: manage a huge volume of data by splitting it into multiple databases instead of creating table partitions. Database sharding provides a method for scalability across independent servers, each with its own CPU, memory and disk. A database shard is a horizontal partition in a database. AWS quick instance creation and decommissioning capabilities make it quite easy to implement database sharding in a very flexible fashion. Fact tab... (more)
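The premise of splitting rows across independent databases can be illustrated with a minimal Python sketch. It is not taken from the article: the shard names, the `shard_for` and `partition_rows` helpers, and the choice of hash-based routing are all assumptions made for illustration, and a real deployment might route by range or by lookup table instead.

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate database instance.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(key, shards=SHARDS):
    """Map a row key to one shard using a stable hash (assumed routing scheme)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def partition_rows(rows, key_field, shards=SHARDS):
    """Group rows (dicts) by destination shard, i.e. a horizontal partition."""
    placement = {name: [] for name in shards}
    for row in rows:
        placement[shard_for(row[key_field], shards)].append(row)
    return placement


if __name__ == "__main__":
    # Hypothetical fact rows keyed by customer_id.
    rows = [{"customer_id": i, "amount": 10 * i} for i in range(20)]
    for name, members in partition_rows(rows, "customer_id").items():
        print(name, len(members))
```

A stable hash is used here so that the same key always lands on the same shard across processes; Python's built-in `hash` is salted per process for strings and would not give that guarantee.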

Big Data: Enterprise Class Machine Learning with Spark and MLbase

Machine Learning is a critical part of extracting value from Big Data. Choosing a proper model, preparing data and getting usable results on large-scale data is a non-trivial exercise. Typically the process consists of model prototyping using a higher-level, (mostly) single-machine tool like R, Matlab or Weka, then coding in Java or some other language for large-scale deployment. This process is fairly involved, error prone, slow and inefficient. Existing tools aiming at automating and improving this process are still somewhat immature and wide scale Machine Learning enterprise adopti... (more)

Oracle Database Upgrades Faster and Safer in Amazon Web Services

Oracle database upgrades are a stressful exercise. Normally you need to back up the production database, upgrade the database software, then run database scripts that will upgrade the dictionary. Once it is all done and success is confirmed, you can start using the upgraded database. If something goes wrong, you rely on backups to restore to a previous state. In other words, there is complete dependency on backups and restore success if things go wrong. AWS gives us the possibility to quickly and cheaply create new database instances where an upgrade can be tested. We can also instantly create com... (more)

Hadoop Distributed Storage Management on Amazon Web Services

Hadoop and AWS are enterprise-ready, distributed cloud computing technologies. It is straightforward to add more DataNodes, i.e. storage, to a Hadoop cluster on AWS. You just need to create another AWS instance and add a new node to the Hadoop cluster. Hadoop will take care of balancing storage to keep the level of file system utilization across DataNodes as even as possible. Cloudera's distribution of Hadoop includes Cloudera Manager, which makes it simple to install Hadoop and add new nodes to it. The screenshot below shows an existing HDFS service with two DataNodes. We will expand HDFS by a... (more)

Mainstream Business Applications and In-Memory Databases

(Please refer to the following article: Oracle 12c In-Memory Database is Out - Hardly Anybody Notices for an update on Oracle 12c databases.) Contemporary large servers are routinely configured with 2TB of RAM. It is thus possible to fit an entire average-size OLTP database in memory directly accessible by the CPU. There is a long history of academic research on how to best utilize relatively abundant computer memory. This research is becoming increasingly relevant as databases serving business applications are heading towards memory-centric design and implementation. If you simply place... (more)