Big Data Consultant

Ranko Mosic

Top Stories by Ranko Mosic

Machine Learning is a critical part of extracting value from Big Data. Choosing a proper model, preparing data, and getting usable results on large-scale data is a non-trivial exercise. The process typically consists of prototyping a model in a higher-level, (mostly) single-machine tool like R, Matlab, or Weka, then recoding it in Java or some other language for large-scale deployment. This process is fairly involved, error prone, slow, and inefficient. Existing tools that aim to automate and improve this process are still somewhat immature, and wide-scale enterprise adoption of Machine Learning is still low. Efforts are under way to address this gap, i.e., to make enterprise-class Machine Learning more accessible and easier to use. Spark is a new, purpose-built, distributed, in-memory engine that makes it possible to perform compute-intensive jobs on commodity hardware clusters. One of ap... (more)
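As an illustration of the large-scale deployment step described above, here is a minimal Scala sketch using Spark's MLlib (the RDD-based API); the input path, split ratio, and iteration count are assumptions for illustration only, not taken from the article.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.util.MLUtils

object SparkMLSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ml-sketch"))

    // Load a LIBSVM-format training set; the HDFS path is a placeholder.
    val data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/training.libsvm")
    val Array(training, testing) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    // Train a logistic regression model across the cluster (100 iterations assumed).
    val model = LogisticRegressionWithSGD.train(training, 100)

    // Evaluate: fraction of correctly classified test points.
    val correct = testing.filter(p => model.predict(p.features) == p.label).count()
    println(s"Test accuracy: ${correct.toDouble / testing.count()}")

    sc.stop()
  }
}

The same model that would be prototyped on a single machine in R or Weka is here trained directly against data distributed across the cluster, which is the gap Spark aims to close.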

Column Store, In-Memory, MPP Databases and Oracle

(For the latest information on the Oracle 12c database update, please refer to the following article: Oracle 12c Database and How It Relates to SAP Hana) RDBMSs are stable and mature products. While there is nothing radically new on the horizon that would challenge Codd's relational theory and related advances in data processing, there are some developments that force established vendors like Oracle to come up with new features and products. Column Stores and Oracle: the column store concept has been around for quite a while. Vendors like HP Vertica grabbed some market share in the data warehousing seg... (more)

Installing Vertica

Vertica is a high-performing, advanced RDBMS that is very simple to install and administer, thanks to its modern design and purpose-built architecture. Once we execute all preparatory steps on the database servers and download the Vertica software as per the Installation Guide, we start the installation process on a two-node cluster (host01, host02): /opt/vertica/sbin/install_vertica -s host01,host02 -r vertica-ce-5.1.1-0.x86_64.RHEL5.rpm We then initiate the database creation process using the adminTools utility (run as the dbadmin user): $ /opt/vertica/bin/adminTools We will pick option 6 (Configuration Menu), then option 1 to... (more)
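Once the database is created, applications can connect to Vertica over JDBC. Below is a minimal Scala sketch; the database name (vmart), port, and credentials are placeholder assumptions, and the Vertica JDBC driver jar is assumed to be on the classpath.

import java.sql.DriverManager

object VerticaJdbcSketch {
  def main(args: Array[String]): Unit = {
    // Driver class name and URL format follow Vertica JDBC conventions;
    // host, database name, and credentials below are placeholders.
    Class.forName("com.vertica.jdbc.Driver")
    val conn = DriverManager.getConnection(
      "jdbc:vertica://host01:5433/vmart", "dbadmin", "password")
    try {
      // A trivial sanity check that the cluster answers queries.
      val rs = conn.createStatement().executeQuery("SELECT version()")
      while (rs.next()) println(rs.getString(1))
    } finally {
      conn.close()
    }
  }
}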

Mainstream Business Applications and In-Memory Databases

(Please refer to the following article for an update on Oracle 12c databases: Oracle 12c In-Memory Database is Out - Hardly Anybody Notices) Contemporary large servers are routinely configured with 2TB of RAM. It is thus possible to fit an entire average-size OLTP database in memory directly accessible by the CPU. There is a long history of academic research on how to best utilize relatively abundant computer memory. This research is becoming increasingly relevant as databases serving business applications are heading towards memory-centric design and implementation. If you simply place... (more)

Oracle RDBMS and Very Large Data Set Processing

The Oracle database is a relational database management system that mostly complies with ACID transaction requirements (atomicity, consistency, isolation, durability). This means that each database transaction will be executed in a reliable, safe, and integrity-preserving manner. In order to comply with ACID, the Oracle database software implements a fairly complex and expensive (in terms of computing resources, i.e., CPU, disk, memory) set of mechanisms, such as redo and undo logging, memory latching, and metadata maintenance, that make concurrent work possible while maintaining data integrity. Any databa... (more)
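To illustrate what ACID compliance gives an application, here is a minimal Scala/JDBC sketch of an atomic transfer between two rows; the connection string, credentials, and the accounts table are hypothetical placeholders, not part of the article.

import java.sql.DriverManager

object OracleTxnSketch {
  def main(args: Array[String]): Unit = {
    // Assumes the Oracle JDBC driver (ojdbc) is on the classpath; host, SID,
    // credentials, and the accounts table are illustrative assumptions.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@dbhost:1521:orcl", "scott", "tiger")
    conn.setAutoCommit(false) // group both updates into a single transaction
    try {
      val stmt = conn.createStatement()
      stmt.executeUpdate("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
      stmt.executeUpdate("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
      conn.commit()   // durability: redo is persisted before commit returns
    } catch {
      case e: Exception =>
        conn.rollback() // atomicity: undo restores the pre-transaction state
        throw e
    } finally {
      conn.close()
    }
  }
}

The redo and undo work mentioned above is exactly what makes the commit durable and the rollback possible, which is why ACID compliance is not free in terms of CPU, disk, and memory.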