Big Data Consultant

Ranko Mosic

Machine Learning is a critical part of extracting value from Big Data. Choosing the proper model, preparing data and getting usable results from large-scale data is a non-trivial exercise. Typically the process consists of prototyping a model in a higher-level, (mostly) single-machine tool like R, Matlab or Weka, then coding it in Java or some other language for large-scale deployment. This process is fairly involved, error-prone, slow and inefficient. Existing tools that aim to automate and improve this process are still somewhat immature, and wide-scale enterprise adoption of Machine Learning is still low. Efforts are under way to address this gap, i.e. to make enterprise-class Machine Learning more accessible and easier. Spark is a new, purpose-built, distributed, in-memory engine that makes it possible to run compute-intensive jobs on commodity hardware clusters. One of ap...

Deploying Oracle Databases to Amazon AWS (EC2, RDS)

Amazon recently added Oracle database hosting capabilities to its RDS service offering. You can now rent Oracle database infrastructure in a pay-as-you-go fashion. We are going to explore whether corporations should be using Amazon AWS Oracle Database services (EC2, RDS), how they should be used, and where the possible savings and potential trouble points are. With services like Amazon AWS it doesn't matter where your hardware and software physically are: they could be in the next room or in another country. It is much easier and cheaper to procure and get new serv...

Oracle Disaster Recovery Site Hosted by Amazon Cloud

DR sites are typically built as an exact replica of the primary site. Application and database software is installed on the DR site and sits there mostly unused, waiting for a disaster to happen. A DR site is a very expensive proposition that only large companies can afford. Amazon AWS is an interesting alternative to having your own DR site. Oracle databases on the DR site are in a Data Guard configuration with the primary site and actively apply archive log files shipped from it. The pay-per-use, scalable Amazon Cloud model makes it an attractive alternative to creating and maintai...

Oracle Database and Big Data: A Powerful Combination

Ever wondered how Google manages to search through so much data with such speed and precision? Part of the answer is MapReduce, a Google technology for processing and generating large data sets. Apache Hadoop is open source software, inspired by Google MapReduce, that can process petabytes of data in parallel on hundreds or thousands of commodity hardware nodes. Oracle Corporation is acknowledging the power of the Oracle/Hadoop combination by announcing the Big Data Appliance, essentially a Hadoop/Oracle database software/Oracle hardware bundle, to be available next yea...
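To make the MapReduce idea more concrete, here is a minimal, single-process Python sketch of the classic word-count job. It only mimics the map, shuffle and reduce phases in memory; it is not Hadoop's API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key, as the framework does
    between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In Hadoop each map and reduce call could run on a different node;
    # here they simply run one after another in a single process.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data needs big clusters", "Hadoop processes big data"]
    print(word_count(docs))
    # {'big': 3, 'clusters': 1, 'data': 2, 'hadoop': 1, 'needs': 1, 'processes': 1}
```

Because each map call looks at only one document and each reduce call looks at only one key, both steps can be spread across many machines; that independence is what lets Hadoop scale out on commodity nodes.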

Oracle Fusion Applications - Installation and First Impressions (Part 4)

Oracle Identity Management is a component of Oracle Fusion Middleware and part of the Oracle Fusion Applications infrastructure. Its purpose is to manage user identities across the enterprise. We are going to install Oracle Internet Directory 11g (OID), Oracle Virtual Directory 11g (OVD), Oracle Identity Manager 11g (OIM) and Oracle Access Manager 11g (OAM), as well as two instances of Oracle Database, one for the Identity Store and the other for the Policy Store. SOA Suite is the first component to be installed, since Identity Manager requires it. Some SOA Suite components (...