MapReduce Patterns, Algorithms, and Use Cases
In this article I digested a number of MapReduce patterns and algorithms to give a systematic view of the different techniques that can be found on the web or in scientific articles. Several practical...
Tricks with Direct Memory Access in Java
Java was initially designed as a safe, managed environment. Nevertheless, Java HotSpot VM contains a “backdoor” that provides a number of low-level operations to manipulate memory and threads directly....
NoSQL Data Modeling Techniques
NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. This aspect of NoSQL is well studied in both practice and theory because...
Hierarchical Navigation and Faceted Search on Top of Oracle Coherence
Some time ago I participated in the design of a backend for a large online retailer. From the business logic point of view, this was a pretty typical eCommerce service for hierarchical and...
Probabilistic Data Structures for Web Analytics and Data Mining
Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analysis of such large data sets often...
Fast Intersection of Sorted Lists Using SSE Instructions
Intersection of sorted lists is a cornerstone operation in many applications, including search engines and databases, because indexes are often implemented using different types of sorted structures. At...
Speeding Up Hadoop Builds Using Distributed Unit Tests
We recently worked with one of the Hadoop vendors on the continuous integration system for Hadoop core and other Hadoop-related projects like Pig, Hive, and HBase. One of the challenges we faced was very...
Distributed Algorithms in NoSQL Databases
Scalability is one of the main drivers of the NoSQL movement. As such, it encompasses distributed system coordination, failover, resource management and many other capabilities. It sounds like a big...
In-Stream Big Data Processing
The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream...
Data Mining Problems in Retail
Retail is one of the most important business domains for data science and data mining applications because of its prolific data and numerous optimization problems such as optimal prices, discounts,...