One size doesn’t fit all? (Part 1)

I came across this paper in the references of the book Designing Data-Intensive Applications, which I am currently reading. Although the paper was published back in 2005, I found its content still very relevant, and I agree with the assessments made by the researchers.

You are welcome to read it yourself; here is the reference:

Michael Stonebraker and Uğur Çetintemel: “‘One Size Fits All’: An Idea Whose Time Has Come and Gone,” at the 21st International Conference on Data Engineering (ICDE), April 2005.

For those looking for a summary, I’ll share my key takeaways and what I found most interesting. The paper argues that, because requirements differ so widely across applications, one-size-fits-all database systems no longer suffice. The authors illustrate this point by contrasting the requirements of stream processing with those of traditional DBMSs, highlighting the fundamental differences between relational and streaming database designs. They also explore other examples, such as text search and XML databases, pointing to an ongoing trend towards specialized data stores tailored to particular demands.

Image created by the author using Canva

The paper explains the rising popularity of data warehousing, driven by three factors: increased data volume, improved data consistency, and better decision-making capabilities. It proposes consolidating data from many sources, such as data marts, individual feeds, and other heterogeneous sources, into a central repository for analysis. This approach benefits both analytical and transactional systems by decoupling them:

  • Improved Analytics: Analytical queries enjoy greater flexibility and access to extensive data without burdening transactional systems.

  • Enhanced Transactions: Transactional processes no longer face interruptions from long-running analytical queries, ensuring smooth daily operations.
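The decoupling can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming two hypothetical operational (OLTP) stores, one per region, whose orders are copied into a separate warehouse table; the analytical query then runs only against the warehouse. The table and column names (`orders`, `fact_orders`) are made up for the example and do not come from the paper.

```python
import sqlite3

def make_oltp_store(rows):
    """Create a hypothetical operational (OLTP) database holding raw orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def load_warehouse(sources):
    """Extract orders from every source and load them into one central warehouse."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_orders (region TEXT, order_id INTEGER, product TEXT, amount REAL)"
    )
    for region, conn in sources.items():
        rows = conn.execute("SELECT order_id, product, amount FROM orders").fetchall()
        # Tag each row with its source region so analysts can still slice by origin.
        warehouse.executemany(
            "INSERT INTO fact_orders VALUES (?, ?, ?, ?)",
            [(region, *row) for row in rows],
        )
    warehouse.commit()
    return warehouse

sales_eu = make_oltp_store([(1, "widget", 10.0), (2, "gadget", 25.0)])
sales_us = make_oltp_store([(1, "widget", 12.5), (2, "widget", 7.5)])
warehouse = load_warehouse({"eu": sales_eu, "us": sales_us})

# The analytical query touches only the warehouse, never the OLTP stores.
totals = warehouse.execute(
    "SELECT product, SUM(amount) FROM fact_orders GROUP BY product ORDER BY product"
).fetchall()
print(totals)  # [('gadget', 25.0), ('widget', 30.0)]
```

In a real deployment the extract-and-load step would run on a schedule or continuously, but the design choice is the same: analytical scans hit a copy of the data, not the systems taking live transactions.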

In the past, a single database, built from one codebase, served both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) needs across marketing, sales, and other functions. However, the “data is the new fuel” era, driven by petabytes of data from sensors, machines, and humans, has changed requirements significantly. Streaming data, windowed processing for Internet of Things (IoT) workloads, and high-volume data ingestion now dominate.
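As a rough illustration of window processing, the sketch below groups hypothetical IoT sensor readings into fixed, non-overlapping (tumbling) time windows and averages each window. The 60-second window size and the `(timestamp, value)` reading format are assumptions for illustration, and the input is a finite list for simplicity; real stream processors compute this incrementally and offer many more window types.

```python
from collections import defaultdict

def tumbling_window_averages(readings, window_seconds=60):
    """Average (timestamp, value) readings per fixed, non-overlapping time window.

    Each reading is assigned to the window starting at
    floor(timestamp / window_seconds) * window_seconds.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in readings:
        window_start = (timestamp // window_seconds) * window_seconds
        sums[window_start] += value
        counts[window_start] += 1
    # Return windows in time order, mapping window start -> average value.
    return {start: sums[start] / counts[start] for start in sorted(sums)}

# Hypothetical temperature readings: (seconds since start, degrees Celsius)
readings = [(5, 20.0), (30, 22.0), (65, 25.0), (110, 27.0), (125, 30.0)]
print(tumbling_window_averages(readings))
# {0: 21.0, 60: 26.0, 120: 30.0}
```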

The white paper explores these design differences through practical examples:

  • Sensor Use in the Military: How real-time data from sensors impacts decision-making and situational awareness.

  • Traffic Monitoring: Managing traffic flow and congestion using streaming data.

  • Food Delivery Services: Handling order tracking, logistics, and real-time updates.

  • Financial Sector Challenges: Rapid market changes and the need for timely customer alerts.
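The financial example boils down to evaluating each tick the moment it arrives. The sketch below is a simplified illustration, assuming a hypothetical feed of `(symbol, price)` ticks: it keeps the last few prices per symbol in a bounded `deque` and flags a tick when its price differs from the oldest price in that window by more than a threshold. The 5-tick window and the 2% threshold are arbitrary assumptions.

```python
from collections import defaultdict, deque

def price_alerts(ticks, window_size=5, threshold_pct=2.0):
    """Yield (symbol, price, change_pct) when a tick moves more than threshold_pct
    away from the oldest price among that symbol's last `window_size` ticks."""
    # One bounded window per symbol; appending beyond maxlen drops the oldest price.
    windows = defaultdict(lambda: deque(maxlen=window_size))
    for symbol, price in ticks:
        window = windows[symbol]
        window.append(price)
        oldest = window[0]
        change_pct = (price - oldest) / oldest * 100
        if abs(change_pct) > threshold_pct:
            yield symbol, price, round(change_pct, 2)

ticks = [("ACME", 100.0), ("BETA", 50.0), ("ACME", 100.5),
         ("ACME", 103.0), ("BETA", 48.5), ("ACME", 101.5)]
print(list(price_alerts(ticks)))
# [('ACME', 103.0, 3.0), ('BETA', 48.5, -3.0)]
```

Because `price_alerts` is a generator, each alert is produced as soon as the triggering tick is consumed, rather than after the whole feed has been stored.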

In summary, the paper examines how data management is evolving and argues for specialized approaches to handle diverse data sources and real-time demands.

After reading this whitepaper, I designed the timeline below to illustrate the emergence of some modern databases. I believe this context will be helpful. Please let me know in the comments if you think I missed any popular databases you’re using, and I’ll be happy to add them!

Image made by the author using Canva

I especially liked the section on inbound vs. outbound processing.

Outbound processing:

  • Stores data first, then processes it. This is the traditional approach used by most databases. Data is persisted on disk before analysis, ensuring data integrity and availability for future use. However, this can introduce latency due to storage and retrieval steps.
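A minimal sketch of outbound processing, assuming an in-memory SQLite database (Python's built-in `sqlite3`) stands in for a disk-backed store: every event is inserted and committed first, and the analysis runs later as a query over what was stored. The `events` table and the readings are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path here would persist the data to disk
conn.execute("CREATE TABLE events (sensor_id TEXT, reading REAL)")

incoming = [("s1", 20.0), ("s2", 35.0), ("s1", 22.0), ("s2", 41.0)]

# Step 1: store every event before doing any analysis.
conn.executemany("INSERT INTO events VALUES (?, ?)", incoming)
conn.commit()

# Step 2: process later by querying the stored data.
result = conn.execute(
    "SELECT sensor_id, MAX(reading) FROM events GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()
print(result)  # [('s1', 22.0), ('s2', 41.0)]
```

The raw events remain available for any future query, which is the integrity and availability benefit described above; the price is the write-then-read round trip before any answer is produced.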

Inbound processing (streaming):

  • Processes data as it arrives, in real time. Storage is optional, and data may be buffered temporarily in memory for immediate processing. This lowers latency and can reduce processing costs compared to store-first approaches. However, maintaining data integrity and keeping data available for historical analysis can be complex. Stream processing is often used to analyze sensor data, social media feeds, and financial transactions in real time. Examples of stream processing systems include Apache Kafka (with Kafka Streams), Apache Spark (Structured Streaming), Apache Flink, and StreamBase.
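A minimal inbound sketch handles hypothetical sensor readings one at a time as they arrive, keeps only a small running state (the maximum reading per sensor) in memory, and raises alerts immediately instead of persisting raw events. The generator `sensor_feed` is a hypothetical stand-in for a live feed, and the 40.0 alert threshold is an assumption.

```python
def sensor_feed():
    """Hypothetical stand-in for a live, unbounded sensor feed."""
    yield from [("s1", 20.0), ("s2", 35.0), ("s1", 22.0), ("s2", 41.0)]

def process_inbound(stream, threshold=40.0):
    """Process each event on arrival; only small running state is kept in memory."""
    max_seen = {}  # running maximum per sensor (the only state retained)
    alerts = []
    for sensor_id, reading in stream:
        # Update state from this single event; the raw event itself is not stored.
        max_seen[sensor_id] = max(reading, max_seen.get(sensor_id, reading))
        if reading > threshold:
            alerts.append((sensor_id, reading))  # recorded as soon as the event is seen
    return max_seen, alerts

max_seen, alerts = process_inbound(sensor_feed())
print(max_seen)  # {'s1': 22.0, 's2': 41.0}
print(alerts)    # [('s2', 41.0)]
```

Nothing here could answer a new historical question later, since the individual readings were never kept; that is the trade-off against store-first processing.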

Hybrid approaches also exist: some systems combine elements of both traditional and streaming processing for complex use cases.

Application designers need to understand their requirements carefully and choose a database that fits them. This decision hinges on factors such as:

  • High availability: Does your service require constant accessibility and uptime?

  • Low latency: Are quick response times critical for your application’s performance?

  • Concurrency: Does your service need to handle many user transactions simultaneously?

These requirements, along with financial considerations, heavily influence engine selection. Answering these questions while factoring in budget constraints is crucial to choosing the database engine that best suits your application’s needs.

Beyond the core requirements, the whitepaper explores additional use cases for specialized databases, including:

  • Text search: Enables efficient searching and analysis of large text corpora.

  • Sensor networks: Handles high-volume, real-time data streams from sensors and IoT devices.

  • Scientific databases: Manages diverse data formats and complex queries for scientific research.

  • XML databases: Offers flexible storage and retrieval for XML-based data.
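Text search engines are built around an inverted index rather than row-oriented tables. A toy version, assuming only whitespace tokenization and lowercasing (no stemming, ranking, or compression, all of which real engines add), maps each term to the set of document IDs containing it; an AND query intersects those sets. The documents are illustrative.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    # .get avoids inserting empty entries into the defaultdict for unknown terms.
    term_sets = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

docs = {
    1: "stream processing engines handle sensor data",
    2: "data warehouses store historical data",
    3: "sensor networks produce stream data",
}
index = build_inverted_index(docs)
print(sorted(search_all(index, "sensor stream")))  # [1, 3]
print(sorted(search_all(index, "data")))           # [1, 2, 3]
```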

However, the paper also acknowledges challenges inherent to streaming systems:

  • Delayed messages: How to handle messages that arrive late or out of order.

  • Message loss: Assessing your application’s tolerance for potential data loss.

  • Replication errors: Strategies for mitigating and recovering from replication issues.
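One common way to cope with delayed and out-of-order messages is to buffer events briefly and release them in order once a watermark has passed them; sequence numbers also make gaps, and therefore possible message loss, visible. The sketch below assumes each message carries a hypothetical sender-assigned sequence number starting at 0 and uses an allowed lateness of 2 sequence positions. It illustrates the idea only and is not how any particular engine implements it.

```python
import heapq

def reorder(messages, allowed_lateness=2):
    """Release (seq, payload) messages in sequence-number order.

    A message is held back until the highest sequence number seen so far is
    at least `allowed_lateness` ahead of it (a simple watermark). Returns the
    ordered payloads, the sequence numbers given up as lost, and the payloads
    that arrived after their slot had already been passed.
    """
    buffer = []  # min-heap of (seq, payload), smallest seq first
    ordered, lost, too_late = [], [], []
    next_expected = 0
    max_seen = -1

    def release(watermark):
        nonlocal next_expected
        while buffer and buffer[0][0] <= watermark:
            seq, payload = heapq.heappop(buffer)
            if seq < next_expected:
                continue  # duplicate of a message already released
            lost.extend(range(next_expected, seq))  # gaps below the watermark
            ordered.append(payload)
            next_expected = seq + 1

    for seq, payload in messages:
        if seq < next_expected:
            too_late.append(payload)  # slot already released or declared lost
            continue
        heapq.heappush(buffer, (seq, payload))
        max_seen = max(max_seen, seq)
        release(max_seen - allowed_lateness)

    release(float("inf"))  # end of input: flush everything still buffered
    return ordered, lost, too_late

messages = [(0, "a"), (2, "c"), (1, "b"), (3, "d"),
            (5, "f"), (6, "g"), (7, "h"), (4, "e")]
print(reorder(messages))
# (['a', 'b', 'c', 'd', 'f', 'g', 'h'], [4], ['e'])
```

Here message 4 arrives after the watermark has already moved past it, so it is first reported as lost and then rejected as too late. Raising `allowed_lateness` to 3 keeps it in order, at the cost of holding every message a little longer: a direct trade-off between latency and tolerance for delayed or lost messages.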

These are all valuable considerations when designing and implementing streaming systems.

I’m diving deeper into Designing Data-Intensive Applications and will be sharing insights on specific whitepapers, concepts, and design patterns that capture my attention. If you’d like to join me on this exploration, consider following me to receive automatic notifications about my next article!

In case you missed my earlier article on this topic, here is the link:

Article link


Support Aruna's blog by becoming a sponsor. Any amount is appreciated!