Data Frontiers, by Curt Monash

Curt Monash runs Monash Research, which provides strategic, analysis-based advice to users and vendors of advanced information technology. He also writes the blogs DBMS2, Text Technologies, and Strategic Messaging. Write him at contact@monash.com.

Hot Topics in High-Performance Analytics

For the past few months, I've collected a lot of data points to the effect that high-performance analytics (i.e., beyond straightforward query) is becoming increasingly important. And I've written about some of them at length. For example:
Ack. I can't decide whether "analytics" should be a singular or plural noun. Thoughts?

Another area that's come up, which I haven't blogged about so much, is data mining in the database. Data mining accounts for a large part of data warehouse use. The traditional way to do data mining is to extract data from the database and dump it into SAS. But there are problems with this scenario, including:

>>Continue reading "Hot Topics in High-Performance Analytics"
Posted Monday, November 17, 2008 10:03 AM

Getting to Answers on Oracle's New Hardware

I spent about six hours at Oracle last week talking with Andy Mendelsohn, Ray Roccaforte, Juan Loaiza, Cetin Ozbutun, et al., and plan to write more later. For now, let me pass along a few quick comments.

The key philosophical point that I had perhaps been missing is that Oracle thinks there is and should be a storage (server) tier, just as there also are database (server), application (server), and web (server) tiers. Exadata cells are designed to never talk with each other. Instead, they talk to a set of Infiniband switches, which then talk to a grid of servers on the database tier. Oracle thinks this has solved its I/O bandwidth problem once and for all. It's hard to see why that wouldn't be the case.

What Exadata does on the storage tier in query execution is throw stuff away. Mainly, this is projection and restriction/SELECT. But if a join has been resolved on a small fact table, and Oracle is now filtering a fact table to match a value or set of values, the storage tier can do that too.

>>Continue reading "Getting to Answers on Oracle's New Hardware"
Posted Wednesday, October 22, 2008 12:27 PM

A Quick Guide to Teradata's Latest News

The Teradata Partners (i.e., user) conference is this week. So there have been lots of press releases, some presentations, lots of meetings, and so on.
A lot of Teradata's messaging is in flux, as it moves fairly rapidly to correct what I believe have been some deficiencies in the past. One confusing result is that there was very little prebriefing about the actual announcement details, and we're all scrambling to figure out what's up. Teradata does a good job of collecting its press releases at one URL. So without linking to most of them individually, let me jump in to an overview of Teradata news this week (whether or not in actual press release format):

>>Continue reading "A Quick Guide to Teradata's Latest News"
Posted Tuesday, October 14, 2008 11:51 AM

HP-Oracle Appliance Prices Estimated

I've been trying to figure out how much the HP-Oracle Database Machine and HP-Oracle Exadata Storage Server actually cost. My first estimate was $58-190K/TB (user data), but I've since updated my pricing spreadsheet. Specifically: The first page of these estimates has been modestly altered to reflect more chargeable software options, as per the discussion below.

>>Continue reading "HP-Oracle Appliance Prices Estimated"
Posted Friday, October 3, 2008 1:11 PM

HP-Oracle Hardware Parallelization Clarified

Some kind Oracle development managers have reached out and helped me better understand where Oracle does or doesn't stand in query and analytic parallelization. Let's start with the part everybody pretty much knows already: There are two parts to a parallelization story: how you get data off of disk, and what you do with it once you have it.

>>Continue reading "HP-Oracle Hardware Parallelization Clarified"
Posted Wednesday, October 1, 2008 11:49 AM

Oracle Finally Answers Data Warehouse Challengers

Oracle, in partnership with HP, has announced a new data warehouse appliance product line, cleverly branded "Exadata."
The basic idea seems to be that database processing is split among two sets of servers: (The new stuff) A set of back-end servers, the Oracle Exadata Storage Servers, that gets data off of disk and does some preliminary query processing.

Numbers are being thrown around suggesting that, unlike prior Oracle offerings, the Exadata-based appliance at least has scalability and price/performance worth comparing to Teradata (hey, Exa is bigger than Tera!), Netezza, et al.

>>Continue reading "Oracle Finally Answers Data Warehouse Challengers"
Posted Thursday, September 25, 2008 1:49 AM

Vertica Spells Out Compression Claims

Omer Trajman of column-store DBMS vendor Vertica put up a must-read blog post spelling out detailed compression numbers, based on actual field experience (which I'd guess is from a combination of production systems and POCs):

>>Continue reading "Vertica Spells Out Compression Claims"
Posted Wednesday, September 24, 2008 12:54 PM

Infobright Open Source Move Packs Potential

Infobright announced today that it's going full-bore into open source, specifically in the MySQL ecosystem, with the licensing approach, pricing, distribution strategy, and VC money from Sun that such a move naturally entails. I think this is a great idea, for a number of reasons:

>>Continue reading "Infobright Open Source Move Packs Potential"
Posted Monday, September 15, 2008 11:16 AM

Tradeoffs In Splitting DBMS Work Among MPP Nodes

I talk with lots of vendors of MPP data warehouse DBMS. I've now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives.
The base-case MPP DBMS architecture is one in which there are two kinds of nodes: A boss node, whose jobs include:

>>Continue reading "Tradeoffs In Splitting DBMS Work Among MPP Nodes"
Posted Tuesday, September 9, 2008 12:16 PM

Why MapReduce Matters to SQL Data Warehousing

Greenplum and Aster Data have both just announced the integration of MapReduce into their SQL MPP data warehouse products. So why do I think this could be a big deal? The short answer is "Because MapReduce offers dramatic performance gains in analytic application areas that still need great performance speed-up." The long answer goes something like this. The core ideas of MapReduce are:

>>Continue reading "Why MapReduce Matters to SQL Data Warehousing"
Posted Thursday, August 28, 2008 8:53 AM

David Raab Offers Kudos for QlikView

David Raab is a great fan and former reseller of QlikTech's QlikView. His recent lengthy post about the product (I hesitate to call it "detailed" only because he rightly observes that QlikTech is in fact stingy with technical detail) is positive enough to have been recommended by the company itself. Specifically, it was cited in the comment thread to my recent post on QlikTech, where David himself also addressed some of my questions. But of course, no technology is perfect, not even one as great as David thinks QlikView is.

>>Continue reading "David Raab Offers Kudos for QlikView"
Posted Monday, August 25, 2008 8:18 AM

When to Use Modern DBMS Alternatives

If there's one central theme in my DBMS2 blog, it's that modern database management system alternatives should in many cases be used instead of the traditional market leaders. So it was only a matter of time before somebody sponsored a white paper on that subject. The paper, sponsored by EnterpriseDB (disclosure noted), is now posted along with my other recent white papers.
Its conclusion, summarizing what kinds of database management system you should use in which circumstances, is reproduced below.

Many new applications are built on existing databases, adding new features to already-operating systems. But others are built in connection with truly new databases. And in the latter cases, it's rare that a market-leading product is the best choice. Mid-range DBMS (for OLTP) or specialty data warehousing systems (for analytics) are usually just as capable, and much more cost-effective. Exceptions arise mainly in three kinds of cases:

>>Continue reading "When to Use Modern DBMS Alternatives"
Posted Thursday, August 21, 2008 8:13 AM

Comparing Vertica, ParAccel and Exasol

I talked with executives at Nuremberg, Germany-based Exasol last week (at 5:00 am ET!) and of course want to blog about it. For clarity, I'd like to start by comparing/contrasting the fundamental data structures at Vertica, ParAccel, and Exasol. And it feels like that should be a separate post. So here goes.

>>Continue reading "Comparing Vertica, ParAccel and Exasol"
Posted Tuesday, August 19, 2008 9:00 AM

Patent Nonsense in the Data Warehouse DBMS Market

There are two recent patent lawsuits in the data warehouse DBMS market. In one, Sybase is suing Vertica. In another, an individual named Cary Jardin (techie founder of XPrime, a sort of predecessor company to ParAccel) is suing DATAllegro. Naturally, there's press coverage of the DATAllegro case, due in part to its surely non-coincidental timing, right after the Microsoft acquisition was announced, and in part to a vigorous PR campaign around it. And the Sybase case so excited one troll that he posted identical references to it on about 12 different threads in this blog, as well as to a variety of Vertica-related articles in the online trade press. But I think it's very unlikely that either of these cases turns out to matter much.

>>Continue reading "Patent Nonsense in the Data Warehouse DBMS Market"
Posted Friday, August 15, 2008 10:15 AM