
Snip!t from collection of Alan Dix


Snip

VLDB: VLDB '05, Efficient implementation of large-scale ...
http://portal.acm.org/citation.cfm?...

Categories

/Channels/random links


In earlier work, we defined "multi-structural databases," a data model to support efficient analysis of large, complex data sets over multiple numerical and hierarchical dimensions. We defined three types of queries over this data model, each of which required solving an optimization problem. An example is to find the ten most significant non-overlapping regions of geography crossed with time in which coverage of the Olympics was much stronger in newspapers than online sources.

In this paper, we present a general query framework capturing the original three queries as part of a much broader family. We then give efficient algorithms for particular subclasses of this family. Finally, we describe an implementation of these algorithms that operates on a collection of several billion web documents. Using our algorithms in conjunction with random sampling techniques, our system can solve these queries in real time.
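To make the flavour of the example query concrete, here is a minimal sketch restricted to a single numerical dimension (time). It is not the paper's algorithm. It assumes, purely for illustration, that each time bucket has a score equal to newspaper coverage minus online coverage, and that the "significance" of a region is the sum of its bucket scores. The function name `top_k_disjoint_intervals` and the scoring rule are hypothetical. The sketch uses a plain dynamic program to pick at most k non-overlapping intervals with the largest total score.

```python
from typing import List, Optional, Tuple


def top_k_disjoint_intervals(
    scores: List[float], k: int
) -> Tuple[float, List[Tuple[int, int]]]:
    """Pick at most k non-overlapping contiguous intervals of `scores`
    that maximise the total score. Intervals are half-open (start, end).

    Illustrative only: the scoring rule is an assumption, not the paper's.
    """
    n = len(scores)

    # prefix[i] is the sum of scores[:i], so an interval sum is a subtraction.
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)

    # best[j][i]: best total using at most j intervals inside scores[:i].
    best = [[0.0] * (n + 1) for _ in range(k + 1)]
    # choice[j][i]: start of the interval ending at i, or None if
    # bucket i-1 is left out of the solution for best[j][i].
    choice: List[List[Optional[int]]] = [[None] * (n + 1) for _ in range(k + 1)]

    for j in range(1, k + 1):
        for i in range(1, n + 1):
            best[j][i] = best[j][i - 1]  # option 1: skip bucket i-1
            for start in range(i):       # option 2: end an interval at i
                cand = best[j - 1][start] + prefix[i] - prefix[start]
                if cand > best[j][i]:
                    best[j][i] = cand
                    choice[j][i] = start

    # Walk the choice table backwards to recover the chosen intervals.
    intervals: List[Tuple[int, int]] = []
    j, i = k, n
    while j > 0 and i > 0:
        start = choice[j][i]
        if start is None:
            i -= 1
        else:
            intervals.append((start, i))
            j, i = j - 1, start
    intervals.reverse()
    return best[k][n], intervals


# Hypothetical weekly scores: newspaper coverage minus online coverage.
weekly_gap = [3, -1, 4, -5, 2, 2, -6, 1]
total, regions = top_k_disjoint_intervals(weekly_gap, k=2)
print(total, regions)  # 10.0 [(0, 3), (4, 6)]
```

This version runs in O(k n^2) time on one dimension. The setting described in the abstract combines several numerical and hierarchical dimensions and relies on random sampling to reach real-time performance, neither of which this sketch attempts.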
