Integrating Hadoop and parallel DBMS

Y Xu, P Kostamaa, L Gao - Proceedings of the 2010 ACM SIGMOD International Conference on Management of …, 2010 - dl.acm.org
Teradata's parallel DBMS has been successfully deployed in large data warehouses over the last two decades for large-scale business analysis in various industries, over data sets ranging from a few terabytes to multiple petabytes. However, due to the explosive increase in data volume in recent years at some customer sites, some data, such as web logs and sensor data, are not managed by Teradata EDW (Enterprise Data Warehouse), partly because it is very expensive to load such extremely large volumes of data into an RDBMS, especially when the data are not frequently used to support important business decisions. Recently the MapReduce programming paradigm, introduced by Google and popularized by the open-source Hadoop implementation with major support from Yahoo!, has been gaining rapid momentum in both academia and industry as another way of performing large-scale data analysis. By now most data warehouse researchers and practitioners agree that the parallel DBMS and MapReduce paradigms each have advantages and disadvantages for various business applications, and thus the two paradigms are going to coexist for a long time [16]. In fact, a large number of Teradata customers, especially those in the e-business and telecom industries, have seen increasing needs to perform BI over both data stored in Hadoop and data in Teradata EDW. One thing Hadoop and Teradata EDW have in common is that data in both systems are partitioned across multiple nodes for parallel computing, which creates integration optimization opportunities not possible for DBMSs running on a single node. In this paper we describe our three efforts towards tight and efficient integration of Hadoop and Teradata EDW.
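The abstract does not say how the integration exploits the fact that both systems partition data across nodes. As a rough illustration only, the Python sketch below assumes both sides use hash partitioning on a row key; the functions `target_node`, `route_partition`, and `parallel_load` are hypothetical and are not taken from the paper. It shows the general idea that co-partitioned data can be routed as many independent node-to-node streams rather than through a single loading node.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib


def target_node(key: str, num_nodes: int) -> int:
    """Hypothetical hash partitioning: map a row key to a node index."""
    # crc32 is deterministic across processes, unlike Python's salted hash() on str,
    # so two independent systems would compute the same row-to-node mapping.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def route_partition(rows, num_dbms_nodes):
    """Split one (assumed) Hadoop node's rows into per-DBMS-node buckets."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[target_node(key, num_dbms_nodes)].append((key, value))
    return buckets


def parallel_load(hadoop_partitions, num_dbms_nodes):
    """Route every Hadoop partition concurrently, then deliver buckets per DBMS node.

    Routing runs in parallel, one task per source partition. The delivery loop
    below is an in-process stand-in for each DBMS node receiving its own streams;
    in a real deployment each (source, target) pair could be a separate connection.
    """
    dbms_nodes = [[] for _ in range(num_dbms_nodes)]
    workers = max(1, len(hadoop_partitions))  # ThreadPoolExecutor rejects 0 workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda rows: route_partition(rows, num_dbms_nodes), hadoop_partitions
        )
        for buckets in results:
            for node, rows in buckets.items():
                dbms_nodes[node].extend(rows)
    return dbms_nodes


if __name__ == "__main__":
    hadoop = [[("a", 1), ("b", 2)], [("c", 3), ("d", 4)]]
    for i, rows in enumerate(parallel_load(hadoop, 3)):
        print(f"DBMS node {i}: {rows}")
```

The design choice to illustrate is that each source partition needs no knowledge of the others: as long as both sides agree on the partitioning function, work scales with the number of nodes. Real systems may use different partitioning schemes or rebalance data, which this sketch does not model.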