MOON | Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
MOON: MapReduce On Opportunistic eNvironments
Abstract
MapReduce offers an easy-to-use programming paradigm for processing large data sets, making it an attractive model for distributed volunteer computing systems. However, unlike the dedicated resources on which MapReduce has mostly been deployed, such volunteer computing systems have significantly higher rates of node unavailability. Furthermore, nodes are not fully controlled by the MapReduce framework. Consequently, we found the data and task replication scheme adopted by existing MapReduce implementations woefully inadequate for resources with high unavailability.
To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. Our tests on an emulated volunteer computing system, which uses a 60-node cluster where each node has a hardware configuration similar to that of a typical computer in a student lab, demonstrate that MOON can deliver a three-fold performance improvement over Hadoop in volatile, volunteer computing environments.
Published In
HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
June 2010
911 pages
Copyright © 2010 ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 21 June 2010
Qualifiers
- Research-article
Acceptance Rates
Overall Acceptance Rate 166 of 966 submissions, 17%