
Non-Grid Opportunistic Resources for (Big Data) Volunteer Computing

Rashid, Md Mamunur (2017) Non-Grid Opportunistic Resources for (Big Data) Volunteer Computing. Doctor of Philosophy (PhD) thesis, University of Kent. (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided) (KAR id:61077)

PDF
Language: English

Restricted to Repository staff only

Abstract

CPU-intensive computing at the LHC (Large Hadron Collider) requires collaborative distributed computing resources to accomplish its data reconstruction and analysis. Currently, the institutional Grid manages and processes large datasets within limited time and cost. The baseline paradigm is now well established: use the Computing Grid, and more specifically the WLCG (Worldwide LHC Computing Grid) and its supporting infrastructures. To achieve its Grid computing, LHCb has developed a community Grid solution called DIRAC (Distributed Infrastructure with Remote Agent Control), based on a pilot-job submission system for the institutional Grid infrastructures. However, other computing resources exist outside the Grid infrastructures, such as idle desktops (e.g. SETI@home) and idle computing clusters (e.g. CERN's online selection farm outside the data-taking periods of the LHC detectors). Simulation activities in particular, because they are lightweight, could benefit from using these opportunistic resources. The DIRAC architecture allows the use of the existing institutional Grid resources. To expand the capacity of the existing computing power, I have proposed integrating opportunistic resources into the distributed computing system (DIRAC). In order not to depend on the local settings of the worker nodes at the external resources, I propose using virtual machines. The architectural modifications required for DIRAC are presented here, with specific examples of data analysis on non-Grid clusters. This solution was achieved by making the necessary changes to three state-of-the-art technologies: DIRAC, CernVM and OpenNebula. The combination of these three is referred to as the DiCON architecture. I refer to the new approach as a framework rather than a specific technical solution to a specific scientific problem, as it can be reused in similar big-data analysis environments.
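The pilot-job pattern described above can be illustrated with a minimal sketch: a generic pilot agent, started inside a freshly booted virtual machine, pulls real payload jobs from a central task queue, so the worker node needs no site-specific configuration. All class and method names here are hypothetical, chosen for illustration only; they are not the actual DIRAC API.

```python
# Hypothetical sketch of the pilot-job pattern used by DIRAC-like systems.
# A VM boots a generic "pilot" agent, which repeatedly asks a central task
# queue for a payload, runs it, and shuts down when no work remains, so
# the opportunistic VM can be reclaimed (e.g. by OpenNebula).

from queue import Queue, Empty


class TaskQueue:
    """Stands in for the central task queue of the workload management system."""

    def __init__(self, payloads):
        self._q = Queue()
        for p in payloads:
            self._q.put(p)

    def match(self):
        """Hand out the next payload, or None if the queue is drained."""
        try:
            return self._q.get_nowait()
        except Empty:
            return None


class Pilot:
    """Generic agent started inside a freshly booted VM image (e.g. CernVM)."""

    def __init__(self, queue):
        self.queue = queue
        self.completed = []

    def run(self):
        # Pull payloads until the queue is empty, then return the results
        # so the VM can be released back to the opportunistic resource pool.
        while (payload := self.queue.match()) is not None:
            self.completed.append(payload())  # execute the real job
        return self.completed


# Example: three lightweight "simulation" payloads, as in the use case above.
jobs = [lambda i=i: f"event-batch-{i} done" for i in range(3)]
pilot = Pilot(TaskQueue(jobs))
print(pilot.run())
```

The key design point this sketch captures is the decoupling: the resource provider only has to boot a standard VM with the pilot inside, while job scheduling stays entirely with the central queue.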
I have also shown how this framework was used to analyse large-scale climate data; it was rather challenging to apply an infrastructure developed for one research area to another. I have also proposed the use of a dataflow architecture to exploit the possibilities of opportunistic resources while at the same time establishing reliability and stability. Dataflow computing in a virtual environment is seen as a possible future research extension of this work; it is a theoretical contribution only, and a unique approach in a virtual cloud (rather than in-house computing) environment.

This paradigm could give the scientific community access to a large number of non-conventional opportunistic CPU resources for scientific data processing. This PhD work examines the challenges of such a computing infrastructure and optimises the solutions it provides.

Item Type: Thesis (Doctor of Philosophy (PhD))
Thesis advisor: Wang, Frank
Uncontrolled keywords: Big Data, Volunteer Computing, Opportunistic Resources
Divisions: Faculties > Sciences > School of Computing
Date Deposited: 28 Mar 2017 17:00 UTC
Last Modified: 06 Feb 2020 04:15 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/61077 (The current URI for this page, for reference purposes)
