
10th LASER Summer School on Software Engineering

Software for the Cloud and Big Data

September 8-14, 2013 - Elba Island, Italy

Read the proceedings of previous LASER schools
LASER proceedings 2013/2014
LASER proceedings 2011
LASER proceedings 2008-2010
LASER proceedings 2007/2008

Roger Barga (Microsoft)
Karin Breitman (EMC)
Sebastian Burckhardt (Microsoft)
Adrian Cockcroft (Netflix)
Carlo Ghezzi (Politecnico di Milano)
Anthony Joseph (Berkeley)
Pere Mato Vila (CERN)
Bertrand Meyer (ETH Zurich)

Lectures

Development of dynamically evolving and self-adaptive software

Speaker: Carlo Ghezzi, Politecnico di Milano

Description:

Software is increasingly embedded in unstable settings where changes occur continuously and at all levels. Changes may occur at the requirements level. They may occur in the environment, thereby affecting the domain assumptions upon which the software was developed. They may also affect the computational infrastructure on which the software runs. Changes may lead existing software into a situation where it fails to satisfy its intended goals. They may lead to failures or to unacceptable quality of service, and thus often to breaking the contract with the software's clients.

Software engineering has long studied the problem of off-line evolution (also known as software maintenance). Many applications, however, run continuously and call for on-line change support while they are providing service: they require self-adaptive capabilities. To achieve this goal, a paradigm shift is needed that dissolves the traditional boundary between development time and run time. In particular, models must be kept at run time, and verification must be performed to detect possible requirements violations. The lectures start by focusing on the real-world requirements that lead to self-adaptive systems, then discuss how reflective capabilities can be designed to support self-adaptation. They also cover the issues involved in run-time verification (in the context of model checking) and in supporting safe dynamic software updates.
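As a rough illustration of such a feedback loop (a hypothetical Python sketch, not the speaker's framework; names such as measure_response_time and plan_adaptation are invented), the following monitor-analyze-plan-execute cycle keeps a run-time model of the system and reacts when an observed value violates a requirement stored in that model:

    import time

    # Hypothetical run-time model: the current configuration and the
    # requirement the system must satisfy (response time below 200 ms).
    model = {"servers": 2, "max_response_ms": 200}

    def measure_response_time():
        """Monitor: probe the running system (stubbed here with a constant)."""
        return 250  # pretend this is the observed latency in milliseconds

    def violates_requirement(observed, model):
        """Analyze: check the observation against the requirement in the model."""
        return observed > model["max_response_ms"]

    def plan_adaptation(model):
        """Plan: decide on a reconfiguration, here simply adding a server."""
        return {"add_servers": 1}

    def execute(plan, model):
        """Execute: apply the change and update the run-time model."""
        model["servers"] += plan["add_servers"]
        print("Scaled up to", model["servers"], "servers")

    def adaptation_loop(iterations=3):
        for _ in range(iterations):
            observed = measure_response_time()
            if violates_requirement(observed, model):
                execute(plan_adaptation(model), model)
            time.sleep(0.1)  # stands in for a real monitoring interval

    if __name__ == "__main__":
        adaptation_loop()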

Short biography:
see Speakers page



Consistency in Distributed Systems

Speaker: Sebastian Burckhardt, Microsoft

Description:

Data replication is a common technique for programming distributed systems, and is often important to achieve performance or reliability goals. Unfortunately, the replication of data can compromise its consistency, and thereby break programs that are unaware. In particular, in weakly consistent systems, programmers must assume some responsibility to properly deal with queries that return stale data, and to avoid state corruption under conflicting updates. The fundamental tension between performance (favoring weak consistency) and correctness (favoring strong consistency) is a recurring theme when designing concurrent and distributed systems, and is both practically relevant and of theoretical interest. In this course, we investigate how to understand and formalize consistency guarantees, and how we can determine if a system implementation is correct with respect to such specifications. As a special case, we will visit some classical results of distributed systems, and learn about correctness conditions for concurrent objects and replicated data types.
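To make the notion of a replicated data type concrete, here is a minimal Python sketch, not taken from the lecture material, of a grow-only counter: each replica increments only its own entry, and merging takes the element-wise maximum, so all replicas converge to the same value regardless of how and when updates are exchanged:

    class GCounter:
        """A grow-only counter: a simple state-based replicated data type."""

        def __init__(self, replica_id, num_replicas):
            self.replica_id = replica_id
            self.counts = [0] * num_replicas

        def increment(self):
            # Each replica only ever updates its own slot.
            self.counts[self.replica_id] += 1

        def value(self):
            return sum(self.counts)

        def merge(self, other):
            # Element-wise maximum is commutative, associative and idempotent,
            # so replicas converge no matter how state is propagated.
            self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    # Two replicas update concurrently, then exchange state in both directions.
    a, b = GCounter(0, 2), GCounter(1, 2)
    a.increment(); a.increment()
    b.increment()
    a.merge(b); b.merge(a)
    assert a.value() == b.value() == 3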

Short biography:
see Speakers page



WT* is Big Data?

Speaker: Karin Breitman, EMC

Description:

Most companies around the world have massive structured and unstructured business data about their projects, products, processes, production, and people. Today's challenge is how to transform all of that into valuable insight. Big Data is a recent buzzword used to represent the collection of tools, methods, and techniques that can be employed in the manipulation of very large datasets. Despite the hype and inconsistencies around the term, there is a comprehensible set of computer science skills required of anyone who calls themselves a data scientist. In this lecture we dissect, define, and take a deep dive into some of the (unsurprisingly not new) disciplines that Big Data requires. We then illustrate them with real industry examples (no social networks, sorry): Telco, Health, and Oil & Gas. We conclude by discussing job opportunities in industry and research.
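As one small example of the kind of skill the lecture alludes to (illustrative only, with a toy dataset rather than anything from the lecture), the sketch below counts word frequencies in the map-reduce style that underlies many large-scale data tools, expressed here in plain Python:

    from collections import Counter
    from functools import reduce
    from multiprocessing import Pool

    def map_chunk(lines):
        """Map step: count words in one chunk of a (potentially huge) input."""
        counts = Counter()
        for line in lines:
            counts.update(line.lower().split())
        return counts

    def reduce_counts(a, b):
        """Reduce step: merge partial counts coming from different chunks."""
        a.update(b)
        return a

    if __name__ == "__main__":
        # Toy stand-in for a very large dataset already split into chunks.
        chunks = [
            ["big data is not new", "data is valuable"],
            ["telco health and oil and gas", "data everywhere"],
        ]
        with Pool(2) as pool:
            partials = pool.map(map_chunk, chunks)
        total = reduce(reduce_counts, partials, Counter())
        print(total.most_common(3))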

Short biography:
see Speakers page



NetflixOSS - A Cloud Native Architecture

Speaker: Adrian Cockcroft, Netflix

Description:

Starting in 2009, Netflix built a set of architectural patterns focused on future-proofing the scalability, availability, and agility of the Netflix streaming video service as a "green field" application, optimized for running on the globally distributed public cloud supplied by AWS. As the architecture matured, parts of it were released as open-source projects, and in 2013 they form a complete platform-as-a-service (PaaS) offering known as NetflixOSS. The platform is built using Java, Scala, Groovy, and Python. This cloud native architecture is notable for making every service (including storage) ephemeral; for its use of chaos engines that continuously disrupt services to promote antifragility; and for achieving high scalability and availability through fine-grained stateless microservices, backed by a storage tier that is triple-replicated within a region and supports global replication across regions.
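The chaos-engine idea can be pictured with the following simplified Python sketch (a hypothetical illustration with an in-memory stand-in for the cloud API, not NetflixOSS code): each cycle it picks a random instance from a group and terminates it, on the assumption that a healthy, stateless, replicated service should survive the loss:

    import random
    import time

    # In-memory stand-in for a cloud API; a real chaos engine would talk to
    # the auto-scaling groups of the cloud provider instead.
    instances = {"api": ["i-001", "i-002", "i-003"], "ratings": ["i-101", "i-102"]}

    def terminate(group, instance_id):
        print("Terminating", instance_id, "in group", group)
        instances[group].remove(instance_id)
        # An auto-scaling group would launch a replacement automatically;
        # we fake that here to keep the sketch self-contained.
        instances[group].append(instance_id + "-replacement")

    def chaos_cycle():
        group = random.choice(list(instances))
        victim = random.choice(instances[group])
        terminate(group, victim)

    if __name__ == "__main__":
        for _ in range(3):
            chaos_cycle()
            time.sleep(0.1)  # real chaos engines run on a much slower schedule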

From March to September 2013, Netflix is also sponsoring a Cloud Prize competition for open source contributions to NetflixOSS. Code and information can be found at http://netflix.github.com. Submissions by third parties include new features and services, as well as ports to other environments such as the Eucalyptus private cloud.

The LASER lectures will cover 1) The motivations and economics of public cloud. 2) The migration path from datacenter to cloud. 3) Service and API Architectures. 4) Storage architecture. 5) Cloud based operations and tools. 6) Example applications.

Short biography:
see Speakers page



Software challenges of doing big science on the cloud

Speaker: Pere Mato Vila, CERN

Description:

In this series of lectures we will cover the full life cycle of scientific software and the challenges of adapting it for doing big science on clouds and grids. To illustrate with concrete needs, we will use the LHC experiments at CERN, which have recently processed more than 15 PB of data, leading to extraordinary discoveries in the field of High Energy Physics.

In general, big science requires big data, and thus all the challenges associated with data access and management; but it also requires high-performance scientific data-processing software that allows scientists to extract knowledge from the unprecedented amount of data coming from these modern experimental devices. This software, which for the LHC experiments consists of several million lines of code, is designed and developed by a large, geographically distributed team of scientists, most of whom are not formally trained as software engineers. Being able to produce working and performant software is perhaps the first challenge we need to cope with. We then need to integrate all these software components and libraries into a number of data-processing applications that can analyze large amounts of data reliably and efficiently. We need to configure, optimize, and validate all these applications for various operating systems and platforms, and finally face the challenge of distributing and deploying them on clouds and grids.

Perhaps the most challenging aspect is coping with software change. Scientific software is not very static: new ideas from scientists and a better understanding of the experimental apparatus typically translate into new code that needs to be tested, configured, packaged, and deployed on the cloud. To really exploit the scientific potential of the experimental facilities and to foster the creativity of scientists, it is better to be able to upgrade and deploy new software in hours rather than weeks. Current technologies such as virtualization and clouds can really help big science.
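A much-simplified sketch of the data-processing side of this workflow (illustrative only; the process_file function is a hypothetical stand-in for real experiment software): a large set of input files is split into independent jobs and processed in parallel, the pattern that makes such workloads a natural fit for grids and clouds:

    from concurrent.futures import ProcessPoolExecutor

    def process_file(path):
        """Hypothetical stand-in for a physics data-processing application:
        reads one input file and returns a small summary (faked here)."""
        return path, sum(ord(c) for c in path) % 100  # placeholder result

    def run_campaign(paths, workers=4):
        """Each file is an independent job, so the campaign scales out simply
        by adding worker nodes (local processes stand in for cloud VMs)."""
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return dict(pool.map(process_file, paths))

    if __name__ == "__main__":
        inputs = ["run_%04d.dat" % i for i in range(12)]
        results = run_campaign(inputs)
        print("Processed", len(results), "files")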

Short biography:
see Speakers page



Concurrency, mobility and distributed development

Speaker: Bertrand Meyer, ETH Zurich and Eiffel Software

Description:

In the first set of topics covered by these lectures, I will review recent developments in the SCOOP concurrency model intended to support “scaling up”: providing the solid mechanisms required by high-performance computing and big data. The mechanisms include processing of large data structures concurrently and support for object mobility.
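SCOOP itself is an Eiffel mechanism, so the following Python sketch is only an analogy rather than the speaker's material: an object's state is serialized, moved to another process, updated there, and moved back, which gives a rough feel for what support for object mobility has to provide:

    import multiprocessing as mp

    class Accumulator:
        """A small object whose state can migrate between processes."""
        def __init__(self):
            self.total = 0

        def add(self, x):
            self.total += x

    def worker(conn):
        # The object arrives by value (serialized); the worker updates it
        # locally and then ships the new state back.
        acc = conn.recv()
        for x in range(100):
            acc.add(x)
        conn.send(acc)
        conn.close()

    if __name__ == "__main__":
        parent, child = mp.Pipe()
        p = mp.Process(target=worker, args=(child,))
        p.start()
        acc = Accumulator()
        acc.add(1)
        parent.send(acc)     # "move" the object to the other process
        acc = parent.recv()  # receive it back with its updated state
        p.join()
        assert acc.total == 1 + sum(range(100))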

The second set of topics covers new developments in methods and tools for distributed software construction, an ever more important model for software projects. These lectures will draw on lessons learned both in an industrial setting and in the ETH "Distributed Software Engineering Laboratory" course and project.

Short biography:
see Speakers page



Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

Speaker: Anthony Joseph, Berkeley

Description:

Mesos is a platform for running multiple diverse cluster computing frameworks, such as Hadoop, MPI, and web services, on commodity clusters. Sharing improves cluster utilization and avoids per-framework data replication. Mesos shares resources in a fine-grained manner, which allows frameworks to achieve data locality by taking turns reading data stored on each machine. To support the sophisticated schedulers of today's frameworks, Mesos introduces a distributed two-level scheduling mechanism called resource offers. Mesos decides how many resources to offer each framework, while frameworks decide which resources to accept and which computations to schedule on them. Our experimental results show that Mesos can achieve near-optimal locality when sharing the cluster among diverse frameworks, can scale up to 50,000 (emulated) nodes, and is resilient to node failures. Mesos is in production at numerous companies, including Airbnb, where it manages the open-source Chronos platform, and Twitter, where it manages several thousand machines.
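The resource-offer idea can be sketched roughly as follows (a simplified model for illustration, not the actual Mesos API): the master offers each framework the free resources on a node, and the framework either accepts part of the offer to launch a task or declines, leaving the resources for others:

    # Simplified model of two-level scheduling with resource offers.
    # Illustration only: this is not the actual Mesos API.
    nodes = {"node1": {"cpus": 4, "mem_gb": 8}, "node2": {"cpus": 2, "mem_gb": 4}}

    class Framework:
        def __init__(self, name, task_cpus, task_mem_gb, tasks_wanted):
            self.name = name
            self.task = {"cpus": task_cpus, "mem_gb": task_mem_gb}
            self.tasks_wanted = tasks_wanted

        def on_offer(self, node, offer):
            """Second level: the framework decides what to accept."""
            if self.tasks_wanted > 0 and all(offer[k] >= v for k, v in self.task.items()):
                self.tasks_wanted -= 1
                print(self.name, "launches a task on", node)
                return self.task   # accept this much of the offer
            return None            # decline; the resources go to someone else

    def master_schedule(frameworks):
        """First level: the master decides how much to offer to whom."""
        for node, free in nodes.items():
            for fw in frameworks:
                accepted = fw.on_offer(node, dict(free))
                if accepted:
                    for resource, amount in accepted.items():
                        free[resource] -= amount

    if __name__ == "__main__":
        master_schedule([Framework("hadoop", 2, 2, 2), Framework("web", 1, 1, 3)])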

Short biography:
see personal website



Programming the Cloud

Speaker: Roger Barga, Microsoft

Description:

Cloud computing allows data centers to offer self-service, automatic, and on-demand access to services such as data storage and hosted applications that provide scalable web services and large-scale data analysis. While the architecture of a data center is similar to that of a conventional supercomputer, data centers are designed with a very different goal. For example, cloud computing makes heavy use of virtualization technology for dynamic application scaling and of data storage redundancy for fault tolerance. Cloud computing also often separates compute services from storage services to better support isolation and multitenancy. The massive scale and external network bandwidth of today's data centers make it possible for users and application service providers to scale from one to thousands of CPU cores and pay only for the resources consumed. However, efficiently utilizing cloud computing resources requires that developers understand the implications of data center design and cloud system architectures, along with new patterns and practices for programming in the cloud.
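The separation of compute from storage described above is often programmed with a queue between a front end and a pool of stateless workers; the sketch below is illustrative only, using in-memory stand-ins rather than a real cloud SDK, but it shows the shape of the pattern:

    import queue
    import threading

    # In-memory stand-ins for cloud services: a message queue for work items
    # and a blob store for durable data. In a real deployment these would be
    # managed storage services, separate from the compute instances.
    work_queue = queue.Queue()
    blob_store = {}

    def front_end(job_id, payload):
        """Compute tier 1: accept a request, persist the input, enqueue work."""
        blob_store["input/%d" % job_id] = payload
        work_queue.put(job_id)

    def worker():
        """Compute tier 2: stateless workers that can be scaled out or replaced;
        all durable state lives in the storage services, not on the instance."""
        while True:
            job_id = work_queue.get()
            if job_id is None:
                break
            data = blob_store["input/%d" % job_id]
            blob_store["output/%d" % job_id] = data.upper()  # placeholder analysis
            work_queue.task_done()

    if __name__ == "__main__":
        threads = [threading.Thread(target=worker) for _ in range(2)]
        for t in threads:
            t.start()
        for i in range(5):
            front_end(i, "record %d" % i)
        work_queue.join()
        for _ in threads:
            work_queue.put(None)  # signal shutdown
        for t in threads:
            t.join()
        print(sorted(k for k in blob_store if k.startswith("output/")))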

This LASER lecture series will cover 1) data center design and its implications for application development, 2) cloud computing system architecture, using Windows Azure as a concrete example, 3) application programming models for cloud computing, and 4) common patterns and practices for Big Data analytics on the cloud, using examples from enterprise applications.

Short biography:
see Speakers page




Chair of Software Engineering - ETH Zurich