Distributed systems employed in critical infrastructures must fulfill dependability, timeliness, and performance specifications. Since these systems most often operate in an unpredictable environment, their design and maintenance require quantitative evaluation of deterministic and probabilistic timed models. This need gave birth to an abundant literature devoted to formal modeling languages combined with analytical and simulative solution techniques.
The aim of the book is to provide an overview of techniques and methodologies dealing with such specific issues in the context of distributed systems, covering aspects such as performance evaluation, reliability/availability, energy efficiency, scalability, and sustainability. Specifically, the book discusses techniques for checking and verifying whether and how a distributed system satisfies its requirements, for properly evaluating non-functional aspects, and for optimizing the overall behavior of the system. The scope has been selected to provide thorough coverage of issues, models, and techniques relating to validation, evaluation, and optimization of distributed systems. The key objective of this book is to help bridge the gap between modeling theory and practice in distributed systems through specific examples.
Dario Bruneo received his degree in Computer Engineering from the Engineering Faculty of the University of Palermo (Italy) in 2000 and his PhD in Advanced Technologies for Information Engineering from the University of Messina (Italy) in 2005. Since then he has been engaged in research on distributed systems. He is currently an associate researcher at the Engineering Faculty of the University of Messina. His research has focused on the study of distributed systems, with particular regard to the management of advanced service provisioning, system modeling, and performance evaluation. He has investigated research fields ranging from Quality of Service management to distributed programming, and from ad hoc and sensor networks to performance analysis through analytical and simulative techniques. He is coauthor of more than 40 scientific papers in international journals and conference proceedings.
Salvatore Distefano is an assistant professor at the Politecnico di Milano. His research interests include performance evaluation, parallel and distributed computing, software engineering, and reliability techniques. During his research activity, he has contributed to the development of several tools such as WebSPN, ArgoPerformance, and GS3. He has been involved in several national and international research projects. He is author and coauthor of more than 80 scientific papers.
Preface
PART I VERIFICATION
1 Modeling and Verification of Distributed Systems Using Markov Decision Processes
1.1 Introduction
1.2 Markov Decision Processes
1.3 Markov Decision Well-Formed Net formalism
1.4 Case study: Peer-to-Peer Botnets
1.5 Conclusion
Appendices: Well-formed Net Formalism
A.1.1 Syntax of Basic Predicates
A.1.2 Markings and Enabling
References
2 Quantitative Analysis of Distributed Systems in Stoklaim: A Tutorial
2.1 Introduction
2.2 StoKlaim: Stochastic Klaim
2.3 StoKlaim Operational Semantics
2.4 MoSL: Mobile Stochastic Logic
2.5 jSAM: Java Stochastic Model-Checker
2.6 Leader Election in StoKlaim
2.7 Concluding Remarks
References
3 Stochastic Path Properties of Distributed Systems: the CSLTA Approach
3.1 Introduction
3.2 The Reference Formalisms for System Definition
3.3 The Formalism for Path Property Definition: CSLTA
3.4 CSLTA at work: a Fault-Tolerant Node
3.5 Literature Comparison
3.6 Summary and Final Remarks
References
PART II EVALUATION
4 Failure Propagation in Load-Sharing Complex Systems
4.1 Introduction
4.2 Building Blocks
4.3 Sand Box for Distributed Failures
4.4 Summary
References
5 Approximating Distributions and Transient Probabilities by Matrix Exponential Distributions and Functions
5.1 Introduction
5.2 Phase Type and Matrix Exponential Distributions
5.3 Bernstein Polynomials and Expolynomials
5.4 Application of BEs to Distribution Fitting
5.5 Application of BEs to Transient Probabilities
5.6 Conclusions
References
6 Worst-Case Analysis of Tandem Queueing Systems Using Network Calculus
6.1 Introduction
6.2 Basic Network Calculus Modeling: Per-flow Scheduling
6.3 Advanced Network Calculus Modeling: Aggregate Multiplexing
6.4 Tandem Systems Traversed by Several Flows
6.5 Mathematical Programming Approach
6.6 Related Work
6.7 Numerical Results
6.8 Conclusions
References
7 Cloud Evaluation: Benchmarking and Monitoring
7.1 Introduction
7.2 Benchmarking
7.3 Benchmarking with mOSAIC
7.4 Monitoring
7.5 Cloud Monitoring in mOSAIC's Cloud Agency
7.6 Conclusions
References
8 Multiformalism and Multisolution Strategies for Systems Performance
8.1 Introduction
8.2 Multiformalism and Multisolution
8.3 Choosing the Right Strategy
8.4 Learning by the Experience
8.5 Conclusions and Perspectives
References
PART III OPTIMIZATION AND SUSTAINABILITY
9 Quantitative Assessment of Distributed Networks Through Hybrid Stochastic Modeling
9.1 Introduction
9.2 Modeling of Complex Systems
9.3 Performance Evaluation of KNXnet/IP Networks Flow Control Mechanism
9.4 LCII: On-line Risk Estimation of A Power-Telco Network
9.5 Conclusion
References
10 Design of IT Infrastructures of Data Centers: An Approach Based on Business and Technical Metrics
10.1 Introduction
10.2 Fundamental Concepts
10.3 Business-Oriented Models
10.4 Data Center Infrastructure Models
10.5 Methodology
10.6 Case Study - Data Center Design
10.7 Conclusion
References
11 Software Rejuvenation and its Application in Distributed Systems
11.1 Introduction
11.2 Software rejuvenation scheduling classification
11.3 Software rejuvenation granularity classification
11.4 Methods, policies and metrics of software rejuvenation
11.5 Software rejuvenation in distributed systems
11.6 Summary
References
12 Machine Learning Based Dynamic Reconfiguration of Distributed Data Management Systems
12.1 Introduction
12.2 Methodologies
12.3 Brief overview of Neural Networks
12.4 System Architecture and Performance Prediction Scheme
12.5 Experimentation
12.6 Conclusions
References
13 Going Green with the Networked Cloud: Methodologies and Assessment
13.1 Introduction
13.2 Modeling of Data Centre Power Consumption
13.3 Energy Efficiency in the Cloud
13.4 Performance Analysis Methodologies and Tools
13.5 Case Study: Performance Evaluation of Energy Aware Resource Allocation in the Cloud
13.6 Summary
References
Index