diff --git a/2021-guangzhou.html b/2021-guangzhou.html index 07570b8..0156116 100644 --- a/2021-guangzhou.html +++ b/2021-guangzhou.html @@ -74,6 +74,8 @@ + + diff --git a/6824-lec-01.html b/6824-lec-01.html index 39cd1d5..05ff62d 100644 --- a/6824-lec-01.html +++ b/6824-lec-01.html @@ -74,6 +74,8 @@ + + diff --git a/about.html b/about.html index 96719f5..e51996f 100644 --- a/about.html +++ b/about.html @@ -71,6 +71,8 @@ + + diff --git a/bayes.html b/bayes.html index d29d215..bd04db3 100644 --- a/bayes.html +++ b/bayes.html @@ -74,6 +74,8 @@ + + diff --git a/categories.html b/categories.html index 1a0db4a..b74d2e8 100644 --- a/categories.html +++ b/categories.html @@ -71,6 +71,8 @@ + + diff --git a/cs169-c1l1.html b/cs169-c1l1.html index 91bcacd..8b523d1 100644 --- a/cs169-c1l1.html +++ b/cs169-c1l1.html @@ -74,6 +74,8 @@ + + diff --git a/da-01.html b/da-01.html index 9ff782c..c003ab4 100644 --- a/da-01.html +++ b/da-01.html @@ -74,6 +74,8 @@ + + diff --git a/decision-tree.html b/decision-tree.html index 16a63a4..840f772 100644 --- a/decision-tree.html +++ b/decision-tree.html @@ -74,6 +74,8 @@ + + diff --git a/gfs.html b/gfs.html index 93d0faa..1043322 100644 --- a/gfs.html +++ b/gfs.html @@ -74,6 +74,8 @@ + + diff --git a/index.html b/index.html index 70f5cc9..5be67d2 100644 --- a/index.html +++ b/index.html @@ -69,6 +69,8 @@ + + @@ -123,32 +125,28 @@

Patrick’s Blog

- + - - [UCB CS169] C1L1: Introduction to SaaS, Agile, Cloud Computing + + MapReduce

- UCB CS169 Software Engineering Course 1 Module 1: Introduction to SaaS, Agile, Cloud Computing + MapReduce: Simplified Data Processing on Large Clusters

- Posted on Wed, Apr 13, 2022 + Posted on Sat, Apr 16, 2022 📖Note - - Software - - - - UCB CS169 + + Distributed
diff --git a/mapreduce.html b/mapreduce.html new file mode 100644 index 0000000..a8911af --- /dev/null +++ b/mapreduce.html @@ -0,0 +1,156 @@ + + + + + + + + + + + + + + + + + + + + + + MapReduce | Patrick’s Blog + + + + + + + + + + + + + + +
+ +
+
+ +
+ +
+ +

MapReduce

+ +
+ + Posted on Sat, Apr 16, 2022 + + + + 📖Note + + + + Distributed + + +
+ +
+

MapReduce: Simplified Data Processing on Large Clusters

MapReduce is a programming model and an associated implementation for processing and generating large data sets.

The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication.

1 Introduction

The issues of how to parallelize the computation, distribute the data, and handle failures conspire to obscure the original simple computation with large amounts of complex code to deal with these issues.

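new abstraction: express the simple computation directly, while a library hides the messy details of parallelization, fault tolerance, data distribution, and load balancing

(inspired by the map and reduce primitives present in Lisp and many other functional languages)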

2 Programming Model

input: key/value pairs

output: key/value pairs

2.1 Example

Counting the number of occurrences of each word in a large collection of documents. The user would write code similar to the following pseudo-code:

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

In addition, the user writes code to fill in a mapreduce specification object with the names of the input and output files, and optional tuning parameters. The user then invokes the MapReduce function, passing it the specification object.
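The word-count example and the specification object can be tried out locally with a small single-process sketch. Everything named here (`MapReduceSpec`, `run_mapreduce`, the in-memory `inputs` dictionary standing in for input files) is hypothetical and only illustrates the programming model; it is not the library's actual interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass
class MapReduceSpec:
    """Hypothetical stand-in for the specification object described above."""
    inputs: Dict[str, str]  # document name -> contents (in place of input files)
    map_fn: Callable[[str, str], Iterator[Tuple[str, str]]]
    reduce_fn: Callable[[str, Iterator[str]], str]


def word_count_map(key: str, value: str) -> Iterator[Tuple[str, str]]:
    # key: document name, value: document contents
    for word in value.split():
        yield word, "1"


def word_count_reduce(key: str, values: Iterator[str]) -> str:
    # key: a word, values: the counts emitted for that word
    return str(sum(int(v) for v in values))


def run_mapreduce(spec: MapReduceSpec) -> Dict[str, str]:
    # Map phase: collect intermediate pairs, grouped by intermediate key.
    intermediate: Dict[str, List[str]] = defaultdict(list)
    for name, contents in spec.inputs.items():
        for k, v in spec.map_fn(name, contents):
            intermediate[k].append(v)
    # Reduce phase: one call per distinct intermediate key, in sorted key order.
    return {k: spec.reduce_fn(k, iter(intermediate[k])) for k in sorted(intermediate)}


spec = MapReduceSpec(
    inputs={"a.txt": "the quick fox", "b.txt": "the lazy dog the end"},
    map_fn=word_count_map,
    reduce_fn=word_count_reduce,
)
counts = run_mapreduce(spec)
print(counts["the"])  # prints 3
```

The sketch runs both phases in one process; the point of the real system is that the same two user functions can be run across many machines without changing them.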

2.2 Types

Conceptually the map and reduce functions supplied by the user have associated types:

map     (k1, v1)       -> list(k2, v2)
reduce  (k2, list(v2)) -> list(v2)

The input keys and values are drawn from a different domain than the output keys and values. The intermediate keys and values are from the same domain as the output keys and values.
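One way to see the two domains is to write the signatures with type hints. The sketch below uses an inverted-index style example chosen purely for illustration: the input domain is (document id, text), while the intermediate and output domain is (word, document id). The alias names `MapFn` and `ReduceFn` and both functions are assumptions, not part of the notes.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")

# map     (k1, v1)       -> list(k2, v2)
MapFn = Callable[[K1, V1], Iterable[Tuple[K2, V2]]]
# reduce  (k2, list(v2)) -> list(v2)
ReduceFn = Callable[[K2, List[V2]], List[V2]]


def index_map(doc_id: int, text: str) -> Iterable[Tuple[str, int]]:
    # (k1, v1) = (int, str)  ->  list of (k2, v2) = (str, int)
    return [(word, doc_id) for word in text.split()]


def index_reduce(word: str, doc_ids: List[int]) -> List[int]:
    # (k2, list(v2)) = (str, list of int)  ->  list(v2) = list of int
    return sorted(set(doc_ids))


pairs = index_map(7, "map reduce map")
postings = index_reduce("map", [d for w, d in pairs if w == "map"] + [3])
print(postings)  # prints [3, 7]
```

Note that the reduce output has the same value type as the intermediate values (`int` here), which is what `list(v2)` on the right-hand side expresses.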

2.3 More Examples

3 Implementation

This section describes an implementation targeted to the computing environment in wide use at Google: large clusters of commodity PCs connected together with switched Ethernet.

3.1 Execution Overview

The Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits.
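A minimal sketch of the splitting step, assuming the input is an in-memory sequence of records and that splits are contiguous and of near-equal size. The helper `split_input` is hypothetical, and the thread pool only stands in for separate machines; a real implementation would more likely split input files by size in bytes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_input(records: Sequence[T], m: int) -> List[Sequence[T]]:
    """Partition records into m contiguous splits whose sizes differ by at most one."""
    if m <= 0:
        raise ValueError("m must be positive")
    base, extra = divmod(len(records), m)
    splits = []
    start = 0
    for i in range(m):
        # The first `extra` splits take one additional record each.
        size = base + (1 if i < extra else 0)
        splits.append(records[start:start + size])
        start += size
    return splits


records = list(range(10))
splits = split_input(records, 3)
print([len(s) for s in splits])  # prints [4, 3, 3]

# Each split can then be handed to an independent map task; threads stand in
# for machines here, and sum() stands in for the user's map function.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(sum, splits))
```

Because the splits do not overlap and together cover the whole input, the map tasks need no coordination with each other while they run.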

+ + + + \ No newline at end of file diff --git a/mysql-01.html b/mysql-01.html index 88e2661..011abc4 100644 --- a/mysql-01.html +++ b/mysql-01.html @@ -74,6 +74,8 @@ + + diff --git a/mysql-02.html b/mysql-02.html index 70941d7..80be225 100644 --- a/mysql-02.html +++ b/mysql-02.html @@ -74,6 +74,8 @@ + + diff --git a/mysql-03.html b/mysql-03.html index a72bed3..b207eee 100644 --- a/mysql-03.html +++ b/mysql-03.html @@ -74,6 +74,8 @@ + + diff --git a/mysql-04.html b/mysql-04.html index 8294868..3cc0817 100644 --- a/mysql-04.html +++ b/mysql-04.html @@ -74,6 +74,8 @@ + + diff --git a/svm.html b/svm.html index be26c91..0beabc2 100644 --- a/svm.html +++ b/svm.html @@ -74,6 +74,8 @@ + + diff --git a/tag/Database.html b/tag/Database.html index 81d850d..a25e9ac 100644 --- a/tag/Database.html +++ b/tag/Database.html @@ -65,6 +65,8 @@ + + diff --git a/tag/Distributed.html b/tag/Distributed.html index 48e583a..491dcf0 100644 --- a/tag/Distributed.html +++ b/tag/Distributed.html @@ -65,6 +65,8 @@ + + @@ -98,6 +100,36 @@

#Distributed

+
+

+ + + + + MapReduce + +

+ +

+ MapReduce: Simplified Data Processing on Large Clusters +

+ +
+ + Posted on Sat, Apr 16, 2022 + + + + 📖Note + + + + Distributed + + +
+
+

diff --git a/tag/MIT 6.824.html b/tag/MIT 6.824.html index d3bfc0d..dd7f9e4 100644 --- a/tag/MIT 6.824.html +++ b/tag/MIT 6.824.html @@ -65,6 +65,8 @@ + + diff --git a/tag/ML.html b/tag/ML.html index 67af3bb..3d5de50 100644 --- a/tag/ML.html +++ b/tag/ML.html @@ -65,6 +65,8 @@ + + diff --git a/tag/Software.html b/tag/Software.html index a9b1ee0..76a133f 100644 --- a/tag/Software.html +++ b/tag/Software.html @@ -65,6 +65,8 @@ + + diff --git a/tag/UCB CS169.html b/tag/UCB CS169.html index 4d10567..ef6e80f 100644 --- a/tag/UCB CS169.html +++ b/tag/UCB CS169.html @@ -65,6 +65,8 @@ + + diff --git "a/tag/\360\237\223\226Note.html" "b/tag/\360\237\223\226Note.html" index 1732517..a4971d4 100644 --- "a/tag/\360\237\223\226Note.html" +++ "b/tag/\360\237\223\226Note.html" @@ -65,6 +65,8 @@ + + @@ -98,6 +100,36 @@

#📖Note

+
+

+ + + + + MapReduce + +

+ +

+ MapReduce: Simplified Data Processing on Large Clusters +

+ +
+ + Posted on Sat, Apr 16, 2022 + + + + 📖Note + + + + Distributed + + +
+
+

diff --git "a/tag/\360\237\232\236Memory.html" "b/tag/\360\237\232\236Memory.html" index e93d861..59293f0 100644 --- "a/tag/\360\237\232\236Memory.html" +++ "b/tag/\360\237\232\236Memory.html" @@ -65,6 +65,8 @@ + +