Fixed a bunch of spelling mistakes #83

Open · wants to merge 1 commit into base: master
18 changes: 9 additions & 9 deletions README.md
@@ -2,36 +2,36 @@

## Overview

The *Alibaba Cluster Trace Program* is published by Alibaba Group. By providing cluster trace from real production, the program helps the researchers, students and people who are interested in the field to get better understanding of the characterastics of modern internet data centers (IDC's) and the workloads.
The *Alibaba Cluster Trace Program* is published by Alibaba Group. By providing cluster traces from real production, the program helps researchers, students, and anyone interested in the field to get a better understanding of the characteristics of modern internet data centres (IDCs) and their workloads.

So far, two versions of traces have been released:

* *cluster-trace-v2017* includes about 1300 machines in a period of 12 hours. Trace-v2017 is the first to introduce the collocation of online services (aka long-running applications) and batch workloads. To see more about this trace, see the related documents ([trace_2017](./cluster-trace-v2017/trace_201708.md)). The download link is available after a short survey ([survey link](https://goo.gl/forms/eOoe6DwZQpd2H5n53)).
* *cluster-trace-v2018* includes about 4000 machines in a period of 8 days. Besides having a larger scale than trace-v2017, this trace also contains the DAG information of our production batch workloads. See the related documents for more details ([trace_2018](./cluster-trace-v2018/trace_2018.md)). The download link is available after a survey (less than a minute, [survey link](http://alibabadeveloper.mikecrm.com/BdJtacN)).

We encourage anyone to use the traces for study or research purposes, and if you had any question when using the trace, please contact us via email: [aliababa-clusterdata](mailto:[email protected]), or file an issue on Github. Filing an issue is recommanded as the discussion would help all the community. Note that the more clearly you ask the question, the more likely you would get a clear answer.
We encourage anyone to use the traces for study or research purposes, and if you have any questions when using the trace, please contact us via email: [alibaba-clusterdata](mailto:[email protected]), or file an issue on GitHub. Filing an issue is recommended, as the discussion will help the whole community. Note that the more clearly you ask the question, the more likely you are to get a clear answer.

It would be much appreciated if you could tell us once any publication using our trace is available, as we are maintaining a list of related publicatioins for more researchers to better communicate with each other.
It would be much appreciated if you could tell us once any publication using our trace is available, as we are maintaining a list of related publications for more researchers to better communicate with each other.

In the future, we will try to release new traces at a regular pace. Please stay tuned.

## Our motivation

As said at the beginning, our motivation on publishing the data is to help people in related field get a better understanding of modern data centers and provide production data for researchers to varify their ideas. You may use trace however you want as long as it is for reseach or study purpose.
As said at the beginning, our motivation for publishing the data is to help people in related fields get a better understanding of modern data centres and to provide production data for researchers to verify their ideas. You may use the trace however you want as long as it is for research or study purposes.

From our perspective, the data is provided to address [the challenges Alibaba face](https://github.com/alibaba/clusterdata/wiki/About-Alibaba-cluster-and-why-we-open-the-data) in IDC's where online services and batch jobs are collocated. We distill the challenges as the following topics:
From our perspective, the data is provided to address [the challenges Alibaba face](https://github.com/alibaba/clusterdata/wiki/About-Alibaba-cluster-and-why-we-open-the-data) in IDC's where online services and batch jobs are collocated. We distil the challenges as the following topics:

1. **Workload characterizations**. How to characterize Alibaba workloads in a way that we can simulate various production workloads in a representative way for scheduling and resource management strategy studies.
2. **New algorithms to assign workload to machines**. How to assign and reschedule workloads to machines for better resource utilization and ensuring the performance SLA for different applications (e.g. by reducing resource contention and defining proper proirities).
3. **Collaboration between online service scheduler (Sigma) and batch jobs scheduler (Fuxi)**. How to adjust resource allocation between online service and batch jobs to improve throughput of batch jobs while maintain acceptable QoS (Quolity of Service) and fast failure recovery for online service. As the scale of collocation (workloads managed by different schedulers) keeps growing, the design of collaboration mechanism is becoming more and more critical.
2. **New algorithms to assign workload to machines**. How to assign and reschedule workloads to machines for better resource utilization and ensuring the performance SLA for different applications (e.g. by reducing resource contention and defining proper priorities).
3. **Collaboration between online service scheduler (Sigma) and batch jobs scheduler (Fuxi)**. How to adjust resource allocation between online service and batch jobs to improve throughput of batch jobs while maintain acceptable QoS (Quality of Service) and fast failure recovery for online service. As the scale of collocation (workloads managed by different schedulers) keeps growing, the design of collaboration mechanism is becoming more and more critical.

Last but not least, we are always open to working together with researchers to improve the efficiency of our clusters, and there are positions open for research interns. If you have any ideas in mind, please contact us via [alibaba-clusterdata](mailto:[email protected]) or [Haiyang Ding](mailto:[email protected]) (Haiyang maintains this cluster trace and works for Alibaba's resource management & scheduling group).

## Outcomes from the trace

### Papers using Alibaba cluster trace

The fundemental idea of our releasing cluster data is to enable researchers & practitioners doing resaerch, simulation with more realistic data and thus making the result closer to industry adoption. It is a huge encouragement to us to see more works using our data. Here is a list of existing works using Alibaba cluster data. **If your paper uses our trace, it would be great if you let us know by sending us email** ([aliababa-clusterdata](mailto:[email protected])).
The fundamental idea behind releasing our cluster data is to enable researchers & practitioners to do research and simulation with more realistic data, thus making the results closer to industry adoption. It is a huge encouragement to us to see more works using our data. Here is a list of existing works using Alibaba cluster data. **If your paper uses our trace, it would be great if you let us know by sending us an email** ([alibaba-clusterdata](mailto:[email protected])).

* cluster trace v2018
* [Who Limits the Resource Efficiency of My Datacenter: An Analysis of Alibaba Datacenter Traces](https://dl.acm.org/citation.cfm?doid=3326285.3329074), Jing Guo, Zihao Chang, Sa Wang, Haiyang Ding, Yihui Feng, Liang Mao, Yungang Bao, IEEE/ACM International Symposium on Quality of Service, IWQoS 2019
@@ -49,6 +49,6 @@ The fundemental idea of our releasing cluster data is to enable researchers & pr

### Tech reports and projects on analysing the trace

So far this session is empty. In future, we are going to link some reports and open source repo on how to anaylsis the trace here, with the permission of the owner.
So far this section is empty. In the future, we are going to link some reports and open-source repos on how to analyse the trace here, with the permission of the owners.

The purpose of this is to help more beginners get started on learning either basic data analysis or how to inspect a cluster from a statistics perspective.
12 changes: 6 additions & 6 deletions cluster-trace-v2017/trace_201708.md
@@ -4,7 +4,7 @@

As datacenters grow in scale, large-scale co-allocation of online services and batch jobs is used to increase datacenter efficiency. The co-allocation brings great challenges to the existing cluster management system, in particular to the service and job schedulers, which must work together to increase cluster utilization and efficiency.

We distill the challenge to the following research topics that we think are interested to both academic community and industry:
We distil the challenge into the following research topics that we think are of interest to both the academic community and industry:

* *Workload characterizations*: How can we characterize Alibaba workloads in a way that we can simulate various production workloads in a representative way for scheduler studies.
* *New algorithms to assign workload to machines and to cpu cores*. How can we assign and re-adjust workloads to different machines and cpus for better resource utilization and acceptable resource contention.
@@ -38,7 +38,7 @@ Cpu core count is NOT normalized

# Data tables

Below we desribe the provided table. Reminder: not all traces will include all the types of data described here. The columns might occur in a different order, or have different names than reported here: the definitive specification of such details can be found in the schema.csv file.
Below we describe the provided tables. Reminder: not all traces will include all the types of data described here. The columns might occur in a different order, or have different names than reported here: the definitive specification of such details can be found in the schema.csv file.

## Machines

@@ -60,7 +60,7 @@ This trace include three types of machine events:
* softerror. A machine becomes temporarily unavailable due to software failures, such as low disk space and agent failures.
* harderror. A machine becomes unavailable due to hardware failures, such as disk failures.

In the case of software and hardware errors, New online services and batch jobs should not be placed in the machines, but existing services and jobs may still function normally. Error reasons can be infered from the event detail field.
In the case of software and hardware errors, new online services and batch jobs should not be placed on the machines, but existing services and jobs may still function normally. Error reasons can be inferred from the event detail field.

Machine capacities reflect the normalized physical capacity of each machine along each dimension. Each dimension (CPU cores, RAM size) is normalized independently.

@@ -94,7 +94,7 @@ Users submit batch workload in the form of Job (which is not included in the tra
4. task_id
5. instance_num: number of instances for the task
6. status: Task states includes Ready | Waiting | Running | Terminated | Failed | Cancelled
7. plan_cpu: cpu requested for each instane of the task
7. plan_cpu: cpu requested for each instance of the task
8. plan_mem: normalized memory requested for each instance of the task
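For illustration, rows with the fields above can be read with a minimal Python sketch. The sample row is invented, the field order is assumed to follow the numbered list above, and the first three fields (not visible in this diff excerpt) are kept as unnamed placeholders:

```python
import csv
import io

# Field order assumed from the numbered list above; batch_task.csv has no header row.
# Fields 1-3 are not visible in this excerpt, so placeholder names are used for them.
FIELDS = ["field_1", "field_2", "field_3", "task_id",
          "instance_num", "status", "plan_cpu", "plan_mem"]

# One invented sample row, purely for illustration.
sample_row = "a,b,c,task_42,3,Terminated,0.5,0.0149\n"

reader = csv.DictReader(io.StringIO(sample_row), fieldnames=FIELDS)
task = next(reader)
print(task["task_id"], task["status"], int(task["instance_num"]))  # → task_42 Terminated 3
```

The same pattern applies to the other CSV tables in the trace, with the field list swapped for that table's columns.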

### Instance table(batch_instance.csv)
@@ -104,7 +104,7 @@ Users submit batch workload in the form of Job (which is not included in the tra
3. job_id
4. task_id
5. machineID: the host machine running the instance
6. status: Instance states includes Ready | Waiting | Running | Terminated | Failed | Cancelled | Interupted
6. status: Instance states includes Ready | Waiting | Running | Terminated | Failed | Cancelled | Interrupted
7. seq_no: running trial number; starts from 1 and increases by 1 for each retry
8. total_seq_no: total number of retries
9. real_cpu_max: maximum number of cpus actually used by the running instance
@@ -134,7 +134,7 @@ online service are described by these tables:

This trace includes only two types of instance events. Each create event records the finish of an online instance creation, and each remove event records the finish of an online instance removal. For containers created before the trace period, the ts field has a value of zero. The start time of instance creation and removal can be inferred from the finish time, since creation and removal usually finish in a few minutes.

Each online instance is given a unique cpuset allocation by online scheduler according to cpu topology and service constraints. For the 64 cpus machine in the dataset, cpus from 0 to 31 are in the same cpu package, while cpus from 32-63 are in another cpu package. cpus 0 and 32 belongs to the same cpu cores, cpu 1 and 33 belongs to another cpu cores, et cetera. The cpuset allocation far from ideal and can be improved for example by considering the difference levels of interference between instances sharing the same cpu core and package.
Each online instance is given a unique cpuset allocation by the online scheduler according to cpu topology and service constraints. For the 64-cpu machines in the dataset, cpus 0 to 31 are in one cpu package, while cpus 32 to 63 are in another. Cpus 0 and 32 belong to the same cpu core, cpus 1 and 33 belong to another cpu core, and so on. The cpuset allocation is far from ideal and can be improved, for example, by considering the different levels of interference between instances sharing the same cpu core and package.
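A small helper can encode the layout described above. This is only a sketch restating the pairing given in the paragraph, where cpus i and i+32 are the two hyperthreads of one physical core and cpus 0-31 / 32-63 form the two 32-cpu groups:

```python
def cpu_layout(cpu_id: int) -> dict:
    """Map a logical cpu id on the 64-cpu machines to the layout
    described in the trace documentation."""
    if not 0 <= cpu_id < 64:
        raise ValueError("the trace describes 64-cpu machines")
    return {
        "group": cpu_id // 32,          # cpus 0-31 vs cpus 32-63
        "core": cpu_id % 32,            # cpus i and i+32 share core i
        "sibling": (cpu_id + 32) % 64,  # the other hyperthread on that core
    }

print(cpu_layout(1))  # → {'group': 0, 'core': 1, 'sibling': 33}
```

Such a mapping is what an improved cpuset allocator would consult to avoid co-locating interfering instances on sibling hyperthreads.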

### service instance usage (container_usage.csv)

16 changes: 8 additions & 8 deletions cluster-trace-v2018/schema.txt
@@ -1,6 +1,6 @@
# Schema

This file describes the schema of each data file.
This file describes the schema of each data file.

The index below is aligned with the data column in each file.

@@ -29,8 +29,8 @@ The index below is aligned with the data column in each file.
| mem_util_percent | bigint | | [0, 100] |
| mem_gps | double | | normalized memory bandwidth, [0, 100] |
| mkpi | bigint | | cache miss per thousand instruction |
| net_in | double | | normarlized in coming network traffic, [0, 100] |
| net_out | double | | normarlized out going network traffic, [0, 100] |
| net_in | double | | normalized incoming network traffic, [0, 100] |
| net_out | double | | normalized outgoing network traffic, [0, 100] |
| disk_io_percent | double | | [0, 100], abnormal values are of -1 or 101 |
+------------------------------------------------------------------------------------+

@@ -46,7 +46,7 @@ The index below is aligned with the data column in each file.
| status | string | | |
| cpu_request | bigint | | 100 is 1 core |
| cpu_limit | bigint | | 100 is 1 core |
| mem_size | double | | normarlized memory, [0, 100] |
| mem_size | double | | normalized memory, [0, 100] |
+------------------------------------------------------------------------------------+

* about app_du: Containers belonging to the same deploy unit provide one service; typically, they should be spread across failure domains
@@ -61,8 +61,8 @@ The index below is aligned with the data column in each file.
| cpi | double | | |
| mem_gps | double | | normalized memory bandwidth, [0, 100] |
| mpki | bigint | | |
| net_in | double | | normarlized in coming network traffic, [0, 100] |
| net_out | double | | normarlized out going network traffic, [0, 100] |
| net_in | double | | normalized incoming network traffic, [0, 100] |
| net_out | double | | normalized outgoing network traffic, [0, 100] |
| disk_io_percent | double | | [0, 100], abnormal values are of -1 or 101 |
+------------------------------------------------------------------------------------+

@@ -76,7 +76,7 @@ The index below is aligned with the data column in each file.
| start_time | bigint | | start time of the task |
| end_time | bigint | | end time of the task |
| plan_cpu | double | | number of cpu needed by the task, 100 is 1 core |
| plan_mem | double | | normalized memorty size, [0, 100] |
| plan_mem | double | | normalized memory size, [0, 100] |
+------------------------------------------------------------------------------------+

* task name indicates the DAG information, see the explanation of batch workloads
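Given the "100 is 1 core" convention for plan_cpu and the bigint start/end timestamps, per-task figures follow directly. A sketch with invented sample values:

```python
def requested_cores(plan_cpu: float) -> float:
    # Per the schema above: a plan_cpu of 100 means 1 full core.
    return plan_cpu / 100.0

def task_duration(start_time: int, end_time: int) -> int:
    # Duration in whatever unit the bigint timestamps use.
    return end_time - start_time

# Invented sample: a task requesting half a core that ran for 86 time units.
print(requested_cores(50.0))      # → 0.5
print(task_duration(1000, 1086))  # → 86
```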
@@ -98,5 +98,5 @@ The index below is aligned with the data column in each file.
| mem_max | double | | max memory used by the instance (normalized, [0, 100]) |
+------------------------------------------------------------------------------------+

* Task name is uniqe within a job; note task name indicates the DAG information, see the explanation of batch workloads
* Task name is unique within a job; note task name indicates the DAG information, see the explanation of batch workloads
* There are 12 types in total, and only some of them have DAG info