---
title: "fiRst Group Project"
author: "Ismail Batur Usta / Efehan Danisman / Ozgur Ozdemir"
output:
  beamer_presentation: default
  ioslides_presentation:
    fig_height: 24
    fig_width: 32
    widescreen: yes
---
```{r setup, echo=FALSE,warning=FALSE,message=FALSE}
knitr::opts_chunk$set(error=TRUE)
library(tidyverse)
library(plotly)
library(scales)
library(wordcloud)
library(tm)
library(lubridate)
```
What We Did {data-width=350}
-----------------------------------------------------------------------
- We analyzed cuts at power plants in Turkey between 2012 and 2018.
- The data set has 73313 observations of 8 variables.
- We derived new variables from the existing ones: Plant.Type, the duration of the cut, the capacity ratio at the cut and the reason for the cut.
- We tidied the raw data using regular expressions and the stringr package.
- We used tidy text mining to count words and to find which word follows which.
- We divided cuts into two categories, Malfunctions and Planned Activities, and looked at their distributions.
- We compared malfunctions and planned activities in terms of the duration of the cut.
- We looked at malfunction types, malfunction reasons and durations by plant type.
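
The word-pair counting mentioned above can be sketched as follows. This is only a minimal illustration: it assumes the tidytext package (which these slides do not load) and a made-up `sample_cuts` table. The code that actually builds `bigram_counttop20` is not part of this file and may differ.

```r
library(dplyr)
library(tidytext)  # assumption: not loaded in the setup chunk of these slides

# Hypothetical sample standing in for the real `cuts` data
sample_cuts <- tibble(
  Plant.Type = c("HES", "HES", "TES"),
  Reason = c("turbine bearing failure",
             "turbine bearing temperature high",
             "boiler tube leak")
)

# Split each reason into consecutive word pairs and count them per plant type
bigram_counts <- sample_cuts %>%
  unnest_tokens(bigram, Reason, token = "ngrams", n = 2) %>%
  count(Plant.Type, bigram, sort = TRUE)

# Keep at most the 20 most frequent pairs for each plant type
top_bigrams <- bigram_counts %>%
  group_by(Plant.Type) %>%
  slice_max(n, n = 20, with_ties = FALSE) %>%
  ungroup()
```

In this sample the most frequent pair is "turbine bearing", which appears twice for HES plants.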
Cuts at Power Plants in Turkey (2012-2018) {data-width=650}
-----------------------------------------------------------------------
### Yearly incidents are far higher in 2018.
```{r,fig.height = 3, fig.width = 5, echo=FALSE}
ggplot(data=yearly_cuts, aes(y=Start.Date, x=factor(year(year)), fill=factor(year(year))))+
  geom_bar(stat="identity")+
  labs(x="Year", y="Incident Count", title="Yearly Total Incidents")+
  theme_light()+
  scale_fill_brewer(palette="PuBuGn")+
  theme(legend.position="none")
```
Glimpse of Cleaning {data-width=350}
-----------------------------------------------------------------------
### It was not easy

    cuts$Plant.Name <- cuts$Plant.Name %>%
      str_replace_all("[ý]", "i") %>%
      str_replace_all("enerj.sa", "enerjisa") %>%
      str_replace_all("yenikoy ts", "yenikoy tes") %>%
      str_replace_all("ienikoi tes", "yenikoy tes") %>%
      str_replace_all("^ova elektrik", "gebze ova elektrik") %>%
      str_replace_all("yatagan .*", "yatagan tes") %>%
      str_replace_all("kokluce$", "kokluce hes") %>%
      str_replace_all(".* entek", "entek") %>%
      str_replace_all("kurtun-hes", "kurtun hes") %>%
      str_replace_all("^rwe_turcas_guney", "denizli rwe_turcas_guney") %>%
      str_replace_all("tekirdag santrali.*", "modern enerji tekirdag santrali") %>%
      str_replace_all("karadag", "karadag res") %>%
      str_replace_all(".?menzelet( hes)?", "menzelet hes") %>%
      str_replace_all("\\.", "") %>%
      str_replace_all("hidro(\\s?elektrik santral[ýi]| e\\.?s)", " hes") %>%
      str_replace_all("(termik santral[ýi]|\\sts\\s?)", " tes") %>%
      str_replace_all("tuncbilektes", "tuncbilek tes") %>%
      str_replace_all("d.*(k.*)c.*(s.*)?", "dgkc") %>%
      str_replace_all("jeotermal (e.*s.*)", "jes")

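
To see how a few of these patterns behave, here is a small sketch that applies four of them to made-up plant names. The `raw_names` vector is hypothetical and not taken from the data set.

```r
library(stringr)
library(magrittr)  # provides the %>% pipe used below

# Hypothetical raw plant names, invented to exercise a few of the patterns above
raw_names <- c("kokluce", "xyz entek", "kurtun-hes", "tuncbilektes")

clean_names <- raw_names %>%
  str_replace_all("kokluce$", "kokluce hes") %>%     # append the missing plant-type suffix
  str_replace_all(".* entek", "entek") %>%           # drop whatever precedes " entek"
  str_replace_all("kurtun-hes", "kurtun hes") %>%    # normalise the separator
  str_replace_all("tuncbilektes", "tuncbilek tes")   # split the fused suffix

# Unanchored rules are not idempotent: applied to an already clean name,
# the "karadag" rule appends the suffix a second time.
twice <- str_replace_all("karadag res", "karadag", "karadag res")

# One possible guard (an assumption, not what the slides use) is a negative lookahead.
guarded <- str_replace_all("karadag res", "karadag(?! res)", "karadag res")
```

Here `clean_names` comes out as "kokluce hes", "entek", "kurtun hes" and "tuncbilek tes", `twice` is "karadag res res", and `guarded` stays "karadag res". The order of the rules also matters, since later patterns see the output of earlier ones.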
Overview of Plant Categories {data-width=350}
-----------------------------------------------------------------------
We categorised power plants by type, because an analysis by plant name would not yield many useful results.

* HES: Hydroelectricity Plant
* TES: Thermal Energy Plant
* RES: Wind Energy Plant (Wind Turbines)
* DGKC: Natural Gas Combined Cycle Plant
* JES: Geothermal Energy Plant
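
One way such a category could be derived from cleaned plant names is sketched below. This is an assumption for illustration only: the slides do not show how `Plant.Type` was actually built, and the `plants` table here is made up.

```r
library(dplyr)
library(stringr)

# Hypothetical cleaned plant names
plants <- tibble(
  Plant.Name = c("kokluce hes", "yatagan tes", "karadag res", "gebze ova elektrik")
)

# Map the type abbreviation found in the name to a category;
# names without a known abbreviation fall back to "Other"
plants <- plants %>%
  mutate(Plant.Type = case_when(
    str_detect(Plant.Name, "\\bhes\\b")  ~ "HES",
    str_detect(Plant.Name, "\\btes\\b")  ~ "TES",
    str_detect(Plant.Name, "\\bres\\b")  ~ "RES",
    str_detect(Plant.Name, "\\bdgkc\\b") ~ "DGKC",
    str_detect(Plant.Name, "\\bjes\\b")  ~ "JES",
    TRUE ~ "Other"
  ))
```

The word boundaries (`\\b`) keep an abbreviation from matching inside a longer word.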
Overview of Plant Categories-cont'd. {data-width=350}
-----------------------------------------------------------------------
```{r,fig.height = 6, fig.width = 9, echo=FALSE,warning=FALSE}
# Mean and total established power per plant type, counting each plant once
cuts %>%
  select(Plant.Type, Plant.Name, Established.Power) %>%
  distinct(Plant.Name, Plant.Type, Established.Power) %>%
  group_by(Plant.Type) %>%
  summarize(Mean=mean(Established.Power), Total=sum(Established.Power)) %>%
  ggplot(.)+
  geom_bar(aes(x=reorder(Plant.Type, -Mean), y=Mean, fill=Plant.Type), stat="identity")+
  # Totals are divided by 100 to share the panel; the secondary axis scales them back
  geom_text(aes(x=Plant.Type, y=Total/100, label=signif(Total, 2)))+
  labs(x="Plant Type", y="Average Power Output MWe", title="Power Output Based on Plant Type in Turkey")+
  theme_light()+
  scale_fill_brewer(palette="Greens")+
  theme(legend.position="none")+
  scale_y_continuous(sec.axis=sec_axis(~.*100, name="Total Power Output MWe"))
```
Cut Reason by Text Mining {data-width=350}
-----------------------------------------------------------------------
```{r,fig.height = 6, fig.width = 9, echo=FALSE}
ggplot(bigram_counttop20, aes(reorder(bigram, n), n, fill=n))+
  geom_bar(stat="identity")+
  coord_flip()+
  facet_wrap(~Plant.Type, scales="free")+
  theme_bw()+
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.position = "none")
```
Cut Reason by Plant {data-width=350}
-----------------------------------------------------------------------
```{r,fig.height = 6, fig.width = 9, echo=FALSE, warning=FALSE}
# Count malfunctions per plant type and category, with the share within each type
m_by_type <- catmalf %>%
  group_by(Plant.Type, Malf.Category) %>%
  filter(Plant.Type!="Other") %>%
  count() %>%
  ungroup() %>%
  group_by(Plant.Type) %>%
  mutate(perc=n/sum(n)) %>%
  ungroup()
# Plot one pie chart per plant type showing the most frequently occurring malfunction types
plot_ly(textposition = 'inside',
        textinfo = 'label+percent',
        insidetextfont = list(color = '#FFFFFF'),
        marker = list(line = list(color = '#FFFFFF', width = 1))) %>%
  # Formulas (~) take labels and values from each subset rather than the full table
  add_pie(data = subset(m_by_type, Plant.Type=="TES"), labels = ~Malf.Category, values = ~n,
          name = "Thermal Energy Plant", domain = list(x = c(0, 0.35), y = c(0.50, 0.95))) %>%
  add_pie(data = subset(m_by_type, Plant.Type=="HES"), labels = ~Malf.Category, values = ~n,
          name = "Hydroelectricity Plant", domain = list(x = c(0.35, 1), y = c(0.50, 0.95))) %>%
  add_pie(data = subset(m_by_type, Plant.Type=="RES"), labels = ~Malf.Category, values = ~n,
          name = "Wind Energy Plant", domain = list(x = c(0, 0.35), y = c(0, 0.45))) %>%
  add_pie(data = subset(m_by_type, Plant.Type=="DGKC"), labels = ~Malf.Category, values = ~n,
          name = "Natural Gas CC Plant", domain = list(x = c(0.35, 1), y = c(0, 0.45))) %>%
  layout(title = "Malfunction Type by Plant", showlegend = FALSE,
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = TRUE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         annotations = list(
           list(x = 0.09, y = 1.0, text = "Thermal Energy Plant", showarrow = FALSE, xref='paper', yref='paper'),
           list(x = 0.8, y = 1.0, text = "Hydroelectricity Plant", showarrow = FALSE, xref='paper', yref='paper'),
           list(x = 0.1, y = 0.47, text = "Wind Energy Plant", showarrow = FALSE, xref='paper', yref='paper'),
           list(x = 0.8, y = 0.47, text = "Natural Gas CC Plant", showarrow = FALSE, xref='paper', yref='paper')))
```
Shutdown Reason by Category {data-width=350}
-----------------------------------------------------------------------
```{r,fig.height = 6, fig.width = 9, echo=FALSE,warning=FALSE}
# Share of each malfunction category among cuts where capacity fell to 5% or less
catmalf %>%
  filter(Capacityratio<=0.05 & Plant.Type %in% c("HES", "TES", "DGKC", "JES")) %>%
  select(Established.Power, Power.atOutage, Plant.Type, Malf.Category, Duration) %>%
  group_by(Plant.Type, Malf.Category) %>%
  summarise(count=n()) %>%
  mutate(perc=count/sum(count)) %>%
  filter(!Malf.Category %in% c("Outside Factors", "Other")) %>%
  ggplot(., aes(x=Malf.Category, y=perc, fill=Plant.Type))+
  geom_bar(stat="identity", position="dodge")+
  scale_y_continuous(limits=c(0,0.4), labels=percent)+
  theme_bw()+
  # The legend title is set through labs(fill=...); element_text() only styles it
  labs(x="Source of Shutdown", y="Percentage", title="Shutdown Causes", fill="Plant Type")+
  scale_fill_brewer(palette="PuBuGn")+
  theme(legend.position = c(0.1,0.8))+
  scale_x_discrete(labels=c("Control and Automation", "Utilities", "Feedstock", "Rotating\nEquipment", "Static\nEquipment", "Unspecified"))
```
Cut Reason by Category {data-width=350}
-----------------------------------------------------------------------
```{r,fig.height = 6, fig.width = 9, echo=FALSE, warning=FALSE}
# Gather plants that reported at least 1000 malfunctions in the last 6 years
m_count <- cuts %>%
  filter(TypeofCut=="Malfunction") %>%
  group_by(Plant.Name) %>%
  summarize(m_count=n()) %>%
  arrange(desc(m_count)) %>%
  filter(m_count >= 1000)
# Count their malfunctions per plant name and quarter
m_plants <- as.vector(m_count$Plant.Name)
malf <- cuts %>%
  filter(Plant.Name %in% m_plants, TypeofCut=="Malfunction") %>%
  mutate(quarter=lubridate::quarter(Start.Date, with_year = TRUE)) %>%
  group_by(Plant.Name, quarter) %>%
  summarize(malf=n())
# Visualize; quarters are converted to text so they are treated as discrete labels
malf$quarter <- as.character(malf$quarter)
ggplotly(
  ggplot(malf, aes(x=quarter))+
    coord_flip()+
    theme_bw()+
    geom_bar(aes(y=malf, fill=Plant.Name), stat="identity")+
    theme(legend.position = "bottom")+
    labs(x="Quarter", y="Malfunction Count", fill="Plant Name", title="Quarterly Fault Count of Top Frequently Malfunctioning Plants")
)
```
Conclusions {data-width=350}
-----------------------------------------------------------------------
* The most time-consuming part was data transformation and cleaning.
* The number of data entries increased drastically, especially in 2018.
* On average, thermal plants produce more power per plant, but in total hydroelectric plants have the highest output.
* Each type of plant has a different leading reason for shutdowns.
Thanks {data-width=350}
-----------------------------------------------------------------------
```{r,fig.height = 6, fig.width = 9, echo=FALSE, warning=FALSE}
# Build a corpus from the free-text cut reasons
cutsReason.Corpus <- Corpus(VectorSource(cuts$Reason))
# content_transformer() lower-cases the text while keeping tm's document structure
cutsReason.Corpus <- tm_map(cutsReason.Corpus, content_transformer(tolower))
wordcloud(cutsReason.Corpus, min.freq = 5,
          max.words=100, random.order=FALSE, rot.per=0.25,
          colors=brewer.pal(8, "PuOr"), scale=c(6.5,1.3))
```