<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>:: RaPrism ::</title>
<link rel="stylesheet" href="./theme/css/main.css" />
<link href="https://raprism.github.io/feeds/all.atom.xml" type="application/atom+xml" rel="alternate" title=":: RaPrism :: Atom Feed" />
<!--[if IE]>
<script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body id="index" class="home">
<header id="banner" class="body">
<h1><a href="./">:: RaPrism :: </a></h1>
<nav><ul>
<li><a href="./category/publishing.html">publishing</a></li>
<li><a href="./category/python.html">python</a></li>
</ul>
</nav>
</header><!-- /#banner -->
<aside id="featured" class="body">
<article>
<h1 class="entry-title"><a href="./PyDataBerlin2015.html">PyData Berlin 2015</a></h1>
<footer class="post-info">
<span>Thu 18 June 2015</span>
<span>| tags: <a href="./tag/python.html">python</a><a href="./tag/data.html">data</a><a href="./tag/berlin.html">berlin</a><a href="./tag/pydata.html">pydata</a></span>
</footer><!-- /.post-info --><p>After visiting the <a href="http://pydata.org/berlin2015">PyData Berlin 2015</a>
event, which took place on 29-30 May at
<a href="http://www.betahaus.com">Betahaus</a>, I wrote down some links as a
reminder and then decided to put them here, along with some
thoughts on the topics covered.</p>
<p>As announced on the meeting site, the talks were 'related to the
use of Python in data management and analysis'. They came from a variety of
fields, with an emphasis on applications of machine learning.</p>
<p>The 2014 PyData Berlin event featured a <a href="https://www.youtube.com/watch?v=d9Qm3PPoYNQ">talk by Travis
Oliphant</a>, during which
'PyData: the First 20 Years' was also summarized on a slide.
This story began with basic matrix calculation packages that were
precursors of the current standard <a href="http://www.numpy.org">NumPy</a> and of
extensions like <a href="https://www.scipy.org">SciPy</a> and
<a href="http://matplotlib.org">matplotlib</a>. With
<a href="http://pandas.pydata.org">pandas</a> and several machine learning
packages, e.g. <a href="http://scikit-learn.org">scikit-learn</a>, the PyData
ecosystem rivals <a href="http://www.r-project.org">R</a> and interacts with
the large Java-driven 'big data' solutions centered on
<a href="https://projects.apache.org/indexes/category.html#big-data">apache.org</a>,
as described
e.g. <a href="http://www.blue-yonder.com/blog-e/2014/11/12/environment-choose-data-science">here</a>.</p>
<p>Topics presented and discussed in Berlin this year were commented on in the
'official'
<a href="https://twitter.com/hashtag/PyDataBerlin?src=hash">#PyDataBerlin</a>
Twitter feed, and <a href="https://www.youtube.com/user/PyDataTV">videos</a> are
online.</p>
<p>The keynote by <a href="http://matthewrocklin.com">Matthew
Rocklin</a> of <a href="http://continuum.io">Continuum
Analytics</a> was about
<a href="http://dask.pydata.org/">Dask</a>, which seems to be quite an elegant
approach to getting 'instant' multi-core performance for a large subset
of NumPy functionality through parallel processing. This topic usually
leads to remarks about the limits on multi-threaded data management
imposed by the <a href="https://wiki.python.org/moin/GlobalInterpreterLock">GIL</a>,
and consequently one of Matthew's messages was to get rid of it in the
main parts of the PyData module ecosystem (using the <a href="http://docs.cython.org/src/userguide/external_C_code.html#nogil">with
nogil</a>
statement in Cython).</p>
<p><a href="http://haenel.co">Valentin Haenel</a> talked about Blosc and related
higher-level packages (see the links on his homepage). In particular, the
'columnar data container' <a href="https://github.com/Blosc/bcolz">bcolz</a> could
be considered a lightweight alternative to the HDF5 file format
(<a href="http://www.pytables.org">PyTables</a> also offers a Blosc-based
compression filter).</p>
<p><a href="https://github.com/c-abird">Claas Abert</a>'s talk about the numerical
treatment of PDEs with NumPy gave an impression of one 'classic' use
case for PyData packages, i.e. physical simulations of <a href="http://micromagnetics.org">micromagnetic
problems</a> by
finite-difference and finite-element methods. The finite-element package is
based on <a href="http://fenicsproject.org/">FEniCS</a>. Number crunching
also involves optimizing Python code, and an overview of the
options was given. Thanks for the hint about a new JIT
compiler: <a href="https://github.com/cosmo-ethz/hope">HOPE</a>!</p>
<p>Mobile app marketing was one example of 'new' 'Big Data' and its
challenges. Nakul Selvaraj from <a href="http://www.trademob.com">Trademob</a>
explained the demands of real-time monitoring, and his colleague Tobias
Kuhn gave some insights into statistical methods, with specific
algorithms like <a href="https://github.com/trademob/t-digest">t-digest</a> used
e.g. for tracking anomalies, and into how to separate trends from
seasonal variations.</p>
<p>There were two talks from <a href="https://pivotal.io">Pivotal</a> folks. <a href="http://cloudfoundry.org">Cloud
Foundry</a> was mentioned as their
<a href="https://en.wikipedia.org/wiki/Platform_as_a_service">PaaS</a> product,
and it's interesting to see what open source is <a href="https://pivotal.io/open-source">used
@Pivotal</a>. Smart GPS tracking of
cars was given as one example of the
<a href="https://en.wikipedia.org/wiki/Internet_of_Things">IoT</a>. In this study,
presented by Ronert Obst, <a href="http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm">Random
Forests</a>
were used to learn and classify the routes possibly driven. Although
<a href="https://github.com/apache/spark">Spark</a> usage was mentioned here and
in other talks,
<a href="https://github.com/apache/flink">Flink</a> was also pointed out as an alternative with a different
memory model and the ability to process data
<a href="http://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/languagebinding/api/java/python/streaming/PythonStreamer.html">streams</a>.</p>
<p>On Friday the presentation session ended with a 'Get Together', but one
should not forget to mention the (en)lightning talks just before. I
pick two. Matthew Rocklin gave a short example showing the benefit
of <a href="http://pandas.pydata.org/pandas-docs/dev/categorical.html#categorical">pandas' dtype
category</a>.
And <a href="http://tdj.si">Tadej Štajner</a> gave a presentation about
<a href="http://tdj.si/automl_pydataberlin.pdf">automatic machine learning</a>
with some <a href="https://github.com/tadejs/autokit">code</a>. So how different
will future PyData events be if one takes one statement on the
<a href="http://automl.org">AutoML</a> site too seriously: 'taking the human expert
out of the loop'?</p>
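<p>The idea behind the category dtype mentioned above can be sketched in
plain Python. This is not the example from the lightning talk and it does
not use pandas itself; the helper names <code>encode</code> and <code>decode</code> and the
sample column are made up for illustration. pandas stores a categorical
column along similar lines, as a set of distinct categories plus
integer codes, which pays off when a column has few distinct values.</p>
<pre>
```python
# Hypothetical column with many rows but only a few distinct labels.
cities = ["Berlin", "Hamburg", "Munich"]
column = [cities[i % 3] for i in range(9)]


def encode(values):
    """Return (categories, codes): each distinct value once, plus integer codes."""
    categories = sorted(set(values))
    index = {value: code for code, value in enumerate(categories)}
    codes = [index[value] for value in values]
    return categories, codes


def decode(categories, codes):
    """Rebuild the original values from the categories and the codes."""
    return [categories[code] for code in codes]


categories, codes = encode(column)
print(categories)  # ['Berlin', 'Hamburg', 'Munich']
print(codes)       # [0, 1, 2, 0, 1, 2, 0, 1, 2]
```
</pre>
<p>Each string is kept only once, and the rows hold small integers that
can be compared and grouped cheaply.</p>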
<p>In his keynote, Trent McConaghy not only presented how <a href="https://www.ascribe.io">ascribe</a>
uses Python modules, like
<a href="https://github.com/ascribe/transactions">transactions</a> as 'Bitcoin
for Humans', but also gave an
interesting view on how blockchains and a suitable protocol
(<a href="https://github.com/ascribe/spool">Spool</a>) under the hood can help
e.g. artists keep control of the intellectual property of digital
objects.</p>
<p>The keynote by Felix Wick from <a href="http://www.blue-yonder.com">Blue
Yonder</a> contained a lot of
information, and not only about the technical things a data scientist might or
should know when doing e.g. <a href="http://www.gartner.com/it-glossary/predictive-analytics">predictive
analytics</a>. In
a data scientist's life it is no bad thing to know a bit about hype cycles,
but one should not care too much. And you should really have a look at the
<a href="https://www.youtube.com/watch?v=Fo0Ne2pYWW4">video</a>, because it
is almost a lecture that gives a nice introduction to
'data science' in our Big Data world (be it a peak or not ...).</p>
<p>Peadar Coyle very nicely presented an interesting example of how <a href="http://nbviewer.ipython.org/format/slides/github/springcoil/Probabilistic_Programming_and_Rugby/blob/master/Bayesian_Rugby.ipynb#/">sports
analytics</a> -
in this case the <a href="http://www.rbs6nations.com">6 Nations</a> rugby tournament -
can be done by applying Bayesian statistical models with
<a href="https://github.com/pymc-devs/pymc">PyMC</a>. Someone in the audience
mentioned that it could already be worth switching to the successor
<a href="https://github.com/pymc-devs/pymc3">PyMC 3</a> to perform Markov chain
Monte Carlo fitting.
<a href="http://nbviewer.ipython.org/github/aloctavodia/Doing_bayesian_data_analysis/blob/master/IPython/Kruschkes_Doing_Bayesian_Data_Analysis_in_PyMC3.ipynb">This</a>
notebook seems to be a good tutorial for learning or migration
purposes.</p>
<p>The second presentation by Blue Yonder folks was an
introduction to
<a href="https://spark.apache.org/docs/latest/api/python/index.html">PySpark</a>
DataFrame objects, which are designed to make use of, for instance,
columnar data stored on <a href="http://hadoop.apache.org/">Hadoop</a> HDFS
clusters in <a href="http://parquet.apache.org">Parquet</a> format. The
conclusion was that such an API for distributed file access is too costly
in terms of efficiency compared to NumPy arrays or pandas DataFrames
when the data fits on one machine.</p>
<p><a href="http://albahnsen.com">Alejandro C. Bahnsen</a> presented his work
<a href="https://github.com/albahnsen/CostSensitiveClassification">CostCla</a>,
which exemplifies the use of scikit-learn classification on financial
topics like <a href="http://nbviewer.ipython.org/github/albahnsen/CostSensitiveClassification/blob/master/doc/tutorials/tutorial_edcs_credit_scoring.ipynb">credit
scoring</a>.</p>
<p>Brian Carter (IBM Software Group) gave an overview of web text mining
processes. In this field the <a href="http://www.nltk.org">NLTK</a> module seems
to be the standard for language processing with Python.</p>
<p>When dealing with the similarity of words, as Miguel Fernando Cabrera (TrustYou) did with hotel reviews, <a href="https://github.com/danielfrg/word2vec">word2vec</a> provides the metrics needed.</p>
<p>I didn't attend the tutorial sessions, so it will help to have a look
at the respective videos if one wants to better understand,
for example, how interactive graphics can be created, e.g. within <a href="http://ipython.org/notebook.html">IPython
notebooks</a>, with
<a href="http://bokeh.pydata.org">Bokeh</a>, how
<a href="https://docs.docker.com">Docker</a> sits between tools like
virtualenv and 'normal' virtual machines, and what one could use
instead of
<a href="https://ipython.org/ipython-doc/dev/interactive/magics.html?highlight=timeit#magic-timeit">%timeit</a>
for code profiling.</p>
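<p>Which profiling tools the tutorial actually recommended is not recorded
here. As one possibility, assumed purely for illustration, the standard
library's cProfile module goes beyond a single timing and reports the time
spent per function. The functions <code>slow_sum</code> and <code>work</code> below are
hypothetical workloads.</p>
<pre>
```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop, used here only as something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def work():
    """Hypothetical workload that calls slow_sum several times."""
    return [slow_sum(10000) for _ in range(20)]


# Profile only the call to work().
profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```
</pre>
<p>The report lists call counts and timings for each function, which
shows where the time goes rather than only how long the whole statement
takes.</p>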
<p>Someone referred to the
<a href="https://amplab.cs.berkeley.edu/projects/velox">velox</a> model server,
and I also noted <a href="https://github.com/spotify/luigi">Luigi</a> for
building pipelines of batch jobs. If you remember the context
(talk or discussion) in which these were mentioned, feel free to message me
<a href="https://twitter.com/RaPrism">@RaPrism</a>.</p>
</article>
</aside><!-- /#featured -->
<section id="content" class="body">
<h1>Other articles</h1>
<ol id="posts-list" class="hfeed">
<li><article class="hentry">
<header>
<h1><a href="./test-post.html" rel="bookmark"
title="Permalink to Trapattoni '98">Trapattoni '98</a></h1>
</header>
<div class="entry-content">
<footer class="post-info">
<span>Sun 17 May 2015</span>
<span>| tags: <a href="./tag/pelican.html">pelican</a><a href="./tag/publishing.html">publishing</a></span>
</footer><!-- /.post-info --> <p>This is a test page.</p>
<a class="readmore" href="./test-post.html">read more</a>
</div><!-- /.entry-content -->
</article></li>
</ol><!-- /#posts-list -->
<p class="paginator">
Page 1 / 1
</p>
</section><!-- /#content -->
<section id="extras" class="body">
<div class="blogroll">
<h2>blogroll</h2>
<ul>
<li><a href="http://planetpython.org/">Planet Python</a></li>
</ul>
</div><!-- /.blogroll -->
<div class="social">
<h2>social</h2>
<ul>
<li><a href="https://raprism.github.io/feeds/all.atom.xml" type="application/atom+xml" rel="alternate">atom feed</a></li>
<li><a href="https://github.com/prismv">GitHub #source</a></li>
<li><a href="https://github.com/RaPrism">GitHub #publish</a></li>
<li><a href="https://twitter.com/RaPrism">Twitter</a></li>
</ul>
</div><!-- /.social -->
</section><!-- /#extras -->
<footer id="contentinfo" class="body">
<p>Powered by <a href="http://getpelican.com/">Pelican</a>. Theme <a href="https://github.com/blueicefield/pelican-blueidea/">blueidea</a>, inspired by the default theme.</p>
</footer><!-- /#contentinfo -->
</body>
</html>