diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/.ipynb_checkpoints/5_lda_graphlab-checkpoint.ipynb b/machine_learning/4_clustering_and_retrieval/assigment/week5/.ipynb_checkpoints/5_lda_graphlab-checkpoint.ipynb new file mode 100644 index 0000000..14273eb --- /dev/null +++ b/machine_learning/4_clustering_and_retrieval/assigment/week5/.ipynb_checkpoints/5_lda_graphlab-checkpoint.ipynb @@ -0,0 +1,2406 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Latent Dirichlet Allocation for Text Data\n", + "\n", + "In this assignment you will\n", + "\n", + "* apply standard preprocessing techniques on Wikipedia text data\n", + "* use GraphLab Create to fit a Latent Dirichlet allocation (LDA) model\n", + "* explore and interpret the results, including topic keywords and topic assignments for documents\n", + "\n", + "Recall that a major feature distinguishing the LDA model from our previously explored methods is the notion of *mixed membership*. Throughout the course so far, our models have assumed that each data point belongs to a single cluster. k-means determines membership simply by shortest distance to the cluster center, and Gaussian mixture models suppose that each data point is drawn from one of their component mixture distributions. In many cases, though, it is more realistic to think of data as genuinely belonging to more than one cluster or category - for example, if we have a model for text data that includes both \"Politics\" and \"World News\" categories, then an article about a recent meeting of the United Nations should have membership in both categories rather than being forced into just one.\n", + "\n", + "With this in mind, we will use GraphLab Create tools to fit an LDA model to a corpus of Wikipedia articles and examine the results to analyze the impact of a mixed membership approach. In particular, we want to identify the topics discovered by the model in terms of their most important words, and we want to use the model to predict the topic membership distribution for a given document. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note to Amazon EC2 users**: To conserve memory, make sure to stop all the other notebooks before running this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Text Data Preprocessing\n", + "We'll start by importing our familiar Wikipedia dataset.\n", + "\n", + "The following code block will check if you have the correct version of GraphLab Create. Any version later than 1.8.5 will do. To upgrade, read [this page](https://turi.com/download/upgrade-graphlab-create.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "import graphlab as gl\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt \n", + "\n", + "%matplotlib inline\n", + "\n", + "'''Check GraphLab Create version'''\n", + "from distutils.version import StrictVersion\n", + "assert (StrictVersion(gl.version) >= StrictVersion('1.8.5')), 'GraphLab Create must be version 1.8.5 or later.'" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
URInametext
<http://dbpedia.org/resou
rce/Digby_Morrell> ...
Digby Morrelldigby morrell born 10
october 1979 is a former ...
<http://dbpedia.org/resou
rce/Alfred_J._Lewy> ...
Alfred J. Lewyalfred j lewy aka sandy
lewy graduated from ...
<http://dbpedia.org/resou
rce/Harpdog_Brown> ...
Harpdog Brownharpdog brown is a singer
and harmonica player who ...
<http://dbpedia.org/resou
rce/Franz_Rottensteiner> ...
Franz Rottensteinerfranz rottensteiner born
in waidmannsfeld lower ...
<http://dbpedia.org/resou
rce/G-Enka> ...
G-Enkahenry krvits born 30
december 1974 in tallinn ...
<http://dbpedia.org/resou
rce/Sam_Henderson> ...
Sam Hendersonsam henderson born
october 18 1969 is an ...
<http://dbpedia.org/resou
rce/Aaron_LaCrate> ...
Aaron LaCrateaaron lacrate is an
american music producer ...
<http://dbpedia.org/resou
rce/Trevor_Ferguson> ...
Trevor Fergusontrevor ferguson aka john
farrow born 11 november ...
<http://dbpedia.org/resou
rce/Grant_Nelson> ...
Grant Nelsongrant nelson born 27
april 1971 in london ...
<http://dbpedia.org/resou
rce/Cathy_Caruth> ...
Cathy Caruthcathy caruth born 1955 is
frank h t rhodes ...
\n", + "[59071 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.\n", + "
" + ], + "text/plain": [ + "Columns:\n", + "\tURI\tstr\n", + "\tname\tstr\n", + "\ttext\tstr\n", + "\n", + "Rows: 59071\n", + "\n", + "Data:\n", + "+-------------------------------+---------------------+\n", + "| URI | name |\n", + "+-------------------------------+---------------------+\n", + "| Learning a topic model" + ], + "text/plain": [ + "Learning a topic model" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
       Number of documents     59071
" + ], + "text/plain": [ + " Number of documents 59071" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
           Vocabulary size    547462
" + ], + "text/plain": [ + " Vocabulary size 547462" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
   Running collapsed Gibbs sampling
" + ], + "text/plain": [ + " Running collapsed Gibbs sampling" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+-----------+---------------+----------------+-----------------+
" + ], + "text/plain": [ + "+-----------+---------------+----------------+-----------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Iteration | Elapsed Time  | Tokens/Second  | Est. Perplexity |
" + ], + "text/plain": [ + "| Iteration | Elapsed Time | Tokens/Second | Est. Perplexity |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+-----------+---------------+----------------+-----------------+
" + ], + "text/plain": [ + "+-----------+---------------+----------------+-----------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 10        | 6.24s         | 1.4602e+07     | 0               |
" + ], + "text/plain": [ + "| 10 | 6.24s | 1.4602e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 20        | 11.64s        | 1.47799e+07    | 0               |
" + ], + "text/plain": [ + "| 20 | 11.64s | 1.47799e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 30        | 16.98s        | 1.50132e+07    | 0               |
" + ], + "text/plain": [ + "| 30 | 16.98s | 1.50132e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 40        | 22.37s        | 1.53911e+07    | 0               |
" + ], + "text/plain": [ + "| 40 | 22.37s | 1.53911e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 50        | 27.87s        | 1.49653e+07    | 0               |
" + ], + "text/plain": [ + "| 50 | 27.87s | 1.49653e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 60        | 34.00s        | 1.38525e+07    | 0               |
" + ], + "text/plain": [ + "| 60 | 34.00s | 1.38525e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 70        | 39.98s        | 1.46159e+07    | 0               |
" + ], + "text/plain": [ + "| 70 | 39.98s | 1.46159e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 80        | 45.59s        | 1.33147e+07    | 0               |
" + ], + "text/plain": [ + "| 80 | 45.59s | 1.33147e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 90        | 51.41s        | 1.1672e+07     | 0               |
" + ], + "text/plain": [ + "| 90 | 51.41s | 1.1672e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 100       | 57.79s        | 1.27921e+07    | 0               |
" + ], + "text/plain": [ + "| 100 | 57.79s | 1.27921e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 110       | 1m 4s         | 1.36326e+07    | 0               |
" + ], + "text/plain": [ + "| 110 | 1m 4s | 1.36326e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 120       | 1m 10s        | 1.29229e+07    | 0               |
" + ], + "text/plain": [ + "| 120 | 1m 10s | 1.29229e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 130       | 1m 17s        | 1.16539e+07    | 0               |
" + ], + "text/plain": [ + "| 130 | 1m 17s | 1.16539e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 140       | 1m 23s        | 1.16374e+07    | 0               |
" + ], + "text/plain": [ + "| 140 | 1m 23s | 1.16374e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 150       | 1m 29s        | 1.37665e+07    | 0               |
" + ], + "text/plain": [ + "| 150 | 1m 29s | 1.37665e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 160       | 1m 35s        | 1.41662e+07    | 0               |
" + ], + "text/plain": [ + "| 160 | 1m 35s | 1.41662e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 170       | 1m 41s        | 1.08636e+07    | 0               |
" + ], + "text/plain": [ + "| 170 | 1m 41s | 1.08636e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 180       | 1m 49s        | 1.142e+07      | 0               |
" + ], + "text/plain": [ + "| 180 | 1m 49s | 1.142e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 190       | 1m 55s        | 1.24898e+07    | 0               |
" + ], + "text/plain": [ + "| 190 | 1m 55s | 1.24898e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 200       | 2m 2s         | 1.34335e+07    | 0               |
" + ], + "text/plain": [ + "| 200 | 2m 2s | 1.34335e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+-----------+---------------+----------------+-----------------+
" + ], + "text/plain": [ + "+-----------+---------------+----------------+-----------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "topic_model = gl.topic_model.create(wiki_docs, num_topics=10, num_iterations=200)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "GraphLab provides a useful summary of the model we have fitted, including the hyperparameter settings for alpha, gamma (note that GraphLab Create calls this parameter beta), and K (the number of topics); the structure of the output data; and some useful methods for understanding the results." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "Class : TopicModel\n", + "\n", + "Schema\n", + "------\n", + "Vocabulary Size : 547462\n", + "\n", + "Settings\n", + "--------\n", + "Number of Topics : 10\n", + "alpha : 5.0\n", + "beta : 0.1\n", + "Iterations : 200\n", + "Training time : 123.0197\n", + "Verbose : False\n", + "\n", + "Accessible fields : \n", + "m['topics'] : An SFrame containing the topics.\n", + "m['vocabulary'] : An SArray containing the words in the vocabulary.\n", + "Useful methods : \n", + "m.get_topics() : Get the most probable words per topic.\n", + "m.predict(new_docs) : Make predictions for new documents." + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topic_model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is certainly useful to have pre-implemented methods available for LDA, but as with our previous methods for clustering and retrieval, implementing and fitting the model gets us only halfway towards our objective. We now need to analyze the fitted model to understand what it has done with our data and whether it will be useful as a document classification system. This can be a challenging task in itself, particularly when the model that we use is complex. We will begin by outlining a sequence of objectives that will help us understand our model in detail. In particular, we will\n", + "\n", + "* get the top words in each topic and use these to identify topic themes\n", + "* predict topic distributions for some example documents\n", + "* compare the quality of LDA \"nearest neighbors\" to the NN output from the first assignment\n", + "* understand the role of model hyperparameters alpha and gamma" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load a fitted topic model\n", + "The method used to fit the LDA model is a _randomized algorithm_, which means that it involves steps that are random; in this case, the randomness comes from Gibbs sampling, as discussed in the LDA video lectures. Because of these random steps, the algorithm will be expected to yield slighty different output for different runs on the same data - note that this is different from previously seen algorithms such as k-means or EM, which will always produce the same results given the same input and initialization.\n", + "\n", + "It is important to understand that variation in the results is a fundamental feature of randomized methods. However, in the context of this assignment this variation makes it difficult to evaluate the correctness of your analysis, so we will load and analyze a pre-trained model. \n", + "\n", + "We recommend that you spend some time exploring your own fitted topic model and compare our analysis of the pre-trained model to the same analysis applied to the model you trained above." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "collapsed": false, + "scrolled": true + }, + "outputs": [], + "source": [ + "topic_model = gl.load_model('topic_models/lda_assignment_topic_model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Identifying topic themes by top words\n", + "\n", + "We'll start by trying to identify the topics learned by our model with some major themes. As a preliminary check on the results of applying this method, it is reasonable to hope that the model has been able to learn topics that correspond to recognizable categories. In order to do this, we must first recall what exactly a 'topic' is in the context of LDA. \n", + "\n", + "In the video lectures on LDA we learned that a topic is a probability distribution over words in the vocabulary; that is, each topic assigns a particular probability to every one of the unique words that appears in our data. Different topics will assign different probabilities to the same word: for instance, a topic that ends up describing science and technology articles might place more probability on the word 'university' than a topic that describes sports or politics. Looking at the highest probability words in each topic will thus give us a sense of its major themes. Ideally we would find that each topic is identifiable with some clear theme _and_ that all the topics are relatively distinct.\n", + "\n", + "We can use the GraphLab Create function get_topics() to view the top words (along with their associated probabilities) from each topic.\n", + "\n", + "__Quiz Question:__ Identify the top 3 most probable words for the first topic. " + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
topicwordscore
0university0.0337723780773
0research0.0120334992502
0professor0.0118011432268
\n", + "[3 rows x 3 columns]
\n", + "
" + ], + "text/plain": [ + "Columns:\n", + "\ttopic\tint\n", + "\tword\tstr\n", + "\tscore\tfloat\n", + "\n", + "Rows: 3\n", + "\n", + "Data:\n", + "+-------+------------+-----------------+\n", + "| topic | word | score |\n", + "+-------+------------+-----------------+\n", + "| 0 | university | 0.0337723780773 |\n", + "| 0 | research | 0.0120334992502 |\n", + "| 0 | professor | 0.0118011432268 |\n", + "+-------+------------+-----------------+\n", + "[3 rows x 3 columns]" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topic_model.get_topics([0], num_words=3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__ Quiz Question:__ What is the sum of the probabilities assigned to the top 50 words in the 3rd topic?" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.21034366078939654" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(topic_model.get_topics([2], num_words=50)['score'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's look at the top 10 words for each topic to see if we can identify any themes:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[['university',\n", + " 'research',\n", + " 'professor',\n", + " 'international',\n", + " 'institute',\n", + " 'science',\n", + " 'society',\n", + " 'studies',\n", + " 'director',\n", + " 'national'],\n", + " ['played',\n", + " 'season',\n", + " 'league',\n", + " 'team',\n", + " 'career',\n", + " 'football',\n", + " 'games',\n", + " 'player',\n", + " 'coach',\n", + " 'game'],\n", + " ['film',\n", + " 'music',\n", + " 'album',\n", + " 'released',\n", + " 'band',\n", + " 'television',\n", + " 'series',\n", + " 'show',\n", + " 'award',\n", + " 'appeared'],\n", + " ['university',\n", + " 'school',\n", + " 'served',\n", + " 'college',\n", + " 'state',\n", + " 'american',\n", + " 'states',\n", + " 'united',\n", + " 'born',\n", + " 'law'],\n", + " ['member',\n", + " 'party',\n", + " 'election',\n", + " 'minister',\n", + " 'government',\n", + " 'elected',\n", + " 'served',\n", + " 'president',\n", + " 'general',\n", + " 'committee'],\n", + " ['work',\n", + " 'art',\n", + " 'book',\n", + " 'published',\n", + " 'york',\n", + " 'magazine',\n", + " 'radio',\n", + " 'books',\n", + " 'award',\n", + " 'arts'],\n", + " ['company',\n", + " 'business',\n", + " 'years',\n", + " 'group',\n", + " 'time',\n", + " 'family',\n", + " 'people',\n", + " 'india',\n", + " 'million',\n", + " 'indian'],\n", + " ['world',\n", + " 'won',\n", + " 'born',\n", + " 'time',\n", + " 'year',\n", + " 'team',\n", + " 'championship',\n", + " 'tour',\n", + " 'championships',\n", + " 'title'],\n", + " ['born',\n", + " 'british',\n", + " 'london',\n", + " 'australian',\n", + " 'south',\n", + " 'joined',\n", + " 'years',\n", + " 'made',\n", + " 'england',\n", + " 'australia'],\n", + " ['music',\n", + " 'de',\n", + " 'born',\n", + " 'international',\n", + " 'la',\n", + " 'orchestra',\n", + " 'opera',\n", + " 'studied',\n", + " 'french',\n", + " 'festival']]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[x['words'] for x in topic_model.get_topics(output_type='topic_words', num_words=10)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We propose the following themes for each topic:\n", + "\n", + "- topic 0: Science and research\n", + "- topic 2: Team sports\n", + "- topic 3: Music, TV, and film\n", + "- topic 4: American college and politics\n", + "- topic 5: General politics\n", + "- topic 6: Art and publishing\n", + "- topic 7: Business\n", + "- topic 8: International athletics\n", + "- topic 9: Great Britain and Australia\n", + "- topic 10: International music\n", + "\n", + "We'll save these themes for later:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "themes = ['science and research','team sports','music, TV, and film','American college and politics','general politics', \\\n", + " 'art and publishing','Business','international athletics','Great Britain and Australia','international music']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Measuring the importance of top words\n", + "\n", + "We can learn more about topics by exploring how they place probability mass (which we can think of as a weight) on each of their top words.\n", + "\n", + "We'll do this with two visualizations of the weights for the top words in each topic:\n", + " - the weights of the top 100 words, sorted by the size\n", + " - the total weight of the top 10 words\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's a plot for the top 100 words by weight in each topic:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZYAAAEZCAYAAAC0HgObAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xl8FfX1+P/Xubm5udkTlgQSIKyCgggogqIQRQVXrEtr\n3a1atLX2a3+tW1Vwq1qXun3qVhe0VbB1t+6VoIgiCijKjgSyQCBk32+S8/tjJnCJ2ZB7A4TzfDzu\ngzsz75k5c3O5Z97LzIiqYowxxoSKZ08HYIwxpmuxxGKMMSakLLEYY4wJKUssxhhjQsoSizHGmJCy\nxGKMMSakLLHsY0Rkhoi88BPXvUhEPm1j+TsickFLZUWkXET6t7HudyIy8afE9VOIiF9E3hKREhGZ\n01n73Z+IyBQRWfMT1ntWRP4Yjph+ChGZLiIf7gVxvCgiV+3pODqDJZZOICLZIlIlImUissn9jxez\nG5vcnYuPWl1XVU9S1RdaKquq8aqaDdt/OG5rtu4IVf1kN+LaVWcBPYFkVf1F8AIRecxNhGUiUisi\nde77MhH5byiDEJG+boLbJCKNIpLSbLlfRJ4XkVIRyRWR3zZbPlZElohIhYh8ISLDW9lPpogUNpv3\nQgvznheRB0J1fPyE75qqXqKq94UwhlBo9zhExBf0vSkXkQYRqQya97PdCkD1XFV9dHe2sa+wxNI5\nFDhZVROAMcBhwE0tFRQR6czA9mEZwGpt4QpfVb3STYQJwF+A2aqa4L5ODnEcDcBbwNm0/ON1F9Ab\n6AOcCMxoqtmJiB94HXgcSAb+A7wmIi39v/wCiG2WeI4CtjSbNxGYt6sHISIRu7pOV6OqdU3fG1WN\nBwqA44LmvbanY9xXWGLpPAKgqpuAd4ERACIyV0TuEJH5IlIJDBCR3iLyhohsE5HVInJZs21Fi8hs\n9yzqKxEZuX0nIteJyFp32XcicnqzdT0i8ojbhLRcRI4NWneuiPyqxeCds/GBInI5cB5wrbuPN9zl\n65u2JY7r3Ti2urEmucuims60RaRYRBaKSM9W9jnMjalYRJaJyKnu/JnALcA5bgyXdOQP0GzbZ4rI\n9yJSJCIfiMjgoGWbRORPIrLCjfNxEYlsaTuqmq+qTwJLcP/GzVwAzFTVclVdBjwLXOwuOwGoVtUn\nVDUA3A/E4ySM5vupAb7GSRyISF+gGngjaF4/oC8wv6mMiPzX/R6tFJELg47xLhH5l/u3KQV+ISIx\n7rxiEfkGGN3sM7tZRPLd2tf3IjKhlc/2JRG50X0/RUTWiMgNIrJFRHJE5NyW1nPLJ4vILPdvsEFE\nbglaNtT9PmwTkQIReU5EYoOWZ4jI6+53bouI3Bu0aY+IPOQe2xoRmdxaDMHh0Oxv6n5GT4jIZhHZ\nKCJ3Np0IiMg093t6l/u9WiMipwWt+5qI/CFo+lwR+db9Dq9o7fPcF1li6WTuD8JJwOKg2ecDl+H8\nqGwEZrv/9sI5E/6LiGQGlT8NmINzlvsS8LrsOONcC0xwz9ZvBf4pIqlB644D1gDdgZnAq00/+u1Q\nAFV9CvgX8Ff3LG5aC2WvdmM8GkgDioG/u8suAhKAdKAbcAXOD+RORMSLUxN4D6fJ62rgXyIyRFVn\nsnNN5NkOxB+87YNxfuCvAFKAT4A3ZOeawjlAJjAUp5b5p13Zh7ufXjh/o2+DZn8DNNUwDnKnAXBr\nX8uCljf3CW4Scf/9BCeJTHLnHQ2sUNVt7vS/gRVAKs7JwN9E5Iig7Z0BPKuqicCrOJ9pCtAP5+93\ncdCxjHSnR7rlTwZy2/kImmTgfH96A78DHpfWm4L/hfN96Q8cDkwTt9/Pdasb48HAAcCf3fi8OCds\n3+Mk177AK0HrTQS+xPnO/R/wjw7G3tzdwEB334fj/F/+fdDyYUCVG+PVwIsi0rv5RkTkeOAh4Ar3\n/+oUYNNPjGmvY4ml87wuIkU4PwZzcZpImjynqitVtREnmRwJXKeqAVX9Buc/wYVB5b9W1ddUtQF4\nAPAD4wFU9RVVLXDf/xsniRwetG6Bqj6sqg2q+jKwCudHoj270kQ3Hfizqm5yz8RvA85yf7gDOEnt\nAHUsUdWKFrYxHohV1XtUtV5V5wJvA7/chTha8wvgVVX9VFXrcX5Qe+I0UTZ5UFUL3B/pu37ifuMA\nVLU8aF4ZzglE0/LSZusEL29uHk7ywP33U2ABMCFo3jwAERkCjMT5O9Sr6tfALJwa1Pbtqer7bow1\nOCcxt7m1qw04P8BN6nG+ZyNEJEJVs90yHVGpqne737nXcZLM4OaF3BrX0cD/p6q17vf4EdzPXlVX\nqWqWu50tOD/MwUk1XlX/rKo17vpfBG1+par+y03es4B+IpLQwfiDnQvcpKplqroZ57sR/JlWAn9x\nP/N3gc9wEnhzlwIPq+oC99g2quoPPyGevZIlls4zTVW7qeoAVf2dqtYGLcsJep8GFKlqVdC8DThn\n+D8q7/5HyXXXQ0QuFKczuFhEinHOfnsErZvXLK4NTeuGUAZOX0GRm0yX4ySUVOAF4H1gtjid2XdL\ny+37aez8uTTFmt5C2V2V5m4LADeh5zXbdvDZ+E/9jCoARCQuaF4iUB60vPmPW/Dy5j4DUtxmu4nA\np6paDBQHzWsaQNEb2Nrse9bq90hEBOekpvlxA6Cqy4HrgTuBAnGaM3caqNCGrc2mq3CTbjMZQDSw\n1f3uFAMP4iR9xGkiftn93pTgnHA1fbf7AOvbiGFzs/1LKzG0SkR8ODWejUGzm3+mBe4JX/Dylr47\nfYF1u7L/fYklls7T1hl/cKdvPtAtuO0Yp2kiOCH03b5R5wehD5DvnvE9CfxGVZNVNRmnaSB4381/\nmPu5+9wV7Y2w2Qic6CbSbm4ssW4Npl5Vb1fV4Tg1s1PZuTbWJJ+g4wyKtXli/CnycX7EAHBrUuns\n/KMavO8MOvYZ7fS5uGe0RcAhQbMPwfmb4P67fZn7txwRtHznjTs1u6XAmUCMqjb9wH3qzhvKjsSS\nD/QUkaigTTT//IJH/SnOj2/z4w7e/wuqOgGnKSgauL2lOHdDDlDe7HuTpKpj3eX34iTjg1Q1Caf5\nWILW7R/ieHaiqnXANnb+XDLY+TNNbXai1Nr/rxxgUMiD3EtYYtnLqGouTvPGXeJ0dI/EqTYHDwM+\nVEROd7/A1wA1uKOGgEagUEQ84nRqj2i2i1QR+Z2IeEXkbJw24V0dgluA8+PSmidw+oX6AYhIz6ZO\nTHGGzY5wf8wrcGoyjS1sYyFQJSLXurFmAqfg9CntrjnAz0TkKLdt/gagEKdzvMnVItJLRHoA1+H0\ne7XI/fH24/zI+d0z2yb/BG4RkYSgfoqmPqEPcQZi/Npd5484tZX5bcT+Kc7fPLjMZ+68H9zBIajq\nWpz+mjvEGUY7BieBt3UN1L+BP7uxZgBXBh3jgSIy0Y2zFqdfrKW/20+mznD2L0TkryISJ47BQZ3a\n8TjfmQr3u/WHoNXnA+UicruIRIszzPsIQm82cKuIJLl9J9ex82caC9zgfmen4gzEeLWF7fwDuEpE\njgSnGVBEukyiscTSOdo6w29p2S+BAThnOq8AN7t9DE3ewOknKMbplP2Z2+68Amdk0Rc4Z5/D+fGP\n1BfAEJwf0tuBM1W1ZBfjfBoY7jZXvNrC8ofcGD8QZ8TRAnb08/TCGVZbinNmPpcWfuzcvplTcTpH\nC4FHgQtUdZcv2Gth28twkvWTwBbgGJymyuAfytlubKtwOtjvbb4d2J5Uqt3tKJANlAQVuREnEefi\ndC7PUNVP3ThqgGk4P+DFONfmnN4sjubm4TQNBV/o+qk7r/kw47NxvgObcRLyH1X18za2fRPOGflG\nnIETs4KWReN8t7binKHHAje3sp32arRtLf8lkASsdGOZjdMRDs5IwKNxPt9XcL5HzgadvrKTgFE4\nn/UGoPmIyF2JsbUy1+P8jVfiDAZ4H3g4aPkKIAbn+/AocG5TsmfnGuJHOCcDT4pIOc4gleBBNvs0\n0TA/6MvN2g/iJLGnVfWeFso8jDPGvxK4WFWXuv9hPwF8gBf4j6re6pafAVyO88cDuFFV3wvrgZj9\nhohswkm4C/Z0LGbfISLTgNtVdWS7hbs4bzg37jZ3PApMxjn7XiQib6jqyqAyJwKDVHWIiIzDuVhs\nvKrWisgxqlrlNvl8JiLvquqX7qoPqGoorzA2xhgTAuFuCjscWKOqG9ymjdk4Vf9g04DnAVR1IZDY\ndN1F0MioKJwkGFy9sivUTbjY87qN2Q3hTizp7DxkNJcfj0pqXmb7sE+3A3oJThvxh6q6KKjcVSKy\nVET+ISKJoQ/d7K9UNc2awcyuUtU3rBnMsVd33qtqo6qOxhlOO05EDnIX/R0YqKqjcJKONYkZY8xe\nIqx9LDi1j35B03348XUIeew8dv5HZVS1TETmAlOB5aoafMHVUzgjWH5ERKxJwxhjfgJV/cndDeGu\nsSwCBotzczgfzv2X3mxW5k3cC+REZDxQoqoFItKjqYlLRKKB43GG+DXdg6nJGcB3rQWgqvZSZcaM\nGXs8hr3lZZ+FfRb2WbT92l1hrbGoaoM4D7b5gB3DjVeIyHRnsT6pqu+IyEkishZnuHHTnWp7A7Pc\nkWUeYI6qvuMu+6uIjMK5QCsb595Uxhhj9gLhbgpDnetLhjab90Sz6R89VU2di9jGtLLNlm4BYowx\nZi+wV3fem9DJzMzc0yHsNeyz2ME+ix3sswidsF95vyeJiHbl4zPGmHAQEXQv7rw3xhizn7HEYowx\nJqQssRhjjAkpSyzGGGNCyhKLMcaYkLLEYowxJqQssRhjjAkpSyzGGGNCyhKLMcaYkLLEYowxJqQs\nsRhjjAkpSyzGGGNCyhKLMcaYkLLEYowxJqQssRhjjAkpSyzGGGNCyhKLMcaYkLLEYowxJqQssRhj\njAkpSyzGGGNCyhKLMcaYkAp7YhGRqSKyUkRWi8h1rZR5WETWiMhSERnlzosSkYUiskRElonIjKDy\nySLygYisEpH3RSQx3MdhjDGmY8KaWETEAzwKTAGGA78UkWHNypwIDFLVIcB04HEAVa0FjlHV0cAo\n4EQROdxd7XrgI1UdCnwM3BDO4zDGGNNx4a6xHA6sUdUNqhoAZgPTmpWZBjwPoKoLgUQRSXWnq9wy\nUYAX0KB1ZrnvZwGnh+0IjDHG7JJwJ5Z0ICdoOted11aZvKYyIuIRkSXAZuBDVV3klklR1QIAVd0M\npLQWQGPjbsVvjDFmF3n3dABtUdVGYLSIJACvi8hBqrq8paKtbeOWW2bidY8yMzOTzMzMcIRqjDH7\nrKysLLKyskK2PVFt9Td59zcuMh6YqapT3enrAVXVe4LKPA7MVdU57vRKYFJTjSSo3M1Apao+ICIr\ngExVLRCRXu76B7awfy0tVRISwnaIxhjT5YgIqio/df1wN4UtAgaLSIaI+IBzgDeblXkTuBC2J6IS\nN2H0aBrtJSLRwPHAyqB1LnbfXwS80VoAdXUhOhJjjDEdEtamMFVtEJGrgA9wktjTqrpCRKY7i/VJ\nVX1HRE4SkbVAJXCJu3pvYJY7sswDzFHVd9xl9wAvi8ivgA3Az1uLwRKLMcZ0rrA2he1pIqLr1yv9\n++/pSIwxZt+xtzeF7XFWYzHGmM7V5RNLbe2ejsAYY/YvXT6xWI3FGGM6lyUWY4wxIWWJxRhjTEhZ\nYjHGGBNSXT6xWOe9McZ0ri6fWKzGYowxncsSizHGmJCyxGKMMSakLLEYY4wJKUssxhhjQqrLJxYb\nFWaMMZ2ryycWq7EYY0znssRijDEmpCyxGGOMCSlLLMYYY0KqyycW67w3xpjO1eUTi9VYjDGmc1li\nMcYYE1KWWIwxxoSUJRZjjDEhZYnFGGNMSIU9sYjIVBFZKSKrReS6Vso8LCJrRGSpiIxy5/URkY9F\n5HsRWSYiVweVnyEiuSKy2H1NbW3/NirMGGM6lzecGxcRD/AoMBnIBxaJyBuqujKozInAIFUdIiLj\ngMeB8UA98AdVXSoiccDXIvJB0LoPqOoD7cVgNRZjjOlc4a6xHA6sUdUNqhoAZgPTmpWZBjwPoKoL\ngUQRSVXVzaq61J1fAawA0oPWk44EYInFGGM6V7gTSzqQEzSdy87JoaUyec3LiEh/YBSwMGj2VW7T\n2T9EJLG1ACyxGGNM5wprU1gouM1g/wF+79ZcAP4O3KaqKiJ3AA8Al7a0fnb2TGbOdN5nZmaSmZkZ\n7pCNMWafkpWVRVZWVsi2J6oaso39aOMi44GZqjrVnb4eUFW9J6jM48BcVZ3jTq8EJqlqgYh4gbeB\nd1X1oVb2kQG8paojW1imw4cr330X8kMzxpguS0RQ1Q51N7Qk3E1hi4DBIpIhIj7gHODNZmXeBC6E\n7YmoRFUL3GXPAMubJxUR6RU0eQbQauqwpjBjjOlcYW0KU9UGEbkK+AAniT2tqitEZLqzWJ9U1XdE\n5CQRWQtUAhcDiMgE4DxgmYgsARS4UVXfA/7qDktuBLKB6a3FYInFGGM6V1ibwvY0EdHevZX8/D0d\niTHG7Dv29qawPc5qLMYY07kssRhjjAmpLp9Y7JYuxhjTubp8Yqmrgy7cjWSMMXudLp9YvF6or9/T\nURhjzP6jyycWn8/6WYwxpjNZYjHGGBNSlliMMcaE1H6RWGxkmDHGdJ4un1iioqzGYowxnanLJxZr\nCjPGmM5licUYY0xIWWIxxhgTUvtFYrHOe2OM6TxdPrFY570xxnSuLp9YrCnMGGM6lyUWY4wxIWWJ\nxRhjTEhZYjHGGBNSHUosIvKqiJwsIvtcIrJRYcYY07k6mij+DpwLrBGRu0VkaBhjCikbFWaMMZ2r\nQ4lFVT9S1fOAMUA28JGILBCRS0QkMpwB7i5VSyzGGNOZOty0JSLdgYuBy4AlwEM4iebDsEQWIlu3\nWmIxxpjO1NE+lteAT4EY4FRVPU1V56jq74C4dtadKiIrRWS1iFzXSpmHRWSNiCwVkVHuvD4i8rGI\nfC8iy0Tk6qDyySLygYisEpH3RSSxtf3n51tiMcaYztTRGstTqnqQqt6lqpsARCQKQFUPa20lt7P/\nUWAKMBz4pYgMa1bmRGCQqg4BpgOPu4vqgT+o6nDgCOC3QeteD3ykqkOBj4EbWoshJ8c6740xpjN1\nNLHc0cK8zzuw3uHAGlXdoKoBYDYwrVmZacDzAKq6EEgUkVRV3ayqS935FcAKID1onVnu+1nA6a0F\nUFgIZWUdiNQYY0xIeNtaKCK9cH7Mo0VkNCDuogScZrH2pAM5QdO5OMmmrTJ57ryCoDj6A6OAL9xZ\nKapaAKCqm0UkpbUA+vWD7OwORGqMMSYk2kwsOE1YFwN9gAeC5pcDN4Yppp2ISBzwH+D3qlrZSjFt\nfQsz+eILmDkTMjMzyczMDHmMxhizL8vKyiIrKytk2xPVNn6TmwqJnKmqr+zyxkXGAzNVdao7fT2g\nqnpPUJnHgbmqOsedXglMUtUCEfECbwPvqupDQeusADLdMr3c9Q9sYf/6xz8qs2bBli27Gr0xxuyf\nRARVlfZLtqzNPhYROd99219E/tD81YHtLwIGi0iGiPiAc4A3m5V5E7jQ3d94oKSpmQt4BlgenFSC\n1rnYfX8R8EZrAQwZAsXFUFragWiNMcbstvY672Pdf+OA+BZebVLVBuAq4APge2C2qq4Qkeki8mu3\nzDvAehFZCzwBXAkgIhOA84BjRWSJiCwWkanupu8BjheRVcBk4O5WDyAWuneH+fPbi9YYY0wodKgp\nbF8lIjr73w3cNsPDSSfBvffu6YiMMWbvt7tNYe2NCnu4reWqenVby/cG/4vNJzGxDwsW7OlIjDFm\n/9DeqLCvOyWKMJrj38CwmBTK8nx7OhRjjNkvtJlYVHVWW8v3BZO1B58dkYPvuUF7OhRjjNkvtNcU\n9qCq/j8ReYsWrhVR1dPCFlmIJG1+m6KDjiK6fCA7ru80xhgTLu01hb3g/ntfuAMJl1ez7yI1/xPy\nK51b6IvlFmOMCas2hxur6tfuv/Nw7g1WDBQBn7vz9nq/GPhbSiMfQEXtLsfGGNMJOnrb/JOBdcDD\nOHcrXuvelXivd8Xw66mJ+gyiG6io2NPRGGNM19deU1iT+4FjVHUtgIgMAv4LvBuuwEIlISaabp/f\nwxZ/Peu2VNK9e9KeDskYY7q0jt42v7wpqbh+wLkR5V7P5wPfD6fhia7nd/+9k0Zt3NMhGWNMl9be\nvcLOEJEzgK9E5B0RuVhELgLewrkP2F7P53Me9OWPE4oro7g169Y9HZIxxnRp7TWFnRr0vgCY5L7f\nCkSHJaIQi4pyHk3sj1XOHPobnlk6jjG9xzBtWPPnjRljjAmF9i6QvKSzAgkXn89JLN1jobLaz3/O\n/g+nvHQKE/pNoEdMjz0dnjHGdDkd6rwXET9wKc5z6/1N81X1V2GKK2SaEktcHGwta2Rcn3EM7jaY\nVYWr6NHPEosxxoRaRzvvXwB64TxRch7OEyX3ic77iAhobIT4ONhW7tw8ID0+ndyy3D0cmTHGdE0d\nTSyDVfVmoNK9f9jJwLjwhRU6Ik6tJTFWKC5zRoT1SehDXnneHo7MGGO6po4mloD7b4mIjAASgZTw\nhBR6TmLxUFKxo8aSV2aJxRhjwqGjieVJEUkGbsZ5LPBynKc47hOioiApxkO5e+V9ekI6ueXWFGaM\nMeHQoc57Vf2H+3YeMDB84YSHzwfJMR4q3cTSJ6GP1ViMMSZMOnqvsO4i8oj73PmvReRBEeke7uBC\nxeeDbn4v1RWCqjpNYdbHYowxYdHRprDZwBbgTOAsoBCYE66gQs3ng3i/B62OoLKhgbT4NPLL8+32\nLsYYEwYdTSy9VfV2VV3vvu4AUsMZWCj5fBAZCb5aL1sDAaIjo4n3xVNYVbinQzPGmC6no4nlAxE5\nR0Q87uvnwPvhDCyUoqKcxBJR42VLwBnglp5g17IYY0w4tHcTynIRKQMuB14E6tzXbODX4Q8vNJpq\nLFIdwVb3aV/WgW+MMeHR3hMk41U1wf3Xo6pe9+VR1YSO7EBEporIShFZLSLXtVLmYRFZIyJLRWR0\n0PynRaRARL5tVn6GiOS6gwkWi8jUtmLw+cDrBaojdtRYrAPfGGPCoqMP+kJETgMmupNZqvp2B9bx\n4DxxcjKQDywSkTdUdWVQmROBQao6RETGAY8B493FzwKPAM+3sPkHVPWBdgNvbMTn8xARAQ1VEWwN\nSizWFGaMMaHX0eHGdwO/x7kwcjnwexG5qwOrHg6sUdUNqhrAaUJrfr/6abiJQ1UXAokikupOzweK\nWwurI7GzdSs+H3g8EKgStgQ3hVmNxRhjQq6jnfcnAcer6jOq+gwwFed+Ye1JB3KCpnPdeW2VyWuh\nTEuucpvO/iEiia2WysnB53PeNgSEguodnffWx2KMMaHX4aYwIAkoct+3/kPeOf4O3KaqKiJ3AA/g\n3Nb/R2beey+rVx/ISy+BL2oim0p7Ak6NxZrCjDEGsrKyyMrKCtn2OppY7gKWiMhcnCaoicD1HVgv\nD+gXNN3Hnde8TN92yuxEVbcGTT6F86jkFs2cMIH1/qs59lj49PNGCsqWAdZ5b4wxTTIzM8nMzNw+\nfeutu/cI93abwkREgPk4HeqvAq8AR6hqR668XwQMFpEMEfEB5+DcxDLYm8CF7r7GAyWqWhAcAs36\nU0SkV9DkGcB3rUbgNoXV1TnPZCl0b52f5E8i0BCgvHafeKyMMcbsM9qtsbjNTe+o6sH8OCm0t26D\niFwFfICTxJ5W1RUiMt3d9JOq+o6InCQia4FKYPvjkEXkRSAT6C4iG4EZqvos8FcRGQU0AtnA9FaD\nyM3F181JLAnxwsayRlQVEdnegT8satiuHJYxxpg2dLQpbLGIjFXVRbu6A1V9DxjabN4TzaavamXd\nc1uZf2GHA8jJwdcLamshIU6IrI2kvKGBBK93ewf+sB6WWIwxJlQ6OipsHPCFiKwTkW9FZFnzixb3\nWhs3EhW147n3CXW+nYYcWwe+McaEVkdrLFPCGkU45ecT5W2gri6CuDiIr/OxNRBgMNaBb4wx4dBm\nYhERP3AFMBhYhtNHUt8ZgYWMz0e3us1sJZ24OIip8+10W5eVhSvb2YAxxphd0V5T2CzgMJykciJw\nf9gjCrX6erpVbNjeFBZdG7nTjSjtEcXGGBNa7TWFHeSOBkNEnga+DH9IodXoj2LAhnl8NfhI4uIg\nqiaSLYEawK6+N8aYcGivxhJoerPPNYG5iodmkLHuf9trLN4a7/YbUVrnvTHGhF57ieUQESlzX+XA\nyKb37nNa9nqrxw6gZ+4S5wLJeIioidg+Kiw1NpWi6iICDYF2tmKMMaaj2nseS4T7PJamZ7J4g953\n6Hkse9qS4T3w1lWRXLiGuDigekeNJcITQUpsCpsqNu3ZII0xpgvp6HUs+6wfYmqoj47nkOw3iIuD\nxirP9hoLWHOYMcaEWpdPLOulBGlsZHTe28TFQX2VZ3uNBawD3xhjQq3LJ5Z11TlEVpfSt+w74uKg\nzk0sqgpAn3irsRhjTCh1+cSydtty8qfGEtlYR2L9NqoqhGiPh9J6Z5DbqF6jWJC7YA9HaYwxXUeX\nTyxebyzfn1NDZW8PyVtWUVEBg6Oj+VtuLo2qnHLAKXyw7gOqA9V7OlRjjOkSunxiGZA8GM8rR1Cb\nUUvjyteoqIC3Dj6YD4uLOf2774j0JTGm9xg+/OHDPR2qMcZ0CV0+sQxMHkjhgFiKig+iZOEsiooU\nKYoia9QoMvx+xi5ezJEHnMNrK1/b06EaY0yX0OUTy4CkAWxMgZH9Ukiv9FJfrxx4IBTkeXhkyBB+\nm5bGe96RvLnqLeob98mbCxhjzF6lyyeWgckDWR9TC7W1JG+Nw++v4rjj8ljkPrLsd336UI+X5D6n\n8smGT/ZssMYY0wV0+cQyIGkAP0SUw+bNeH7IIT6mnsmT72DZMmd5hAj3DBxIadrZvLLi9T0brDHG\ndAFdPrEMTB7I+oZtUFQESUkcGLuN2Nil5OTseA7LlG7dGBSbxIuFJduvbzHGGPPTdPnEkpGUwcay\njTT8/CyIjubAiLWoHo/H8/H2MiLC/w07mIreZ/Bp7qI9GK0xxuz7unxi8Xv99IzpSe7px0JxMUNZ\nSULCEaQyVfDTAAAgAElEQVSnz6M66NKVQ+PjGRJRzY1rvrVaizHG7IYun1jqCusYkDyA9QOSweNh\nYNX3eDxjGD16Ht9/v3MCuWfQEL6oi2Xg5/OZsX49Kysr91DUxhiz7+ryiSXnnhwGJg/kh5L1cNxx\nZJQvIxBIRSSKVatW7VT21IzDeLZ3AyVLr2XZtnUc8803XLBiBYVBd0M2xhjTtrAnFhGZKiIrRWS1\niFzXSpmHRWSNiCwVkdFB858WkQIR+bZZ+WQR+UBEVonI+yKS2Nr+Nz2zib6evqwvXg8XXUSf6jVU\nlDVSUzOJbdvm/aj8BYecz9un3Mfn8y7gTxHf0DMykhGLFvGvggJrIjPGmA4Ia2IREQ/wKDAFGA78\nUkSGNStzIjBIVYcA04HHghY/667b3PXAR6o6FPgYuKG1GHpf2pv4ufH8UPIDnHAC8VpGzZLlJCRk\nEhHx48QCMKHfBD771Wf8feH9TKz/jrcOPpi/btzIlatXW3Ixxph2hLvGcjiwRlU3qGoAmA1Ma1Zm\nGvA8gKouBBJFJNWdng8Ut7DdacAs9/0s4PTWAuh3XT/iP4pnVf4q8HopjknHN+8jBg+eRFpaVquJ\nYmDyQB496VGu++g6RsVG89no0XxWVsZj+fkdP3pjjNkPhTuxpAM5QdO57ry2yuS1UKa5FFUtAFDV\nzUBKawUju0dy7NnHsmLrCkprSinuPZzoVUsY0LcfDQ0e8vPXtrqTKYOm0DehL08veZo4r5fXR4zg\n1uxsPi0paSc8Y4zZf3n3dAAh0mr71MyZM2moaCDpf0k82P9BTsoYQULBejyvvkJewSRWr55HevqQ\nFtcVEe457h5OfelUzh95PoOi45g1bBi/WL6cL8eMoY/fH7YDMsaYzpKVlUVWVlbIthfuxJIH9Aua\n7uPOa16mbztlmisQkVRVLRCRXsCW1grOnDkTgLJTytjg3UBVv6NJXrkG7riDwIW/o6QkC7is1R0d\nmnYoxww4hvsX3M+MzBlM7d6dq9PT+cXy5cwfPRoRaSdUY4zZu2VmZpKZmbl9+tZbb92t7YW7KWwR\nMFhEMkTEB5wDvNmszJvAhQAiMh4oaWrmcon7ar7Oxe77i4A32gtkcs/JfLDhA+oyhpBSkwsxMRy8\nuQ6fb167HfJ3HHMHD3/5MJsrNgNwXb9+bAsE+LS0tL3dGmPMfiesiUVVG4CrgA+A74HZqrpCRKaL\nyK/dMu8A60VkLfAE8Jum9UXkRWABcICIbBSRS9xF9wDHi8gqYDJwd3uxjB4zmsbaRnL7RdCrbDXc\ndBNHvvkUDQ311NSsb3PdAckDuGTUJZz36nkUVRchIlyVns4jee1VrIwxZv8jXXn4rIho0/GVLy7n\ngocuoM/xmdx5ye0kbvyO+ikn8ewZ8Rx30fH07/9nnNHRLatvrOf6j67ntZWv8dovXqN/94Po/8UX\nfHvYYdbXYozpUkQEVf3J7fxd/sr7JrEjYzl02aF8XfQenyacAk89hffWWzj031vYlP8GX355IPn5\nT9DQUN3i+l6Pl/tOuI87jrmDyc9P5q3lczg3JYXHbfixMcbsZL+psQB8ctwnnDjxZI58fgEflhwD\nq1ezvn8mm6++kwP/mMDGjfdSV7eZww77qs3tfrP5G8579Twi4wexIeMa8iccjT8iItyHY4wxncJq\nLLugz9g+DA4MY0n3TXDqqfDwwyycMoMeD95C4ZajOfjgN6iuXktd3dY2t3NIr0P45opv+MPIs6gu\n/Y7D37iOLZWtDkwzxpj9yn6VWBKOSGBsznjKU9+Dm26CRx/lZw8fQ1yyl7sP/TfXXRdBbOx4ysoW\ntLutCE8EFxxyAf8adxYFiUfwu3d/1wlHYIwxe7/9K7GMT+DQBSOp6/ceOnAQTJtG1N//Ru9n/sLj\nKTdTvCXA449P4JlnPuO++2Dx4va3Oa1nKtExaSwo2sLbq98O/0EYY8xebr9KLL4UH4c0DofoIs75\n97k8d0of6h99mPpRI/Fm9OGpCc9xww0TGD36M7KzYcoU+PTTtrcZIcKpPXpw4uE389t3fkt5bXmn\nHIsxxuyt9qvEApB8RDIJLy5gQu/j+dy7mdcPFJ77zZHU3D4DbruNfj1HEhX1DQ89VMOVV8Jbb7W/\nzcykJPI8PTh2wLHc9PFN4T8IY4zZi+13iSXhiATiyvpyXPdLeOLUJ/jZDc8zfmUFZ2+8j8Cho4l4\nYhYxMcOoqPiak06Cd99tf5sTExOZX1rK3cfdy8vLX2Zh7sLwH4gxxuyl9svEcoq/gHPPheJiiJg4\nieHZVUQGGjn/1Doa7r2HxJixlJZ+xtixsGkT5OS0vc2ePh99o6LIafDx0NSHOP+18ymtsdu9GGP2\nT10+sUyePJk33niDhoYGAGIPjuW86vUc3rOCqcc2Uu5JQIYO5aV+f2BbZD2XnxtP/LxtlJZ+RkQE\nnHBCx2otmUlJZJWU8PPhP+eEgSdwyRuX2EPBjDH7pS6fWC699FLuvvtuDjnkELKzs/F4PRz4z2H8\nvvtGUldv5ZgepRSUjaD2gfd58ZAXWT0oiVvnv0lJ8aeoaoebwyYlJTHPvSnlA1MeIK88j/s/vz/M\nR2eMMXuf/ebK+0ceeYS7776bt99+m9GjRwNQX69cdHYDVW++wzU8zG8jXyPeV8j3506jW6A/mdVP\nc90tPRk3DrZuBZ+v9X1tqavjgIUL2XbUUUSIsKFkA+P+MY6Xz36ZiRkTO+NwjTEmJHb3yvv9JrEA\nvPrqq1xxxRW88MILTJkyZfv82oISIgf15fO3tpGzBnLmr+C+tOPxf382pwbuZlFxPHfdBcce2/b+\nDvryS1448EAOjY8H4L2173Hpm5ey6PJFpMWnheUYjTEm1OyWLrvgjDPO4LXXXuO8885j3bp12+dH\npSbhOWAIE6K+4pxf+/jT84fwfvcB1I99jmd+mMvP+hZ1uJ9lXtBji6cOnsqVh13JmS+fSW19bTgO\nyRhj9jr7VWIBmDBhAieffDIfffTRzgsmTYJ587ZPDj7jdq5OrSbqtJtY899Slr3S/oWPTR34wW48\n+kbS4tO46p2rrDPfGLNf2O8SCziP4fzR852bJZbYgZM5YmAkKVEbeHnEfC4bcj/fvvoWpV9vo2JZ\nBXVb6n603YmJiXxaWkpDUALxiIfnpj3HgtwFPPH1E+E6JGOM2Wvst4ll7ty5O9cgJk6EBQsgEABA\nJIIhB/2d62K9aOZNvBndly3bbmDJlgwWf3wSX1x1C9W5ZTttt1dUFL18Pr6tqNhpfnxUPK//4nVu\nmXsLv37r1zy9+GmWFSyjobEh7MdqjDGdbb/qvA/Wv39/3nvvPYYNG7Zj5iGHwJNPwrhx22fpxo2M\nueMQ1i+/l1P6X8bGjYXExX3IxZOfIDljDTVRdzJhwvl06+YF4MrVqxHgwtRU+riJxutx8veabWt4\nb+17fJn/JV/kfkGyP5kPLviAJH9S2D4DY4zZVTYqrA1tJZaLL76Y8ePHc8UVV+yYefXVkJ4O1123\nU9lP/3knJ3z9KNPy1zBqdBwJCVBR1EDgn+9R/7O38SXkkpIyg0svPYxvKyqYkZ1NXm0tebW1bA0E\n6OXz0S8qin5+P79MSeGU7t0BuOb9a1iQs4APL/iQRH9i2D4HY4zZFZZY2tBWYnnuued49913mTNn\nzo6Zr74KjzwCH34IXu9O5SddO5iKTSkMSbmZ2IpjqCjxs3V5Dduy69kUHcHxxz/E8ccP4+KLT99p\nvUBjI/l1dWyoqWF1VRUP5+UR4/Fw58CBHJuUxNXvXs2i/EV8cMEHJEQlhPwzMMaYXWWJpQ1tJZbs\n7GzGjRvH5s2bEXE/v/JyOP102LwZ7rsPpk4Fd1le8UaevPssPqz6jmW9PIzsfQixkbFUfl5JZJ94\nFi7OYNKILxnR+zBuvOAeusV0a3G/jarM2bKFGdnZpPp8XJ2ezv8W3cqXuZ9z3sHncWjaoYzpPcaS\njDFmj7HE0oa2EgvAgAEDeOeddzjwwAN3zFR17pX/pz9B//7w7LOQFnRx44MPUvrgPSx+YiZ1g/pT\n9n0Z6/66jrUnRfCvr7Lpd9iLlEktz078f4wfMYX4+LFERMT8aN/1jY28VljIQ7m5bKytZaynkKKS\nNeSWrCWnZB1/OPh07jziih1JzxhjOokllja0l1guueQSxo4dy29+85sfLwwE4M474V//gv/9D/r1\n27Hstddg+nT4/HMYNIiN92yk+KNiHvu2OwuKYxlxwdl8mraEmwdmMLBvNg0Nt9Kz529JT/eQkrK9\nErTd1+XlvFRQQGlDAzWNjWytKefDrTl0j05mWko6l/TqxZGJ1gdjjOkce31iEZGpwIM4Q5ufVtV7\nWijzMHAiUAlcrKpL21pXRGYAlwNb3E3cqKrvtbDdNhPLrFmz+O9//8vLL7/c+gE8+CA89BB89BEM\nGrRj/kMPwQsvwGefQVQUAI2NcMopUFMcIHDQ5ZTIN1y97hZSf/kX6mlk7v33sq50NINPS+CE0yI4\n5hiIjW15t59t/IxT37iKy497imeKavlw5EhGubeKMcaYcNqrE4uIeIDVwGQgH1gEnKOqK4PKnAhc\npaoni8g44CFVHd/Wum5iKVfVB9rZf5uJZcOGDYwdO5aCgoK2m5yeeALuuAOeecYZipyQ4DSZnXGG\nU5N56KHtRSsr4aWXYPacRj7pcT6JyQUcsfGPDIz0EDP4dY5WwffUaeSRwkc1PSg8Kp3jTo7g5JNh\n6NCddztr6Sxu++Q2bvjZB9yRu4VFhx5Kz7buhGmMMSGwu4nF236R3XI4sEZVNwCIyGxgGrAyqMw0\n4HkAVV0oIokikgoMaGfd3e58yMjIIC4ujuXLlzN8+PDWC06f7iSTW26Bb791hiSPHg0jR8Ljj8Oo\nUXDJJYBTA7nsMrjsMg85+c/xh//cx/xe1zK3oZikvJ/xWvEA/jz7Usb7/szBj06k+JNcFtdlcNy9\nvTlopIcbbnBuAiACF426iBWFK/jDP0cjAy+nf9EyDtr0LD2ik+gR04Me0T04deipHDugnbtjGmNM\nJwp3jeVMYIqq/tqdPh84XFWvDirzFnCXqi5wpz8ErsNJLC2u69ZYLgZKga+A/09Vf/TIxvZqLAC/\n+tWvKCoq4swzz2TMmDEMHToUr7eNfFtfD6tXw+LFsHSpcxuYr7+Gs85yRpQNGQLDh0PMzh32y7cu\n59/fvcLfX/+K6vhvqIvK4aAkP7/sPZWjXr4IWZrKistHcddTUXTvDtdeC6edBhERUFhVSGltOb9a\nl0+SNHBhbDmVtdvYVL6JpxY/xYDkAfzl2L8wNn1s238QY4zpgL29KeynJJaPgGtpO7H0BApVVUXk\nDqC3ql7awv51xowZ26czMzPJzMzcqUx+fj4vvPACS5YsYfHixWzatInDDz+co446iqOOOopJkybh\na6/56ZFH4MYb4bjjIDvbeZ7x9dfDFVeA379T0bIyOPpoOPvccvpPfoinFj/J0q25HJvYkyErjyTz\nssvJ/n44sx7pS+FWD7//PVxwASQnQ2l9PectX86npaUcl5zMGT17coDfx7urXuOxhX9jfOpB/O2E\ne+if1L8Dfx1jjHFkZWXtdP/EW2+9da9OLOOBmao61Z2+HtDgDnwReRyYq6pz3OmVwCScxNLmuu78\nDOAtVR3Zwv7brbE0V1xczOeff878+fOZN28eq1at4swzz+Tcc8/l6KOPxuNp5fZq06dDZCQ8+qjT\nXHbTTU6N5sYb4eSToW/f7UVzc2H8eLjySjjqKIhNW8Pb629j8Yr3yKkvY3Ojn9K6WtL9faEwg6Kl\nR1Pz+Z9IT4mhXz84/qwAsccV8mF1IRtqaqhsbKSyoZ6iumoai77i1G4J/N+4C+gVbZ39xphdt7fX\nWCKAVTgd8JuAL4FfquqKoDInAb91O+/HAw+6nfetrisivVR1s7v+NcBYVT23hf3vcmJpLicnh5de\neolZs2Zx9NFH8/jjj7dcsKjIaQJ7800Y6zZJLVgADzzgNJfFxztVlYEDITmZ7yoH8GDWIazYlMSK\nDbF4I4WJk4ShBV8x6sRn6J65itzqUjZUlvFRXg7flzZw/sBJHB19Pe++OZQ33+zBxInRHHwweDxO\nk5kkBFjVZzX/9S5iW3QqMYFCEmpzSazL57jEOK4+9FcM6T5ktz4PY0zXt1cnFtg+ZPghdgwZvltE\npuPUPp50yzwKTMUZbnyJqi5ubV13/vPAKKARyAamq2pBC/ve7cTSpKSkhCFDhjB//nyGNh++1eSF\nF+Bvf4Mvv9z5ljCqsHIlzJ/vVFeKina8iovR4hJyt0Yxt+xQPow6jferT6BEo2gUoREhPh4yDv+c\n0lF/pij2ay7zTqVfbCnR0RWo9qO6+hhKSk6muDidnBzYuBFWVazkoJ8VccDJ9WyMamBxVR2SM4fD\nyOWaw3/LaUNPs4svjTEt2usTy54UysQCcNddd/HNN98we/bslguoOs8vzsiAc8+FCRNav1ClJTU1\nkJdH4/qNFP5zIxvfiqWxIkBk4DPWxGSwwj+Id0cuYdlhL3HIF59RWdqD4uJaSkqUyko/UVG1REQo\nHo/i9TojyyorvSQkRJE6oIGClBLKupfi9S8gacA2zpo0iQEpqQxOiGJwdDT9/X58rTX1GWP2G5ZY\n2hDqxFJZWcngwYN59913GTVqVMuF8vKcIchZWbBkiXMr/iOOgMMPd66B6dvXabvqAFWl5OMSit4r\npHFLGQ2F5TSsyuXeA15kSb887pt1H36vHzxQ7w3QOGYLST8XosYFqK7dQmHhWgoKsikubmDNmmNY\nvvx4Vmw7gGKU+i2gJX7oWYEkePH0qKMxqY6YaCHFF0maP5Jh3fyMGxxFQgLExTmhDxq0a7nSGLPv\nscTShlAnFoBHHnmE999/n7fffrv9wpWV8MUXsHCh81q0CLZudYZ49ewJgwdDZiYcc4xzTUxHEo4q\njbfO5Oe5fyNq0nE8d9ocBEEblYolFeTcl0PF0grSrkwjbnQcUWlReFLLKeU9tha+TFnZF3Tvfgpp\nadPZWj2Iy595kHUbKzmaG9i0No286gClEbVURgaoigqg5RHEbIuhW52fxpoICgshKcm5dOeoo5xK\n2WGHOV1I1rJmTNdgiaUN4UgstbW1HHDAAbz00ksceeSRu76BQAC2bXMSzIoVMHcufPwx5OfvGJos\n4owxvvPOHw1XblJ19+0cl3Mna1Mj6RORTJonkdToHiSmDyK6MQX5ykNKXgopOSmkrEnBU+jB4/Pg\nSS/Dc/LHNB73BhGxPtL7T2deYTHXzvs/bpl0O1cctuPGl42qvJdfyv+tKmBu41akJoKIjTFUfZVA\nWnkCw5Oiyf3Wx7plEQQCTl9QQgJ07w49eji5MynJuaQnNtZJPmlp0KePc41pr14QHf1T/xLGmHCx\nxNKGcCQWgGeeeYbHHnuMOXPmMHDgwNBstKRk+2ORqayEP/4RVq1yboI58kcjqQFoeOF5Chb+j3yp\nIM9TQUHlFspy1lLmF4r69SQ7rp61keVke8rwerx4PV4iI6KIIBLqgEAdHm8N8f4GEqMDxKqPOPWR\n6o0jPaobfXzd6e7z09MXQ0ykl4DfS3m0l21+Dyu3eMkrjcTfK5LCSB/R3oEcFj2aCTGHECiPpbAQ\nCgudw6qqcg6ptNTJn3l5kJMDBQXObdZSUuCgg2DaNDj1VGfaGLPnWGJpQ7gSS319PXfccQePPPII\nZ555Jn/+85/JyMgI7U5U4fnnnQTz8587t/Dv2dN5JSc7VYGkJOe0P7gJTRXWr3ea3YqLobKShvJS\nqnOzCaxZRf3aVQRqq2n0RdIQGUF1ZAwFcWnkJSSQ20vJjy8jP6qMAl8Fhb5qiiMDFHlr8ajQKxBH\nel0yfWqTSfR4UX8DNdKIJyZAZGIFEXGlRPnL8HsjifTF4o9KoNE/kLK4qXgSj2FgbDKn9+xNzyg/\nHvGg6lwwWlDg3Lzg9dfh/fdh2DDncNPSoHdvJ9GkpDiH3q2bU/OJj3eSkjW/GRN6lljaEK7E0qSo\nqIj777+fxx57jJiYGKKjo4mOjiYxMZHU1FRSUlLo27cvU6ZMYfTo0T9teO/69fDvf8OWLTteJSXO\n6X9RETQ0OH00xx4LRx7pXCfT1l2QVaG62rk1TX091NVBRYXzC19a6oxMc5dVVZRQW1FCXXkJRRVb\nWVuxkbW1m1hfv42K8ioiyxqojIylxOenOsJDjUeo90QRQRQVMeVsTdpGTXQleGsJaAO1jUIjSqMq\nghDtjSTRF01CVAzJ/kRSYlPpHt2H+tIMApUJ1JTHUlUaS2VpFOWlPipKfFRt6U1t9mgqyrz4fM7N\nDk47zbmrtNV0jAkNSyxtCHdiaVJVVcW2bduorq6mqqqK0tJStmzZQkFBAWvXruXtt98mEAhw2mmn\nccQRR3DggQcydOhQYmJ+/ACwXZab6/TT/O9/Ti0lO9vplxkwwGlfGjECDj7YGQLdrZvzCtUdkuvr\nnbsMLF4M2dk0rv+BwLo1eNZvpKY4gYKYKWxlCKX1iXi7NSJjv6E2rpFSPxQlKJXxFdRGlFIvFdT5\nS6iOLaQmsogaqaA84KO8zk9lwEdVTTzVNUlU18VR5imkJjKPA/wTGR47kepN/dm4PI2Vi9KIqOpN\nTJSPmBinXychge39PomJO15DhjjXsA4aZDUeY1piiaUNnZVY2qOqLF++nLfeeoslS5awfPly1q5d\ny7hx43j00UcZMWJEKHfmDAz44Qf4/nv47jtYtszp3Ni2zanldOvmjEabPNm5lXJystOc5vE4vezu\n82V2S3U1bNgAK1fSuHQpmz97n4ivswnUJ7GiTwY/JKVQFBlNcXR3KiN74g0kkbotnt5bY+lZFEF0\nZAW1vYqp7b2NhkEb8A5eTXT/VXiiqli3cAqfr+3Lmqj1lPi3UObfRrl/KxX+bfiquuErTcNbmQr1\nsUhDHNTH429IJaohjWhNo6o0hcL8RALlyaQmJRDj9+L3OwMJkpJ2JKLevZ183L+/0zzXq9fufyzG\n7AsssbRhb0ksLQkEAjz99NPcfPPNXH755dx8881Ed8YQKVXn0vymWs78+U7PekOD86qqchJLjx7O\ncOhrroETTwzNqb0qrFvn7Dcry7lZZ2kplJWhBQVopJfaPr2pTEmiBqW2LoKaOi+lDRFsiPTzQ0wy\n9f2EvuNW02fg92wtGklNbU8aAtE0BKIpr4ghoioKqYX6cj81JfFU1VVR3lDBNtlGoW8r26IKKY8q\npTKqgnJ/FdVRlYgK3kA0kXXRxJb2Ia54INGlA/HWptGoiVRVJ7OpJJme0T6OPCiCzAlRDBocS6/e\nMaT1iyMhORZPhF1YaroOSyxt2JsTS5NNmzZxzTXXkJWVRVpaGpGRkURGRhIREbG9TGJiIqNHj2bM\nmDGMHDmS2NhYvF4vERERxMXF7VR2t6lCebkzpGvhQrj7biepXHstHHig04wWFeW0NSUmOv+GKukU\nFTl9Snl5TpJTdV5VVU7yKS2lKucHalcsg/x11AwupibZQ22MUBMn/NBL2DS0DyQnkkIeivCdZxwb\nvGOp8fSiwZsMEd3B2x1fnRBd1khsFSShxGo9voZqioty2VyUTX5ZNpvKtrG1rJJAZAW+mErqGuuo\n0jpqPLU0RNTS6K2ByGrw1OOpjSeiLg5fXRze+ih8DX589X58DdH4GqPxNfrdl4+oBj8+vPgiPPi8\nQlSkkJ4QR0b3JHp060F8fDwRvgi8Pi+R/kh8ST4iEyPxdfORHJ9Mz9iezoWxxoSJJZY27AuJpcna\ntWspKyujrq6OQCBAY2Pj9mWFhYXbb+u/bNkyampqaGhooL6+nurqarp160Zqairp6ekMHjyYIUOG\ncMABBzBs2DD69evX+h2ZO0IV3nnHeTRAQQHU1jqvykqnwz8QcC5cGTLEaS864ADnMn1wEk5kpDMd\n/Goa1hUT4/QH+Xw/LTnV1+8YeFBURP3771L5/NOwaRMLB0dRkRFFYAREDqhF4+rx+AP4fQEC6mF9\nTTfy6tMo9PSlNqoHdd7uVHm7k09fKuX/b+9MYyw7rvv+O3W3t/cyPd2zaZRRLFOLLToEnMh0DCdR\nEEtOLBmIIVIIDMdOAAdQEiUOHEvKB33Ih1hwFtjIIgh2BNtQvMiEbAZ2HEERZNlh5IWULVkiOSQE\nczhb93Cmp/v1e+9uVScf6t3Xr4czwxGnh6S66zc46HvrVd2l5r37v6fqVFWXnAhLhIqhrAzbGtGf\nWI5tVBy5aGFLKEcRxTBmZz2h2EzIR46iqLBqibKSKC0wcY1kI0hGkORTK9CoABSHw+IYs4Nmm2Td\nq0TpCDEWMRakJsZhsERaU2bbDDtbxC5mebTMseExju0cY3W0ykK5wKAaMKgHtKM2WZqRZRnthTb9\n030GZwYMXj/ApAYxghhhsbNIt9PFpIaoGxEfiTFx8L4OO0FYbsM3k7C8XOq65sqVK6yvr3P+/Hme\neeYZnn32Wc6ePcuTTz7J5uYm9913H2tra6RpSpqmHD9+nIceeogHH3zw7ieiLEvfp3P2rB93c/as\n719p6r0RoZ2dXRsOvY3Hu1Foi4t+zpgb7dQpL1xNT3y36yf4jKJbi9HTT/tABue855Pn3hN65hk4\ne5bcXGPr9IjhmYLxiRqXOlwbqj4Uq9C6KHSfFZIXDK422DqidBHPrpzm8dNv4fHTb+FydpSr5gjX\n4wWKOKNdTuiPhnTyHZIqx9QFphqzMFzndfmEN2iLM+YkK3qabnmCUdnmCkc5Xx9jY9zDOmEy8RHi\nFy/6FkPwVbCy4vV5NIKzZxVF+da3DFl7w0Wi5b/A9Z6jaD1Paa6Ry3UmbFLphNqVWC2obIG1BZXN\nqV0BgE7/jZIRxhkW80WO7Bxh7eoaxyfHOaEn6Nd9OrZDp+6QaUZEREJCmqR0l7v0Vnr0j/ZpDVok\ng4RWv0XSSfxA3NQQL8R03twh6uyjRx14RQjCchsOg7C8FFtbWzz11FNcvXqVsiwpy5KzZ8/yyU9+\nkhDMEVUAABMbSURBVLIsefjhhzl16hRZltFqtV70t9vtziyOY4wxGGNm4dX7Ql37J+rzz+/a+fO7\n25ub3isZDvf2B8Wx93gaa4b9HzniY4+PH/eDYZoBMSdO+PSbNR1WFeQ51k4YTb7KcPQlyvwCttik\nLrew9RCnE6xOsDqm1i1KruMoMJqgMLUIXAtnM+o6Y6NY5cnqNF9Mv41n2mcYxW0qk+DqCOdiIucw\n6ujkBb3JmN5kTLsqyayjZR1xBc6Cq7ylZY25lpKvLzG6vsywXGZULVKUPZIyIqsi0toQEUFkwERU\nGjEpY7ZHEZvDiCRWFgeOpQWl13O0B9vEgw1c/yJF53nG6XNMkudJekPizhDSbZwpsVpTa0XtSsq6\nYFJPyF2OxVJTY7EYNV6MbIde0aN/vc8yy6x0V2gnbRJJSExC3/RZSVdYzVZZ7ayytrTG2pE1uke6\nJGsJ6bGUuHevV04P3IogLLchCMutUVWeeOIJHnnkEa5du0ZRFOR5TlEUs+0mfHo0GjEajbDW4pzD\nWst4PAZgeXmZhYUF4jie2dGjRzl58iSnTp3i1KlTnD59mtOnT3PixAm63e7dNc01OOcFqSi8RzKZ\n+ECAq1e9bWz44IBLl7wLcOmS77u5ds03wTV9RUniBSqOfdqJEz4U7PRp7yo089EsLPi0171uT9Sc\ncyXO+WYtUJyrsHYHa4fU9Saj0VfZ2fkSw+ETlOVFn19LnJ1A1Eejo1hdxGqPynUoXIetwrEx2WJj\nvM0LkxHXioTNPGZYtynpYE0bG3VpuzZLRcRSYchcypWlNS4fWePiyipFmrI4HLK0PaQ3GhPXlth6\n0zLGjjNvRYorE2yVYHKhN8zp7+S0titG15e4MjzJ5fFJJnUXdREVMZaY+5Y3eOCNQ+6/X1hYMqSp\nkLWEuK/IkkWWFTsoyOUFtjYucu3iJfJyQmlLSluyXW9zpbrCC/YFXnAvcI1rbJpNWrbFymiFI5tH\nWB2t0pe+b6ZLfJ9T1IqI2zFxO6Yz6NBd6tI70mNpeYmT/ZMcGxzj+OJxup0urbhFbII4vRyCsNyG\nICz3DlVlMpmwubnJ1tYWdV1jraWqKjY2Nrhw4QLnz5/n/PnznDt3jnPnznHhwgXyPKfVatHpdPZ4\nR4PBgLW1NY4dO8bq6ir9fp9er0ev15t5SiJCu91mdXWVtbU1VlZW6HQ631jwQlV5r6csd60ZLFoU\nXnyee87b5qZvrhuP/fa5c/7zZgK0NPXC1JxfxAvUwoLP0wQ3tNveksTnMQYV8Z6P26ZmSLUUU67F\nFKuGcglc6lAqrB1Rlpcpy0sUxSWcm6BqAQvI1AxIhJoFKhmQa5dSIyoVKoUKwaqhrCGvhVITchcz\nsTGlE8paKa1SOaiimDqKqU3CUPoMdYGhLDNMjjJsH8PlFfHVHVpPZWRPD5DnlpFJilQxUsW4Kqau\nEuo6pSwzyrpNXnWoXEIWlWRJSSsuSYxFxFeHGIgig0kE6Q1hsIHrX6LuXESTESYCEUWoicURYzHU\niOaoHePcmDq+zqS7wbi/zrhzFRsXVLFv9os0QjAIQsu2Wa5WOFIfZdkeYcEs0o8X6CcD0ij11wMk\nacRgoc/icp/FIwO6rTYt06IVt+gt91hYXaCbdemlPdrJwZvwLgjLbQjC8trDOUee54xGoz1e0tbW\nFuvr61y+fJmNjQ12dnZmVtc1qopzjvF4zMbGBhsbG1y5coXJZEIURbRardnMB51Oh36/P5v94OjR\no6RpijGGKIrIsoxOpzOzpky73SZNU+I4JkmSWVNgky9JEsQ5H8QwmXiRqirvPTXfs6ryntPW1u5E\naZPJbv4m0q2JerPWi9r6+m7T3/p0zbpGoFZXYW3N/516Swq+qStJIE3Q2GDdCOt2qO0OKhaNpk/u\nSFADagSNFBdbrKmwUYVrC7YX4/oJthvjegmuY1Cj1PUmVXWVqrqKtUOsHWPdGHUTVFIcGZUm1E5x\n6nA4So2YuIxcU0qX+Ae6GKxLmRQ9xnmfUdGntC0qTamIKV1KqRmVZpQ2pSCj0Ba5a1O5BHWG2iY4\nF2OLBFck2EmCK2PvcRUJLo+xRYyWMa5I0CKGIkILQUtBS4MrDVYKXOsK2t7Ata9AawvSLTTbAlMD\nAgoqNZpM0HiEJhNIJkg8gXiCJBOfFk/QbBvUEOVLROUiogYQBAEijIsR9Wam/yKJiFxG5NrELiPW\nFolkJGS0TMZCK2OpnbHUyUgi/0JlJCJLYha6LRZ7bQa9lDSNSBJDEgtxYohN5M9gDHHigzCSOKGX\n9ehlPQbZgG7W9S9iAqZlMMnNWw+CsNyGICwHH1WlqiryPCfPc8bjMePxmO3tbdbX12cC1ETaWWsp\nimLWzDcej/dsV1U1s+Z4o9GI8XhMXdckSUKWZaRpOvO2Gs+r1WrNBKyxJl+r1SKOY0QEEdkjhq1W\naxZi3lhc17TLkk6ec8Q5FsuShaIgFSGJIi9+In4fSIA0jjEiXrCc2w1emLe63hXEJrBiOpaIrS3f\nj7Wz4z2sLPNeWZp6T2w6iFab13rxQQC0p6NLO200idAI1DhcYqDXRnsdtNdGYy9YGoEm4FLBZYJL\nFWfs1CosJZUW5HVOYSuqymHzAlsVuLrG2Rq1NWCRqSEOIosaC8ZC5CACIodLwbWADDRVTOyQxCGx\n82vTTq9JIlAVnBqsRtQuoXIJpUuoXUJRtSjrjFoj1MZIbXB5zKhssZXHDAuDUwMKomCdwSrUDmob\nUZUZZZVR1jFWlQpLpZZaHaWD2imlFca1YVIbcqfTBlb/DHM4alNiTYUzlQ+/kOmn4kAcIj6aUMSB\nWMRUkI5w2RBNd9A4x1Qd4qJPVKdeCEXoFCtc+dgfzn5XdyssoQEy8E2NiMyi3QaDwT09l6pSliVF\nUVCW5Yv6pJpt5xyqOmsabPqrGs+r+axJn0wms3zWWqy1s3x1Xc88t+FwOBPQ+X6wyWQyE78sy+j3\n+yRJMrvuOI7p9/uz5sX5IIy40yEeDGbCJiJEIrSdo5skdOKYXpKQRhGRMSTGkEQRramYtrOMpXab\npSxjKU3JRDCqGOeI6po4zzE7O8ho5MWsEbey3O0by/O9ojcvgrPxTD4oYs9n856itbuC2cwS3niH\nTX9cXYNTpmoC0zrGOcRaX8QIagwaGTQSXAQuUupMsCnYxOEMgPWHRnF6DacWpd79ruCFwBmHE4dK\njYhDjAPjUAN1ItSJeM/SKBo5xCgumgp3BESKRAqxQqQz0VJRJPZG4rAtqNqGqhVRpgl5kpDHGXmU\n4cRQuWUKe5xx2WGzbHM9yRhVLcoqpaxSSPdXCoLHEggcEJqmwuFwiJ0+KMHP8jAcDvc0LTbeWyN+\ndV2/SPgaz64RvmbsVFVVewI7Njc3uXr1KlevXmU8Hs+O2+S11pIkyZ6gjVarxeLiIktLSwwGg9nL\nwY2eW9O3Nm9NWpIks3Lz/XVZls0GEM8fZ/5YAMYY+v0+g8GAwWBAEsdg7UwYM2NIRUiBtKpIy5Ik\nz4mc8w5bI1rzAga7+41XmOdeSOfzFcVuE2ld76bf7HmlijrFOcVVFldZtPIvH7Pm1Dz34j3yIi7j\nHWS0g4xHXjStF2lxFpzD2BoVQaMYjSKKpRU6l56fnTJ4LIFAAPAPyibg4bWEc46qqph/yZtMJly/\nfp3Nzc09A4PLspwFgjjnZt7fjeaco67rWQh948Xt7Ozs8fzmj9NYQ+MNbm9vs729PRNWYCa4jYc6\nv914lE2fXSNwSZLsiY6ct8YbbKxJn59loxHNWXPotOm0ofn8RpGUSJBeDL0FRBZnTa3Ntc33Ic4H\nuhhV34wqwqDb5Yf38f88CEsgELinGGPIbpjYtNVqsbS0xJkzZ16lq7o75sWt6ZMry3Lmqc1b47U1\notV4i025efGc9yTrur7pORuxnBfa5vP54zfnaLzO4XC4R1jn8+33y0hoCgsEAoHAHu62KeyeTwok\nIu8UkadE5KyI/NQt8vyciDwjIn8qIt/xUmVFZElEPiMiT4vI/xaRhXt9H4FAIBC4M+6psIiIAf4z\n8H3AW4H3icibbsjzLuAvq+obgR8HPnYHZT8IfFZV7wM+B3zoXt7HQeDzn//8q30JrxlCXewS6mKX\nUBf7x732WP4q8IyqPqeqFfCrwHtuyPMe4JcAVPUPgQURWXuJsu8BfnG6/YvAD97b2/jmJ/xodgl1\nsUuoi11CXewf91pYTgLPz+2fn6bdSZ7blV1T1XUAVb0MhNXOA4FA4DXCa3HhhZfTYRR66AOBQOC1\nws1ixPfLgLcDvzu3/0Hgp27I8zHgobn9p4C125UFnsR7LQDHgCdvcX4NFixYsGDfuN3Ns/9ej2P5\nY+BbROT1wCXgYeB9N+R5FHg/8Gsi8nbguqqui8gLtyn7KPAPgY8CPwL81s1OfjfhcoFAIBB4edxT\nYVFVKyL/FPgMvtntF1T1SRH5cf+xflxVf0dEvl9EngVGwI/eruz00B8Ffl1Efgx4DnjvvbyPQCAQ\nCNw5B3qAZCAQCAReeV6Lnfd3zZ0MyjyoiMgpEfmciHxVRL4iIv98mn5oB5WKiBGRJ0Tk0en+oawL\nEVkQkU+JyJPT78dfO8R18S9F5M9F5Msi8kkRSQ9LXYjIL4jIuoh8eS7tlvcuIh+aDmB/UkT+zp2c\n48AJy50Myjzg1MBPqOpbge8C3j+9/8M8qPQDwNfm9g9rXfws8Duq+mbgfnygzKGrCxE5Afwz4AFV\nfRu+S+B9HJ66+AT++TjPTe9dRN6C72p4M/Au4L/K/MyYt+DACQt3NijzwKKql1X1T6fbO/gIulMc\n0kGlInIK+H7g5+eSD11diMgA+B5V/QSAqtaqusUhrIspEdAVkRhoAxc4JHWhqn8AbN6QfKt7fzfw\nq9Pvy18Az+CfsbflIArLnQzKPBSIyF8CvgP4Iod3UOl/An4SH0LZcBjr4gzwgoh8Ytos+HER6XAI\n60JVLwL/ATiHF5QtVf0sh7Au5li9xb3f+Dy9wB08Tw+isAQAEekBvwF8YOq53BilceCjNkTk7wLr\nUw/udu77ga8LfHPPA8B/UdUH8BGYH+Rwfi8W8W/orwdO4D2Xf8AhrIvbcFf3fhCF5QJwem7/1DTt\n0DB1738D+GVVbcb4rE/nYENEjgEbr9b1vYJ8N/BuEfk68CvA3xKRXwYuH8K6OA88r6p/Mt1/BC80\nh/F78beBr6vqNVW1wKeBBzmcddFwq3u/ALxuLt8dPU8PorDMBmWKSIofWPnoq3xNrzT/Hfiaqv7s\nXFozqBRuM6j0IKGqH1bV06r6Bvz34HOq+sPA/+Tw1cU68LyIfOs06R3AVzmE3wt8E9jbRaQ17Yh+\nBz644zDVhbDXi7/VvT8KPDyNmjsDfAvwRy958IM4jkVE3omPgGkGVv70q3xJrxgi8t3AF4CvsDs9\nw4fxX4Zfx799PAe8V1Wvv1rX+UojIt8L/CtVfbeILHMI60JE7scHMSTA1/GDkSMOZ118BP+yUQFf\nAv4x0OcQ1IWI/A/gbwBHgHXgI8BvAp/iJvcuIh8C/hG+rj6gqp95yXMcRGEJBAKBwKvHQWwKCwQC\ngcCrSBCWQCAQCOwrQVgCgUAgsK8EYQkEAoHAvhKEJRAIBAL7ShCWQCAQCOwrQVgCgTlE5D82Sw1M\n939XRD4+t//vReRf3MXxPyIiP3G313mLYw/vxXEDgW+UICyBwF7+L356D6ajslfwyy80PAg8dicH\nEpHo5VzAyy3H4Z7bKvAaIghLILCXx5gKC15Q/hwYThfJSoE3AU8AiMjPTBdT+zMRee807XtF5Asi\n8lv4KVMQkX8zXUDpC8B9NzvpdNbh/yYiXwQ+KiLfKSKPicjjIvIHIvLGab4fEZFHROR/TY/50Zsc\na2Va9l37WjOBwB1yT9e8DwS+2VDVSyJSTddxabyTk/hF07aBr6hqLSJ/H3ibqn67iKwCfywivzc9\nzF8B3qqq50TkAfxCSW8DUrwo/Qk356Sqvh1ms1P/dVV1IvIO4N8BPzTNdz9+OYQKeFpEfk5VL0zL\nreLnd/qwqn5u3yomEPgGCMISCLyYx/AzIz+IX7fj1HR/C99UxnT/VwBUdUNEPg98JzAE/khVz03z\nfQ/waVUtgEKmyyPfgk/NbS8CvzT1VJS9v9X/M10KARH5Gn769wt44fos8H5V/f2Xcd+BwL4QmsIC\ngRfTNId9G74p7It4j+W7uHX/yvxMsaOXed75cv8WPxvztwM/ALTmPivmti27olMDjwPvfJnnDwT2\nhSAsgcCLeQz4e8A19WziPYh5Yfl94CERMSJyFO+Z3Gw68S8APygimYj08SJxJwzYXffiR++wjAI/\nBrxJRP71HZYJBPadICyBwIv5Cn5K8f93Q9p1Vb0GoKqfBr4M/Bm++eknVfVFC0Op6peAX5vm/W1u\nvZbFjRFdPwP8tIg8zu1/p/PlVP105e8D/qaI/JPblAsE7hlh2vxAIBAI7CvBYwkEAoHAvhKEJRAI\nBAL7ShCWQCAQCOwrQVgCgUAgsK8EYQkEAoHAvhKEJRAIBAL7ShCWQCAQCOwrQVgCgUAgsK/8f8OX\ni0+RexIkAAAAAElFTkSuQmCC\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "for i in range(10):\n", + " plt.plot(range(100), topic_model.get_topics(topic_ids=[i], num_words=100)['score'])\n", + "plt.xlabel('Word rank')\n", + "plt.ylabel('Probability')\n", + "plt.title('Probabilities of Top 100 Words in each Topic')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the above plot, each line corresponds to one of our ten topics. Notice how for each topic, the weights drop off sharply as we move down the ranked list of most important words. This shows that the top 10-20 words in each topic are assigned a much greater weight than the remaining words - and remember from the summary of our topic model that our vocabulary has 547462 words in total!\n", + "\n", + "\n", + "Next we plot the total weight assigned by each topic to its top 10 words: " + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEZCAYAAACTsIJzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XucHHWd7vHPk8RwJ4JIZklIgsCC4AoqQhTBXtElgBJ1\nVzcohwV3MbvKkpWjgriaOUePihcWWN2DWSMKclNAzdGo6EKDHDUGSOSWmCASEiCRqwRQCeG7f9Rv\nkqqmZ6Znpqu7Jnner9e8puv+dE1Nf7t+v6puRQRmZmZ9xnQ7gJmZVYsLg5mZFbgwmJlZgQuDmZkV\nuDCYmVmBC4OZmRW4MFSQpG0kPSdpjw5sa7akHw9z2aMlrRxg+kWSPthsXkl3SzpsONsdYkZJulTS\nY5LqZW9vSyVpiaS3D3GZl0l6oKxMw5GOg1d2OcMhku7pZobBuDC0SNJ6SU+kn42Sns6NO2GQZQd8\nAe1HvzeYSPqFpD+kba+T9E1Juw1x/S1tayTLRsQpEfH5ZvNGxD4RsQhA0qclzRtBhoEcBRwGTIyI\nWn6CpN7c3/APkjakx+slLW5nCEkTJF0j6b5U9F/eMH2spC+mF651kj42wLp+IenU3PABaZ35cQdK\nekbS9u18HkMREXdEROlvbsog6TO5Y+OPuWPjCUk3jWTdEXFzRLykXVnL4MLQoojYKSJ2joidgVXA\ncblxlw+yuBj6i68GigO8J2V5KdADfLbpSqSt/W88DbgnIp5pnBARvX1/Q+BfgOvT33OniHh1m3ME\n8F/AO4Cnmkz/IDAd2IeskL1H0jv7WdeNwJG54SOBZQ3jjgBujYinhxJS0tihzL+lioizcsfGmcAP\n+v7/I+J13c5Xtq39RWO4RMMLt6RtJX1J0gPpXeFn07vAXYFrgJfk3oHsIum16Z3fY5LWSDp3iC/i\nAoiIR4HvAC9LOS6XdL6kH0laD0xP27tM0u8k/UbShxrWNVbShZJ+L+kOSUfkntd7JS1LuVdIOqUx\nR3rn/Uha99/kJlwu6eym4aUH0z6YCZwB/F3axi8kndj4rkzS2ZKaFmBJe0r6fsqwXNJJafw/Af8O\n1NK6zxp0rz5/3W9MzSiPSbpJ0sG5aUskfTz9fjTt4x2arScinoiIL/WdJTVxEvDpiHgkIu4FLgBO\n7mfeG8le+PscQfbG4HUN427MZf2ApHvSMXClpBel8RPS2cZ7Jf0GuDmNf5uy5r5HJH26YZ+8TNLP\n0vGyrr+zPUkHpWOwb3hJ+jsuTst+R9KO/TxHJL1T0u1p314nad/ctE9I+m36uy6V9KaGZedI+nWa\nviS/LHC4pLvSei8a7psnScdI+lVazw2SXpabtkzSRyXdlqZfLGm7NO0wSQ/l5v2z9DdZJ+khSV8e\nTp62igj/DPEH+C3whoZxnwVuAHYBXgz8EvhImnY0sKJh/kOAV6XHewErgPem4W2A54A9+tn+z4F3\npce7Az8FLkzDlwMPA4ek4fHAN4Erge2AvYF7gBPS9NnAhvR7LPA/gEeAHdP0NwNT0uM3AE8DL809\nrw3AJ4FxZM02TwFTc1nObrYPgAeB16bHnwbm5aZtDzzet5407i5gRj/74xfA51OGV6X8r8k9v2tb\n+Js+bz5gD2A98Ja0b2YDDwDbpelLgJXp77cj8CPggha2tR54eW54LPAssG9u3BuAVf0s/8I0f99+\nvjcdd7fnxt0HvDk9fjuwGtgvHVtfAxakaRPSsXZ1eg7bAHumv+Ob0j6dCzwDvD0tsxB4X3q8LTC9\nn5wHAU/khpcAt6X17wAsBj7cz7KvT8/hL8jeBJ2Wnp/S9L8FXpSm/T3wGLBzmnYq2f/TAWl4P7Km\nRNJ816X9tTvZ2f87B/l7zenbX7lx04AnyY7rscDpaV3j0/Rl6WcKsHPa5mfTtMOA36XHInut+Hey\n4/4FwOFlv4YNeox2O8Bo/KF5YVgDvD43fDxwV3r8vMLQZJ1nApemx60UhvXAo+mf56vAC9O0y0lF\nIg2PJ/ciksadDixMj2cDv2lY/1Lgr/vZ9g+AU3PP6+m+f4Y07rvA/8xlGXJhSOPmAx9Njw8B1gJj\nmuTZJ2XYJjfuXOA/cs9vuIXhNOCHDePuAo5Pj5eQe2EDXgM81MK2GgvDjunvvXtu3CHAowOsYwlZ\nEZ8G3J7G/d/cuA3AhDT+KuCs3LK7p+3tzObCcFBu+j/3HR9peBzZC2pfYfgu8DnSi+0AGZsVhvfl\nhj8CXNbPspcBH2gY9yDwFwP8Tx6RHv8COLGf+R4DjskNfxn41CDPo1lh+CDwnYZxvwH+Kj1eBvxz\nbtpfAqvT43xheAXZ//G4wY6bTv64Kal9eshepPusAib1N7Okl0paKGmtpN8DHwOG0oH83ojYNSKm\nRMR7IuLx3LTVDbnUMK4x25qGdd9H9m4ZScdLWpSaFB4jO8DzOR+KYvv9qr5lR+hi4N3p8buByyPi\nuSbz7ZEy/KkhQ7/7fgj2SOvKa1x3437dVdILhridP7D5hbrPBLIC0p++foYjyc4YAW4ie6d9JHBH\nRPw+jS88j4j4HfAn+j8G9iD3vCLiWbIX5T7vJzsr/pWkW9V/X0gza3OPnyYris1MBf53aoZ5NB17\nO/VllvSPuWamx4DJbD4u9yQ7K+7PuhYzDKTZsXEf/e/TVWT/i40mA2vSPq4MF4b2eZDsYO4zFbg/\nPW7W8fyfwC3AXhExAfgEA3c4Nxqsc7rPWrIXnSm5cVNy2SA7OGmcruyKlm8C/wvYLSJ2Aa5v2PZu\nksY3LDvUSxSft38i4gZgW2WXtJ4AXNLPsg8AL5a0TWP+IWbob93TGsZNofgPv2fu8VSyd/kbhrKR\niNgILCd7h93nIODOARa7kawIvI7NheGnZEWh0L9A9jw2HZuSdic7k8wfn/m/wYPknpeyDuk/y+Vd\nExEnR0QP8GHgG5JePOgTHZrVZGdju6afXSJix4j4YWrL/yxwUt80sr+Jcsvu3eY8jYZzbKzl+VYD\nkyWNa2u6EXJhaJ8rgLmSdk3/eGez+cVsHbB7Q8fkjsDvI+IPkg4kaxdtu/Ru/tvApyRtL2lvslPj\n/AvtlNT5OFbSiWSF4sdkfRLjgIcgO3sAag2bGA98TNILJL0BeCNZ08VQrCNrp2/0DWAe8HBE3NrP\n87ubrO35k5LGK7tG/ST6LyRD8W3gNZKOS/vmVLK26f/KzfMPkl4iaWfg42THQVMp37ZpcJuGYnYJ\ncJak3STtRdaMddEA2W4ka0Z7C6kwRMR9ZM2Qb6ZYGC4H/knSfqkD9DPA9yLiib5oTZ73kanjfRzw\nUbI+gb7nMUvSxDT4BFlR2djf0x7gOQzky8AZfZ39knaS9NaUZyey5tFHJI2TNIfim5uvAP+a/q+Q\ntH8ub7tcDbxB0l+lY+M0sv+V/H6fLWmKpAnAv9Lk2IiIpWT9VF+QtEM6Rg5vc9Yhc2EYnmZnAB8n\na3++E7iV7J/1cwAR8StgAbAqnfq+kOxKnFMlPUHW8dR40Ax0eetQp80m+wddBfyErD0/f4XPDWxu\n6/wI8LaIWB8Rj5C1pX6PrEP7eOD7Dev+Ldk/6Vqyf8iTI6LvFLvVnFcAO6R9k78a6WKyzseLB1gP\nZJeAHpgyXA58MCJ+Psgyg4qI+8k6bv8P2fM/GTg2ipeAXkL2IrGKrMP8IwOsch1Zp+72ZO3gT6eC\nAlnn+S/IXiQWAV+NiG8NkO0hsrOMP0ZE/l3qTWTNPDfm5r0aOA/4IVlzx45kHbabZmlY92qyvoov\np8zbAnfkZjkCWJqO3a+TvXN/tL+o/W1nIBFxPdnZyFdTU9FdwN9kk+LnZMfEr8jece9K9uagb9mv\nkPW3fCc1015GVkyGlGGQfL8F3kn2d3s4PX5zQ5PmpWT/O/eSnWHM7Wd1byVrgrqX7IzjxHZkHIm+\nHn6zykmXMq4F9m948asESUuAT0TENd3OYtUiaRnZRRgLu51lOHzGYFV2OlCvYlEw25JVqsPDrI+k\nB8maXY7vdpYB+HTb+jOqjw03JZmZWYGbkszMrGCLaEqS5NMeM7MhioimlxNvMWcMI7n9e+7cuV2/\nBb0qOaqQoSo5qpChKjmqkKEqOaqQoR05BrLFFAYzM2sPFwYzMytwYQBqtVq3IwDVyFGFDFCNHFXI\nANXIUYUMUI0cVcgA5ebYIi5XlRRbwvMwM+sUScSW3vlsZmbt4cJgZmYFLgxmZlbgwmBmZgWlFwZJ\nMyQtl7RC0plNpu8n6WeS/ijpjCbTx6SvD1xQdlYzMyu5MEgaA3yR7IvgDwROkLR/w2yPkH35+Of6\nWc0csi/pMDOzDij7jOFQYGVErIrse3CvAGbmZ4iIhyPiFrJvASuQNBk4luybwczMrAPKLgyTyL56\nr8+aNK5V/wZ8iFH+2eZmZqNJZT9dVdJxwLqIWCqpxiBfKt7b27vpca1Wq8zdiWZmVVCv16nX6y3N\nW+qdz5KmA70RMSMNn0X2Zd7nNJl3LrA+Is5Nw58i+1LsZ4HtyL7M+5qIOKnJsr7z2cxsCLp55/Ni\nYB9JUyWNB2YBA11dtClkRJwdEVMi4iVpueuaFQUzM2uvUpuSImKjpNOAa8mK0PyIWCZpdjY55kma\nCNxMdkbwnKQ5wAER8WSZ2czMrDl/iJ6Z2VbIH6JnZmYtc2EwM7MCFwYzMytwYTAzswIXBjMzK3Bh\nMDOzAhcGMzMrcGEwM7MCFwYzMytwYTAzswIXBjMzK3BhMDOzAhcGMzMrcGEwM7MCFwYzMytwYTAz\nswIXBjMzK3BhMDOzAhcGMzMrcGEwM7MCF4YO6emZhqQR/fT0TOv20zCzrUDphUHSDEnLJa2QdGaT\n6ftJ+pmkP0o6Izd+sqTrJN0p6XZJp5edtUzr1q0CYkQ/2TrMzMqliChv5dIYYAVwFPAAsBiYFRHL\nc/PsBkwF3go8FhHnpvE9QE9ELJW0I3ALMDO/bG4dUebzaAdJZC/wI1oLVX+eZjY6SCIi1Gxa2WcM\nhwIrI2JVRGwArgBm5meIiIcj4hbg2YbxayNiaXr8JLAMmFRyXjOzrV7ZhWESsDo3vIZhvLhLmgYc\nDCxqSyozM+vXuG4HGExqRroKmJPOHJrq7e3d9LhWq1Gr1UrPZmY2WtTrder1ekvzlt3HMB3ojYgZ\nafgsICLinCbzzgXW9/UxpHHjgO8BP4iI8wfYjvsYzMyGoJt9DIuBfSRNlTQemAUsGGD+xpBfBe4a\nqCjY6ONLd82qrdQzBsguVwXOJytC8yPiM5Jmk505zJM0EbgZ2Al4DngSOAA4CLgRuJ3N12yeHRE/\nbLINnzGMIt4XZt030BlD6YWhE1wYRhfvC7Pu62ZTkpmZjTIuDGZmVuDCYGZmBS4MZmZW4MJgZmYF\nLgxmZlbgwmBmZgUuDGZmVuDCYGZmBS4MZmZW4MJgZmYFLgxmZlawVRQGf8yzmVnrtopPV63Cp3lW\nIUNVeF+YdZ8/XdXMzFrmwmBmZgUuDGZmVuDCYGZmBS4MZmZW4MJgZmYFLgxmZlZQemGQNEPSckkr\nJJ3ZZPp+kn4m6Y+SzhjKsmZm1n6l3uAmaQywAjgKeABYDMyKiOW5eXYDpgJvBR6LiHNbXTa3Dt/g\nNop4X5h1XzdvcDsUWBkRqyJiA3AFMDM/Q0Q8HBG3AM8OdVkzM2u/sgvDJGB1bnhNGlf2smZmNkzj\nuh2gXXp7ezc9rtVq1Gq1rmUxM6uaer1OvV5vad6y+ximA70RMSMNnwVERJzTZN65wPpcH8NQlnUf\nwyjifWHWfd3sY1gM7CNpqqTxwCxgwQDz50MOdVkzM2uDUpuSImKjpNOAa8mK0PyIWCZpdjY55kma\nCNwM7AQ8J2kOcEBEPNls2TLzmpmZv49hKFtxU1KbeF+YdZ+/j8HMzFrmwmBmZgUuDGZmVuDCYGaV\n0NMzDUkj+unpmdbtp7FFcOdz61tx53ObeF9YMz4uOsudz2Zm1jIXBjMzK3BhMDOzgpYKg6RrJB2X\nviPBzMy2YK2+0P8H8C5gpaTPSNqvxExmZtZFLRWGiPhJRLwbeCVwL/CT9HWcp0h6QZkBzcyss1pu\nGpL0IuBk4B+AJcD5ZIXix6UkMzOzrmjp01UlfRvYD7gEeEtEPJgmXSnp5rLCmZlZ57V0g5ukYyNi\nYcO4bSLiT6UlGwLf4Da6eF9YMz4uOqsdN7h9ssm4nw8/kpmZVdWATUmSeoBJwHaSXsHmb1jbGdi+\n5GxmW7yenmmsW7dqROuYOHEqa9fe255AZgzSlCTp78g6nA8h+5a1PuuBr0XENaWma5GbkkYX74vN\nvC82877orIGaklrtY/jriLi67cnaxIVhdPG+2Mz7YjPvi84aqDAM1pR0YkR8A5gm6YzG6RFxbpsy\nmplZRQx2ueoO6feOZQcxM7NqKP37GCTNAM4juwJqfkSc02SeC4BjgKeAkyNiaRr/AeDvgeeA24FT\nIuKZJsu7KalFVejsrMq+qALvi828Lzpr2H0M6QW7XxFx+iAbHgOsAI4CHgAWA7MiYnlunmOA0yLi\nOEmHAedHxHRJewA3AftHxDOSrgS+HxEXN9mOC8MoylGFDFXhfbGZ90VnDbuPAbhlhNs+FFgZEatS\nkCuAmcDy3DwzgYsBImKRpAmSJqZpY4EdJD1HdnnsAyPMY2ZmgxiwMETE10e4/knA6tzwGrJiMdA8\n9wOTIuJWSV8A7gOeBq6NiJ+MMI+ZmQ1isKuSzouIf5H0/2hyjhcRx5cVTNILyc4mpgK/B66S9K6I\nuKysbZqZ2eBNSZek358f5vrvB6bkhiencY3z7NlknjcC90TEo5B9WRDwWqBpYejt7d30uFarUavV\nhhnZzLZmVbhAowz1ep16vd7SvC1flSRpPLA/2ZnDr5tdHdRkmbHAr8k6nx8EfgmcEBHLcvMcC7w/\ndT5PB85Lnc+HAvOBVwN/Ai4CFkfEl5psx53PoyhHFTJUhffFZlXZF1XJUbaRdD73reA44ELgN2Sf\nl7SXpNkR8YOBlouIjZJOA65l8+WqyyTNzibHvIhYKOlYSXeTXa56Slr2l5KuIvvuhw3p97xW8pqZ\n2fC1+pEYy4E3R8TdaXhvsktH9y85X0t8xjC6clQhQ1V4X2xWlX1RlRxla8fHbq/vKwrJPWQfpGdm\nZluYwa5Kent6eLOkhcA3yUrpO8huVjMzsy3MYGcMb0k/2wLrgNcDNeAhYLtSk5mVrKdnGpJG9NPT\nM63bT8O2QN0+Nkv/rKROcB/D6MpRhQxVyVGFDFVRlX1RhRydyNCOq5K2JfswuwPJzh4AiIj3DC2o\nmZlVXaudz5cAPcDRwA1kN6G589nMbAvUamHYJyI+BjyVPj/pOOCw8mKZmVm3tFoYNqTfj0t6GTAB\n2L2cSGZm1k0t9TEA8yTtAnwMWED2jW4fKy2VmZl1ja9Kan0rlb/KYLTkqEKGquSoQoaqqMq+qEKO\nbl+V1FJTkqQXSfp3SbdKukXSeZJeNMy0ZmZWYa32MVwB/A74a+BvgIeBK8sKZWZm3dPqh+jdEREv\naxh3e0T8RWnJhsBNSaMrRxUyVCVHFTJURVX2RRVyjIqmJOBaSbMkjUk/7wR+NIykZmZWcQOeMUha\nT1a2BOwAPJcmjQGejIidS0/YAp8xjK4cVchQlRxVyFAVVdkXVcjR7TOGAS9XjYidRpjMzMxGmVbv\nY0DS8cCRabAeEd8rJ5KZmXVTq5erfgaYA9yVfuZI+nSZwczMrDtavSrpNuDgiHguDY8FlkTEy0vO\n1xL3MYyuHFXIUJUcVchQFVXZF1XI0e0+hlavSgJ4Ye7xhCEsZ2Zmo0irfQyfBpZIup7sCqUjgbNK\nS2VmZl0z6BmDsnOam4DpwDXA1cBrIqKlO58lzZC0XNIKSWf2M88FklZKWirp4Nz4CZK+JWmZpDsl\n+aO+zcxKNugZQ0SEpIXpLucFQ1m5pDHAF4GjgAeAxZK+GxHLc/McA+wdEfumF/4LyYoQwPnAwoh4\nh6RxwPZD2b6ZmQ1dq30Mt0p69TDWfyiwMiJWRcQGss9cmtkwz0zgYoCIWARMkDRR0s7AERFxUZr2\nbEQ8MYwMZmY2BK32MRwGnCjpXuApsn6GaOGqpEnA6tzwGrJiMdA896dxG4GHJV0EHATcDMyJiD+0\nmNnMzIah1cJwdKkpmhsHvBJ4f0TcLOk8sg7vuV3IYma21RiwMEjaFvhHYB/gdmB+RDw7hPXfD0zJ\nDU9O4xrn2bOfeVZHxM3p8VVA085rgN7e3k2Pa7UatVptCDHNzLZs9Xqder3e0ryDfYjelWTf9/xT\n4BhgVUTMaTVIuhHu12Sdzw8CvwROiIhluXmOJTsrOE7SdOC8iJiept0AnBoRKyTNBbaPiOcVB9/g\nNrpyVCFDVXJUIUNVVGVfVCFHt29wG6wp6YC+71yQNJ/shb1lEbFR0mnAtWQd3fMjYpmk2dnkmBcR\nCyUdK+lusv6LU3KrOB24VNILgHsappmZWQkGO2O4NSJe2d9wVfiMYXTlqEKGquSoQoaqqMq+qEKO\nqp8xHCSp7xJRAdul4b6rkirxfQxmZtY+g30fw9hOBTEzs2oYyofomZnZVsCFwczMClwYzMyswIXB\nzMwKXBjMzKzAhcHM6OmZhqQR/fT0TOv207A2aek7n6vON7iNrhxVyFCVHFXIUJUcVchQlRzdvsHN\nZwxmZlbgwmBmZgUuDGZmVuDCYGZmBS4MZmZW4MJgZmYFLgxmZlbgwmBmZgUuDGZmVuDCYGZmBS4M\nZmZW4MJgZmYFpRcGSTMkLZe0QtKZ/cxzgaSVkpZKOrhh2hhJt0paUHZWMzMruTBIGgN8ETgaOBA4\nQdL+DfMcA+wdEfsCs4ELG1YzB7irzJxmZrZZ2WcMhwIrI2JVRGwArgBmNswzE7gYICIWARMkTQSQ\nNBk4FvhKyTnNzCwpuzBMAlbnhtekcQPNc39unn8DPsTIP5jczMxaNK7bAfoj6ThgXUQslVQDmn6h\nRJ/e3t5Nj2u1GrVarcx4ZmajSr1ep16vtzRvqd/gJmk60BsRM9LwWUBExDm5eS4Ero+IK9PwcuD1\nZH0LJwLPAtsBOwHXRMRJTbbjb3AbRTmqkKEqOaqQoSo5qpChKjm29G9wWwzsI2mqpPHALKDx6qIF\nwEmwqZA8HhHrIuLsiJgSES9Jy13XrCiYmVl7ldqUFBEbJZ0GXEtWhOZHxDJJs7PJMS8iFko6VtLd\nwFPAKWVmMjOzgZXalNQpbkoaXTmqkKEqOaqQoSo5qpChKjm29KYkMzMbZVwYzMyswIXBzMwKXBjM\nzKzAhcHMzApcGMzMrMCFwczMClwYzMyswIXBzMwKXBjMzKzAhcHMzApcGMzMrMCFwczMClwYzMys\nwIXBzMwKXBjMzKzAhcHMzApcGMzMrMCFwczMClwYzMyswIXBzMwKSi8MkmZIWi5phaQz+5nnAkkr\nJS2VdHAaN1nSdZLulHS7pNPLzmpmZiUXBkljgC8CRwMHAidI2r9hnmOAvSNiX2A2cGGa9CxwRkQc\nCLwGeH/jsmZm1n5lnzEcCqyMiFURsQG4ApjZMM9M4GKAiFgETJA0MSLWRsTSNP5JYBkwqeS8ZmZb\nvbILwyRgdW54Dc9/cW+c5/7GeSRNAw4GFrU9oZmZFYzrdoDBSNoRuAqYk84cmurt7d30uFarUavV\nSs9mZjZa1Ot16vV6S/MqIkoLImk60BsRM9LwWUBExDm5eS4Ero+IK9PwcuD1EbFO0jjge8APIuL8\nAbYTAz0PScBIn6cYyb6qQoaq5KhChqrkqEKGquSoQoaq5OhEBklEhJpNK7spaTGwj6SpksYDs4AF\nDfMsAE6CTYXk8YhYl6Z9FbhroKJgZmbtVWpTUkRslHQacC1ZEZofEcskzc4mx7yIWCjpWEl3A08B\nJwNIOhx4N3C7pCVk5fPsiPhhmZnNzLZ2pTYldYqbkkZXjipkqEqOKmSoSo4qZKhKji29KcnMzEYZ\nFwYzMytwYTAzswIXBjMzK3BhMDOzAhcGMzMrcGEwM7MCFwYzMytwYTAzswIXBjMzK3BhMDOzAhcG\nMzMrcGEwM7MCFwYzMytwYTAzswIXBjMzK3BhMDOzAhcGMzMrcGEwM7MCFwYzMysovTBImiFpuaQV\nks7sZ54LJK2UtFTSwUNZ1szM2qvUwiBpDPBF4GjgQOAESfs3zHMMsHdE7AvMBi5sddn2qZez2iGr\ndzsA1cgA1chR73aApN7tAFQjA1QjR73bAZJ6aWsu+4zhUGBlRKyKiA3AFcDMhnlmAhcDRMQiYIKk\niS0u2yb1clY7ZPVuB6AaGaAaOerdDpDUux2AamSAauSodztAUi9tzWUXhknA6tzwmjSulXlaWdbM\nzNqsip3P6nYAM7OtmSKivJVL04HeiJiRhs8CIiLOyc1zIXB9RFyZhpcDrwf2GmzZ3DrKexJmZluo\niGj6RnxcydtdDOwjaSrwIDALOKFhngXA+4ErUyF5PCLWSXq4hWWB/p+cmZkNXamFISI2SjoNuJas\n2Wp+RCyTNDubHPMiYqGkYyXdDTwFnDLQsmXmNTOzkpuSzMxs9Kli53NHVeEmOknzJa2TdFs3tp8y\nTJZ0naQ7Jd0u6fQuZNhG0iJJS1KGuZ3OkMsyRtKtkhZ0McO9kn6V9scvu5hjgqRvSVqWjo/DOrz9\nP0/74Nb0+/fdOD5Tlg9IukPSbZIulTS+CxnmpP+P0v5Pt+ozhnQT3QrgKOABsj6RWRGxvMM5Xgc8\nCVwcES/v5LZzGXqAnohYKmlH4BZgZhf2xfYR8bSkscD/B06PiI6/KEr6APAqYOeIOL7T208Z7gFe\nFRGPdWP7uRxfA26IiIskjQO2j4gnupRlDNml64dFxOrB5m/ztvcAbgL2j4hnJF0JfD8iLu5ghgOB\ny4FXA88CPwD+MSLuaed2tvYzhg7eRNe/iLgJ6Oo/f0SsjYil6fGTwDK6cN9IRDydHm5D1gfW8Xcu\nkiYDxwJf6fS2G6PQ5f9RSTsDR0TERQAR8Wy3ikLyRuA3nS4KOWOBHfoKJNkbyk56KbAoIv4UERuB\nG4G3t3sjW3th8E10TUiaBhwMLOrCtsdIWgKsBX4cEYs7nQH4N+BDdKEoNQjgx5IWSzq1Sxn2Ah6W\ndFFqypmuTZuAAAADi0lEQVQnabsuZQH4W7J3zB0XEQ8AXwDuA+4nu4LyJx2OcQdwhKRdJG1P9gZm\nz3ZvZGsvDNYgNSNdBcxJZw4dFRHPRcQrgMnAYZIO6OT2JR0HrEtnT6K7N1weHhGvJPvnf39qcuy0\nccArgS+lLE8DZ3UhB5JeABwPfKtL238hWYvCVGAPYEdJ7+pkhtS0ew7wY2AhsATY2O7tbO2F4X5g\nSm54chq3VUqnx1cBl0TEd7uZJTVXXA/M6PCmDweOT+37lwN/Kaljbch5EfFg+v0Q8G2yps9OWwOs\njoib0/BVZIWiG44Bbkn7oxveCNwTEY+mZpxrgNd2OkREXBQRh0REDXicrJ+0rbb2wrDpBrx0dcEs\nshvuuqHb704BvgrcFRHnd2PjknaTNCE93g54E9DRzu+IODsipkTES8iOh+si4qROZoCsEz6dvSFp\nB+CvyJoROioi1gGrJf15GnUUcFencyQn0KVmpOQ+YLqkbSWJbF90/N4qSS9Ov6cAbwMua/c2yr7z\nudKqchOdpMuAGvAiSfcBc/s6+zqY4XDg3cDtqY0/gLMj4ocdjPFnwNfTlSdjgCsjYmEHt18lE4Fv\np497GQdcGhHXdinL6cClqSnnHtJNqJ2U2tPfCLy309vuExG/lHQVWfPNhvR7XheiXC1p15ThfWVc\nDLBVX65qZmbPt7U3JZmZWQMXBjMzK3BhMDOzAhcGMzMrcGEwM7MCFwYzMytwYTBrkaRdcx///KCk\nNbnhId0TlD5qfd+yspqNhO9jMBsGSR8HnoyIc7udxazdfMZgNjyFjy+R9OH0xSm3pbvpkbR3+lKX\nyyXdJekKSdukaT+V9PL0+DhJt6Szj07eaW7WlAuD2QhJOpTsc3xeRfahau9LX6gCcABwbkQcAPwJ\nmN2w7ETgP8i+FOkVZJ/PZNZVLgxmI/c64OqIeCZ9VPl3gCPStHty3ynxjTRv3mvIPqhvDUBEPN6J\nwGYDcWEw66xmnXrd/lRdswIXBrOR+ynwNknbpI/KnpnGAewl6VXp8bty4/v8DKilj1BG0i6dCGw2\nkK36Y7fN2iEiFku6HLiZ7IzgSxFxp6S9yT6v/wxJrwBuA/6zb7G07O8k/RPw3ewj/nkAOK7Tz8Es\nz5ermpUkFYarUqey2ajhpiSzcvmdl406PmMwM7MCnzGYmVmBC4OZmRW4MJiZWYELg5mZFbgwmJlZ\ngQuDmZkV/DfkpZeqO72QKwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "top_probs = [sum(topic_model.get_topics(topic_ids=[i], num_words=10)['score']) for i in range(10)]\n", + "\n", + "ind = np.arange(10)\n", + "width = 0.5\n", + "\n", + "fig, ax = plt.subplots()\n", + "\n", + "ax.bar(ind-(width/2),top_probs,width)\n", + "ax.set_xticks(ind)\n", + "\n", + "plt.xlabel('Topic')\n", + "plt.ylabel('Probability')\n", + "plt.title('Total Probability of Top 10 Words in each Topic')\n", + "plt.xlim(-0.5,9.5)\n", + "plt.ylim(0,0.15)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we see that, for our topic model, the top 10 words only account for a small fraction (in this case, between 5% and 13%) of their topic's total probability mass. So while we can use the top words to identify broad themes for each topic, we should keep in mind that in reality these topics are more complex than a simple 10-word summary.\n", + "\n", + "Finally, we observe that some 'junk' words appear highly rated in some topics despite our efforts to remove unhelpful words before fitting the model; for example, the word 'born' appears as a top 10 word in three different topics, but it doesn't help us describe these topics at all." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Topic distributions for some example documents\n", + "\n", + "As we noted in the introduction to this assignment, LDA allows for mixed membership, which means that each document can partially belong to several different topics. For each document, topic membership is expressed as a vector of weights that sum to one; the magnitude of each weight indicates the degree to which the document represents that particular topic.\n", + "\n", + "We'll explore this in our fitted model by looking at the topic distributions for a few example Wikipedia articles from our data set. We should find that these articles have the highest weights on the topics whose themes are most relevant to the subject of the article - for example, we'd expect an article on a politician to place relatively high weight on topics related to government, while an article about an athlete should place higher weight on topics related to sports or competition." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Topic distributions for documents can be obtained using GraphLab Create's predict() function. GraphLab Create uses a collapsed Gibbs sampler similar to the one described in the video lectures, where only the word assignments variables are sampled. To get a document-specific topic proportion vector post-facto, predict() draws this vector from the conditional distribution given the sampled word assignments in the document. Notice that, since these are draws from a _distribution_ over topics that the model has learned, we will get slightly different predictions each time we call this function on a document - we can see this below, where we predict the topic distribution for the article on Barack Obama:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+--------------------------+---------------------------+\n", + "| predictions (first draw) | predictions (second draw) |\n", + "+--------------------------+---------------------------+\n", + "| 0.0403225806452 | 0.0456989247312 |\n", + "| 0.0591397849462 | 0.0376344086022 |\n", + "| 0.0215053763441 | 0.0161290322581 |\n", + "| 0.10752688172 | 0.177419354839 |\n", + "| 0.604838709677 | 0.594086021505 |\n", + "| 0.0134408602151 | 0.0161290322581 |\n", + "| 0.0483870967742 | 0.0376344086022 |\n", + "| 0.0510752688172 | 0.0268817204301 |\n", + "| 0.0188172043011 | 0.0215053763441 |\n", + "| 0.0349462365591 | 0.0268817204301 |\n", + "+--------------------------+---------------------------+\n", + "+-------------------------------+\n", + "| topics |\n", + "+-------------------------------+\n", + "| science and research |\n", + "| team sports |\n", + "| music, TV, and film |\n", + "| American college and politics |\n", + "| general politics |\n", + "| art and publishing |\n", + "| Business |\n", + "| international athletics |\n", + "| Great Britain and Australia |\n", + "| international music |\n", + "+-------------------------------+\n", + "[10 rows x 3 columns]\n", + "\n" + ] + } + ], + "source": [ + "obama = gl.SArray([wiki_docs[int(np.where(wiki['name']=='Barack Obama')[0])]])\n", + "pred1 = topic_model.predict(obama, output_type='probability')\n", + "pred2 = topic_model.predict(obama, output_type='probability')\n", + "print(gl.SFrame({'topics':themes, 'predictions (first draw)':pred1[0], 'predictions (second draw)':pred2[0]}))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get a more robust estimate of the topics for each document, we can average a large number of predictions for the same document:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "def average_predictions(model, test_document, num_trials=100):\n", + " avg_preds = np.zeros((model.num_topics))\n", + " for i in range(num_trials):\n", + " avg_preds += model.predict(test_document, output_type='probability')[0]\n", + " avg_preds = avg_preds/num_trials\n", + " result = gl.SFrame({'topics':themes, 'average predictions':avg_preds})\n", + " result = result.sort('average predictions', ascending=False)\n", + " return result" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+---------------------+-------------------------------+\n", + "| average predictions | topics |\n", + "+---------------------+-------------------------------+\n", + "| 0.591397849462 | general politics |\n", + "| 0.137231182796 | American college and politics |\n", + "| 0.0524193548387 | Business |\n", + "| 0.0503494623656 | team sports |\n", + "| 0.0413709677419 | science and research |\n", + "| 0.0343817204301 | international athletics |\n", + "| 0.0254301075269 | Great Britain and Australia |\n", + "| 0.0232258064516 | art and publishing |\n", + "| 0.0231720430108 | international music |\n", + "| 0.0210215053763 | music, TV, and film |\n", + "+---------------------+-------------------------------+\n", + "[10 rows x 2 columns]\n", + "\n" + ] + } + ], + "source": [ + "print average_predictions(topic_model, obama, 100)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ What is the topic most closely associated with the article about former US President George W. Bush? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+---------------------+-------------------------------+\n", + "| average predictions | topics |\n", + "+---------------------+-------------------------------+\n", + "| 0.42769005848 | general politics |\n", + "| 0.190029239766 | American college and politics |\n", + "| 0.102339181287 | Business |\n", + "| 0.0605847953216 | science and research |\n", + "| 0.0466666666667 | art and publishing |\n", + "| 0.045350877193 | team sports |\n", + "| 0.0388888888889 | Great Britain and Australia |\n", + "| 0.0366081871345 | international athletics |\n", + "| 0.0299707602339 | music, TV, and film |\n", + "| 0.0218713450292 | international music |\n", + "+---------------------+-------------------------------+\n", + "[10 rows x 2 columns]\n", + "\n" + ] + } + ], + "source": [ + "george_bush = gl.SArray([wiki_docs[int(np.where(wiki['name']=='George W. Bush')[0])]])\n", + "print average_predictions(topic_model, george_bush, 100)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ What are the top 3 topics corresponding to the article about English football (soccer) player Steven Gerrard? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+---------------------+-------------------------------+\n", + "| average predictions | topics |\n", + "+---------------------+-------------------------------+\n", + "| 0.49124 | team sports |\n", + "| 0.17276 | Great Britain and Australia |\n", + "| 0.12868 | international athletics |\n", + "| 0.0372 | international music |\n", + "| 0.0316 | general politics |\n", + "| 0.03112 | music, TV, and film |\n", + "| 0.02932 | Business |\n", + "| 0.02752 | art and publishing |\n", + "| 0.02644 | American college and politics |\n", + "| 0.02412 | science and research |\n", + "+---------------------+-------------------------------+\n", + "[10 rows x 2 columns]\n", + "\n" + ] + } + ], + "source": [ + "steven_gerrard = gl.SArray([wiki_docs[int(np.where(wiki['name']=='Steven Gerrard')[0])]])\n", + "print average_predictions(topic_model, steven_gerrard, 100)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Comparing LDA to nearest neighbors for document retrieval\n", + "\n", + "So far we have found that our topic model has learned some coherent topics, we have explored these topics as probability distributions over a vocabulary, and we have seen how individual documents in our Wikipedia data set are assigned to these topics in a way that corresponds with our expectations. \n", + "\n", + "In this section, we will use the predicted topic distribution as a representation of each document, similar to how we have previously represented documents by word count or TF-IDF. This gives us a way of computing distances between documents, so that we can run a nearest neighbors search for a given document based on its membership in the topics that we learned from LDA. We can contrast the results with those obtained by running nearest neighbors under the usual TF-IDF representation, an approach that we explored in a previous assignment. \n", + "\n", + "We'll start by creating the LDA topic distribution representation for each document:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "wiki['lda'] = topic_model.predict(wiki_docs, output_type='probability')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we add the TF-IDF document representations:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "wiki['word_count'] = gl.text_analytics.count_words(wiki['text'])\n", + "wiki['tf_idf'] = gl.text_analytics.tf_idf(wiki['word_count'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For each of our two different document representations, we can use GraphLab Create to compute a brute-force nearest neighbors model:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Starting brute force nearest neighbors model training.
" + ], + "text/plain": [ + "Starting brute force nearest neighbors model training." + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Starting brute force nearest neighbors model training.
" + ], + "text/plain": [ + "Starting brute force nearest neighbors model training." + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "model_tf_idf = gl.nearest_neighbors.create(wiki, label='name', features=['tf_idf'],\n", + " method='brute_force', distance='cosine')\n", + "model_lda_rep = gl.nearest_neighbors.create(wiki, label='name', features=['lda'],\n", + " method='brute_force', distance='cosine')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's compare these nearest neighbor models by finding the nearest neighbors under each representation on an example document. For this example we'll use Paul Krugman, an American economist:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Starting pairwise querying.
" + ], + "text/plain": [ + "Starting pairwise querying." + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Query points | # Pairs | % Complete. | Elapsed Time |
" + ], + "text/plain": [ + "| Query points | # Pairs | % Complete. | Elapsed Time |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 0            | 1       | 0.00169288  | 16.669ms     |
" + ], + "text/plain": [ + "| 0 | 1 | 0.00169288 | 16.669ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Done         |         | 100         | 324.125ms    |
" + ], + "text/plain": [ + "| Done | | 100 | 324.125ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
query_labelreference_labeldistancerank
Paul KrugmanPaul Krugman0.01
Paul KrugmanElise Brezis0.7444980172622
Paul KrugmanMaitreesh Ghatak0.815649848313
Paul KrugmanKai A. Konrad0.8237005644064
Paul KrugmanDavid Colander0.8346259277595
Paul KrugmanRichard Blundell0.8379342678746
Paul KrugmanGordon Rausser0.839415347067
Paul KrugmanEdward J. Nell0.8421785000158
Paul KrugmanRobin Boadway0.8423742605969
Paul KrugmanTim Besley0.84308810925310
\n", + "[10 rows x 4 columns]
\n", + "
" + ], + "text/plain": [ + "Columns:\n", + "\tquery_label\tstr\n", + "\treference_label\tstr\n", + "\tdistance\tfloat\n", + "\trank\tint\n", + "\n", + "Rows: 10\n", + "\n", + "Data:\n", + "+--------------+------------------+----------------+------+\n", + "| query_label | reference_label | distance | rank |\n", + "+--------------+------------------+----------------+------+\n", + "| Paul Krugman | Paul Krugman | 0.0 | 1 |\n", + "| Paul Krugman | Elise Brezis | 0.744498017262 | 2 |\n", + "| Paul Krugman | Maitreesh Ghatak | 0.81564984831 | 3 |\n", + "| Paul Krugman | Kai A. Konrad | 0.823700564406 | 4 |\n", + "| Paul Krugman | David Colander | 0.834625927759 | 5 |\n", + "| Paul Krugman | Richard Blundell | 0.837934267874 | 6 |\n", + "| Paul Krugman | Gordon Rausser | 0.83941534706 | 7 |\n", + "| Paul Krugman | Edward J. Nell | 0.842178500015 | 8 |\n", + "| Paul Krugman | Robin Boadway | 0.842374260596 | 9 |\n", + "| Paul Krugman | Tim Besley | 0.843088109253 | 10 |\n", + "+--------------+------------------+----------------+------+\n", + "[10 rows x 4 columns]" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model_tf_idf.query(wiki[wiki['name'] == 'Paul Krugman'], label='name', k=10)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Starting pairwise querying.
" + ], + "text/plain": [ + "Starting pairwise querying." + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Query points | # Pairs | % Complete. | Elapsed Time |
" + ], + "text/plain": [ + "| Query points | # Pairs | % Complete. | Elapsed Time |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 0            | 1       | 0.00169288  | 2.839ms      |
" + ], + "text/plain": [ + "| 0 | 1 | 0.00169288 | 2.839ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Done         |         | 100         | 24.178ms     |
" + ], + "text/plain": [ + "| Done | | 100 | 24.178ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
query_labelreference_labeldistancerank
Paul KrugmanPaul Krugman0.01
Paul KrugmanAllen Feldman0.002640543220932
Paul KrugmanCleo Paskal0.005032438384563
Paul KrugmanLene Auestad0.007253012279254
Paul KrugmanBen Klemens0.008782019935645
Paul KrugmanSteve Lawrence (computer
scientist) ...
0.01096904775946
Paul KrugmanRandall Hansen0.01103365010057
Paul KrugmanMetta Spencer0.01104501228228
Paul KrugmanSteven Skiena0.01146910093069
Paul KrugmanMohamed Osman Elkhosht0.011552559081210
\n", + "[10 rows x 4 columns]
\n", + "
" + ], + "text/plain": [ + "Columns:\n", + "\tquery_label\tstr\n", + "\treference_label\tstr\n", + "\tdistance\tfloat\n", + "\trank\tint\n", + "\n", + "Rows: 10\n", + "\n", + "Data:\n", + "+--------------+-------------------------------+------------------+------+\n", + "| query_label | reference_label | distance | rank |\n", + "+--------------+-------------------------------+------------------+------+\n", + "| Paul Krugman | Paul Krugman | 0.0 | 1 |\n", + "| Paul Krugman | Allen Feldman | 0.00264054322093 | 2 |\n", + "| Paul Krugman | Cleo Paskal | 0.00503243838456 | 3 |\n", + "| Paul Krugman | Lene Auestad | 0.00725301227925 | 4 |\n", + "| Paul Krugman | Ben Klemens | 0.00878201993564 | 5 |\n", + "| Paul Krugman | Steve Lawrence (computer s... | 0.0109690477594 | 6 |\n", + "| Paul Krugman | Randall Hansen | 0.0110336501005 | 7 |\n", + "| Paul Krugman | Metta Spencer | 0.0110450122822 | 8 |\n", + "| Paul Krugman | Steven Skiena | 0.0114691009306 | 9 |\n", + "| Paul Krugman | Mohamed Osman Elkhosht | 0.0115525590812 | 10 |\n", + "+--------------+-------------------------------+------------------+------+\n", + "[10 rows x 4 columns]" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model_lda_rep.query(wiki[wiki['name'] == 'Paul Krugman'], label='name', k=10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that that there is no overlap between the two sets of top 10 nearest neighbors. This doesn't necessarily mean that one representation is better or worse than the other, but rather that they are picking out different features of the documents. \n", + "\n", + "With TF-IDF, documents are distinguished by the frequency of uncommon words. Since similarity is defined based on the specific words used in the document, documents that are \"close\" under TF-IDF tend to be similar in terms of specific details. This is what we see in the example: the top 10 nearest neighbors are all economists from the US, UK, or Canada. \n", + "\n", + "Our LDA representation, on the other hand, defines similarity between documents in terms of their topic distributions. This means that documents can be \"close\" if they share similar themes, even though they may not share many of the same keywords. For the article on Paul Krugman, we expect the most important topics to be 'American college and politics' and 'science and research'. As a result, we see that the top 10 nearest neighbors are academics from a wide variety of fields, including literature, anthropology, and religious studies.\n", + "\n", + "\n", + "__Quiz Question:__ Using the TF-IDF representation, compute the 5000 nearest neighbors for American baseball player Alex Rodriguez. For what value of k is Mariano Rivera the k-th nearest neighbor to Alex Rodriguez? (Hint: Once you have a list of the nearest neighbors, you can use `mylist.index(value)` to find the index of the first instance of `value` in `mylist`.)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Starting pairwise querying.
" + ], + "text/plain": [ + "Starting pairwise querying." + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Query points | # Pairs | % Complete. | Elapsed Time |
" + ], + "text/plain": [ + "| Query points | # Pairs | % Complete. | Elapsed Time |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 0            | 1       | 0.00169288  | 20.877ms     |
" + ], + "text/plain": [ + "| 0 | 1 | 0.00169288 | 20.877ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Done         |         | 100         | 302.418ms    |
" + ], + "text/plain": [ + "| Done | | 100 | 302.418ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Get a list of 'reference_label' based on knn\n", + "alex_rodriguez_tfidf = list(model_tf_idf.query(wiki[wiki['name'] == 'Alex Rodriguez'], label='name', k=5000)['reference_label'])" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "value of k: 52\n" + ] + } + ], + "source": [ + "print 'value of k:', alex_rodriguez_tfidf.index('Mariano Rivera')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ Using the LDA representation, compute the 5000 nearest neighbors for American baseball player Alex Rodriguez. For what value of k is Mariano Rivera the k-th nearest neighbor to Alex Rodriguez? (Hint: Once you have a list of the nearest neighbors, you can use `mylist.index(value)` to find the index of the first instance of `value` in `mylist`.)" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Starting pairwise querying.
" + ], + "text/plain": [ + "Starting pairwise querying." + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Query points | # Pairs | % Complete. | Elapsed Time |
" + ], + "text/plain": [ + "| Query points | # Pairs | % Complete. | Elapsed Time |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 0            | 1       | 0.00169288  | 3.993ms      |
" + ], + "text/plain": [ + "| 0 | 1 | 0.00169288 | 3.993ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Done         |         | 100         | 23.926ms     |
" + ], + "text/plain": [ + "| Done | | 100 | 23.926ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "value of k: 729\n" + ] + } + ], + "source": [ + "# Get a list of 'reference_label' based on knn\n", + "alex_rodriguez_lda = list(model_lda_rep.query(wiki[wiki['name'] == 'Alex Rodriguez'], label='name', k=5000)['reference_label'])\n", + "print 'value of k:', alex_rodriguez_lda.index('Mariano Rivera')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Understanding the role of LDA model hyperparameters\n", + "\n", + "Finally, we'll take a look at the effect of the LDA model hyperparameters alpha and gamma on the characteristics of our fitted model. Recall that alpha is a parameter of the prior distribution over topic weights in each document, while gamma is a parameter of the prior distribution over word weights in each topic. \n", + "\n", + "In the video lectures, we saw that alpha and gamma can be thought of as smoothing parameters when we compute how much each document \"likes\" a topic (in the case of alpha) or how much each topic \"likes\" a word (in the case of gamma). In both cases, these parameters serve to reduce the differences across topics or words in terms of these calculated preferences; alpha makes the document preferences \"smoother\" over topics, and gamma makes the topic preferences \"smoother\" over words.\n", + "\n", + "Our goal in this section will be to understand how changing these parameter values affects the characteristics of the resulting topic model.\n", + "\n", + "__Quiz Question:__ What was the value of alpha used to fit our original topic model? " + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "5.0" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topic_model['alpha']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ What was the value of gamma used to fit our original topic model? Remember that GraphLab Create uses \"beta\" instead of \"gamma\" to refer to the hyperparameter that influences topic distributions over words." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.1" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topic_model['beta']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll start by loading some topic models that have been trained using different settings of alpha and gamma. Specifically, we will start by comparing the following two models to our original topic model:\n", + " - tpm_low_alpha, a model trained with alpha = 1 and default gamma\n", + " - tpm_high_alpha, a model trained with alpha = 50 and default gamma" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "tpm_low_alpha = gl.load_model('topic_models/lda_low_alpha')\n", + "tpm_high_alpha = gl.load_model('topic_models/lda_high_alpha')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Changing the hyperparameter alpha\n", + "\n", + "Since alpha is responsible for smoothing document preferences over topics, the impact of changing its value should be visible when we plot the distribution of topic weights for the same document under models fit with different alpha values. In the code below, we plot the (sorted) topic weights for the Wikipedia article on Barack Obama under models fit with high, original, and low settings of alpha." + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAakAAAEbCAYAAABgLnslAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcFNXV//HPYVgU2UFRGBhQwR1/IiK4seUhoEE0Piig\nxqBRNEET8+SJQUIgRqMxxhiiPK7BhaBx30CDCY47YgRxCZsQBhhwQUABlfX8/qjbQ88w01Mj1HQz\n832/Xv2arqrbVaebYU7XrVv3mLsjIiKSi+pkOwAREZGKKEmJiEjOUpISEZGcpSQlIiI5S0lKRERy\nlpKUiIjkrESTlJndY2Yfm9m7GdpMMLNFZvaOmf2/JOMREZE9S9JnUpOAb1e00cwGAge5eydgJHB7\nwvGIiMgeJNEk5e6vAmszNBkM3B/avgk0NbPWScYkIiJ7jmxfk2oLLE9bLg7rREREsp6kREREKlQ3\ny8cvBtqlLeeHdTsxM00yKCJSg7m7lV1XHWdSFh7leRr4HoCZ9QDWufvHFe3I3WM9xo0bF7ttdT4U\n154dk+La82NSXLkbU0USPZMysylAb6ClmS0DxgH1o3zjd7r7NDM71cw+BDYCI5KMR0RE9iyJJil3\nHx6jzagkYxARkT1XjRw40bt372yHUC7FFV8uxgSKqypyMSZQXFWRCzFZpr7AXGJmvqfEKiIiVWNm\neDkDJ7I9uk9EaokOHTpQVFSU7TAkywoKCli6dGns9jqTEpFqEb4pZzsMybKKfg8qOpOqkdekRESk\nZlCSEhGRnKUkJSIiOUtJSkRqvY4dOzJjxoysHb9OnTosWbJkt7etCZSkRCRr8gs6YGaJPfILOmT7\nLcZiVtHMcbvWtibQEHQRyZriZUVcP3tzYvsf3bV+Yvvenaoy6rG2jZDUmZSISJrNmzfzk5/8hLZt\n25Kfn8+VV17Jli1bgGgGhieeeAKA1157jTp16vDcc88BMGPGDI455phy9/nWW29xwgkn0Lx5c9q2\nbcvll1/O1q1by207YsQILrvsMvr370+TJk3o06cPy5YtK9XmhRdeoHPnzrRo0YJRo3bMLLdkyRL6\n9etHq1at2G+//TjvvPP44osvdvkzySYlKRGRNNdeey2zZs3i3XffZe7cucyaNYtrr70WgF69elFY\nWAjAyy+/zEEHHcTLL78MwEsvvVThNEJ5eXnccsstrFmzhjfeeIMZM2YwceLECmOYMmUK48aN47PP\nPuPoo4/m3HPPLbV96tSpvP3228ydO5eHH36Y6dOnA9FZ1tVXX81HH33EvHnzWLFiBePHj9+1DyTL\nlKRERNKkEkTLli1p2bIl48aN44EHHgCiJPXSSy8BUZIaPXp0yfJLL71Er169yt1n165d6d69O2ZG\n+/btueSSS0peV57TTjuNE088kXr16nHdddfxxhtvUFy8o9Te6NGjady4Me3ataNPnz688847ABx0\n0EH069ePunXr0rJlS6688sqMx9kTKEmJiKRZuXIl7du3L1kuKChg5cqVAPTs2ZOFCxfyySefMHfu\nXL73ve+xfPlyPvvsM2bNmsUpp5xS7j4XLVrEoEGDOOCAA2jWrBljxoxh9erVFcbQrt2OWrD77LMP\nLVq0KIkBoHXr1iXPGzZsyIYNGwD45JNPGDZsGPn5+TRr1ozzzjsv43H2BEpSIiJp2rRpU2qOwaKi\nItq0aQPA3nvvzbHHHsuf/vQnjjzySOrWrUvPnj25+eabOfjgg2nRokW5+7zssss47LDDWLx4MevW\nreO6667LOABi+fLlJc83bNjAmjVraNu2baWxX3311dSpU4cPPviAdevWMXny5D1+oIWSlIhImmHD\nhnHttdeyevVqVq9ezW9+8xvOP//8ku2nnHIKt956a0nXXu/evUstl2f9+vU0adKEhg0bMn/+fP7v\n//4vYwzTpk3j9ddfZ/PmzYwdO5aePXuWJMpM1q9fT6NGjWjcuDHFxcX8/ve/j/muc5eSlIjUeun3\nHv3yl7+kW7dudOnShaOPPppu3boxZsyYku29evViw4YNJV17qeVMSeqmm27ir3/9K02aNGHkyJEM\nHTq0wuMDDB8+nPHjx9OyZUvmzJnD5MmTK2ybbty4cbz99ts0a9aMQYMGcdZZZ8X7AHKYZkEXkWpR\n3uzX+QUdKF6WXPmOtu0LWFG0NLH9J2HEiBG0a9eOa665JtuhJKKqs6DrZl4RyZo9LYFI9VN3n4hI\nDqlt0x5VRt19IlItVPRQQEUPRUSkBlGSEhGRnKUkJSIiOUtJSkREcpaSlIiI5CwlKRGRKrrsssu4\n7rrrdnvbTIqKiqhTpw7bt2/f5X1V1YgRI/jVr34Vq23Hjh2ZMWPGbju2buYVkazpkJ9PUVoJit2t\noG1blq5Ysdv3W9nce9+0bWVq4z1UsZKUmQ0FDnL368ysHbCfu7+dbGgiUtMVFRezLsGifM0S2Pf2\n7dupU0edUNWl0k/azG4F+gDnhVUbgduTDEpEpDrNnz+fPn360Lx5c4466iieeeaZkm0jRozghz/8\nIaeddhqNGzemsLBwp+6vG2+8kTZt2pCfn88999xDnTp1WLJkScnrU21feukl2rVrx80330zr1q1p\n27Yt9957b8l+pk2bRteuXWnatCkFBQX8+te/jv0eOnbsyE033cTRRx9N48aNufjii/nkk0849dRT\nadKkCf379+fzzz8vaf/0009z5JFH0qJFC/r27cv8+fNLts2ZM4djjz2Wpk2bMnToUL7++utSx3r2\n2Wc55phjaN68OSeddBLvvfde7DirKs7XgRPcfSTwNYC7rwHqJxaRiEg12rp1K4MGDWLAgAF8+umn\nTJgwgXPPPZdFixaVtHnwwQcZO3Ys69ev58QTTyz1+ueff55bbrmFGTNm8OGHH1JYWJixW+6jjz5i\n/fr1rFy5krvvvpsf/ehHJcmjUaNGPPDAA3z++edMnTqV22+/naeffjr2e3n88cf55z//ycKFC3n6\n6ac59dRTueGGG1i9ejXbtm1jwoQJACxcuJDhw4czYcIEPv30UwYOHMigQYPYunUrW7Zs4cwzz+SC\nCy5gzZo1DBkyhMcee6zkGHPmzOGiiy7irrvuYs2aNYwcOZLTTz+dLVu2xI6zKuIkqS1mVgdwADNr\nCVT/lTsRkQTMnDmTjRs3ctVVV1G3bl369OnDd77zHR588MGSNoMHD6ZHjx4ANGjQoNTrH3nkEUaM\nGMGhhx7KXnvtxfhKuhjr16/P2LFjycvLY+DAgTRq1IgFCxYAUa2qI444AoAjjzySoUOHVqn8++WX\nX06rVq044IADOPnkkzn++OPp0qUL9evX58wzz2TOnDkAPPzww3znO9+hb9++5OXl8bOf/Yyvv/6a\n119/nZkzZ7J161auuOIK8vLyOOusszjuuONKjnHXXXdx6aWX0q1bN8yM888/nwYNGjBz5szYcVZF\nnCR1G/AYsK+Z/Rp4FfhdItGIiFSzlStXlirXDlHJ+OK0AR1lt2d6fbt27TLOUdiyZctS17TSy7+/\n+eab9O3bl/32249mzZpxxx13VKn8e3pZ+b333nun5dRxVq5cSUFBQck2MyM/P5/i4mJWrly5UxXg\n9LZFRUX84Q9/oEWLFrRo0YLmzZuzYsWKUuXtd6dKk5S73w/8ErgJWAsMcfeHEolGRKSatWnTplS5\ndoBly5aV+kOdqfvugAMOYEXaCMJly5Z941F45557LmeccQbFxcWsW7eOkSNHJjIpb5s2bSgqKl3H\na/ny5bRt23an9wPRe0pp164dY8aMYc2aNaxZs4a1a9eyYcMGzjnnnN0eJ2RIUmbWJPUAlgOTgL8A\ny8I6EZE93vHHH0/Dhg258cYb2bp1K4WFhTz77LMMGzYs1uvPPvtsJk2axPz58/nyyy+59tprv3Es\nGzZsoHnz5tSrV49Zs2YxZcqUUtt3V8I6++yzmTp1Ki+++CJbt27lpptuYq+99uKEE06gZ8+e1KtX\njz//+c9s3bqVxx9/nFmzZpW89uKLL+b2228vWbdx40amTZvGxo0bd0tsZWU6k/oAeD/8TD1/P+25\niMger169ejzzzDNMmzaNVq1aMWrUKB544AE6deoElH8Wlb5uwIABXHHFFfTp04fOnTvTs2dPYOdr\nVxVJ39fEiRMZO3YsTZs25dprr93p7CTTGVrZbZnadu7cmcmTJzNq1Cj23Xdfpk6dyjPPPEPdunWp\nV68ejz/+OJMmTaJly5Y88sgjpcrQH3vssdx1112MGjWKFi1a0LlzZ+67775Yx/0mVE9KRKpFeXWE\n9tSbeTOZP38+Rx11FJs2bdL9VOXY7fWkzOx0M2uattzMzL6zy5GKSK23dMUK3D2xR3UlqCeffJLN\nmzezdu1arrrqKk4//XQlqN0kzqd4jbuX3AHm7uuA38Q9gJkNMLP5ZrbQzK4qZ3sTM3vazN4xs/fM\n7Ptx9y0ikgvuuOMO9ttvPzp16kS9evWYOHFitkOqMSrt7jOzue5+dJl177n7UZXuPLq/aiHQD1gJ\nvAUMdff5aW1GA03cfbSZtQIWAK3dfWuZfam7T2QPpvLxAsmUj59jZjeaWUF4/B6YEzOe7sAidy9y\n9y3AQ8DgMm0caByeNwY+K5ugRESkdoqTpEaFdk+FB8APY+6/LdHw9ZQVYV26W4HDzWwlMBf4ccx9\ni4hIDVfpLOjuvgH4WYIxfBuY4+59zewg4AUz6xKOKyIitViFScrM/uDu/2NmTxDm7Uvn7t+Nsf9i\noH3acn5Yl24EcH3Y52Iz+w9wKPCvsjtLnxOrd+/e9O7dO0YIIiKSawoLCyksLKy0XYUDJ8ysu7vP\nMrN+5W13939WunOzPKKBEP2AVcAsYJi7z0trcxvwibv/2sxaEyWno8Ns6+n70sAJkT2YBk4I7MaB\nE+6emgfjMHf/Z/oDOCxOMO6+jeia1nSimSoecvd5ZjbSzC4Jza4FTjCzd4EXgJ+XTVAiIknKVPL8\n1Vdf5bDDYv3JK6kXtTtUZV+787i5Jk5l3guJBjeku6icdeVy9+eBQ8qsuyPt+Sqi61IiUst0KNif\nomUfJ7b/gvatWVr00S7t46STTmLevHmVNwx257RAVdlXTS0tn+ma1DnAUOBAM3s8bVNjYF3SgYlI\nzVe07GM8/t//KrPDkkuAUj0yDUGfRVRLalH4mXqMAfonH5qISPWZM2cORx99NM2bN2fYsGFs3rwZ\n2Lkrbfbs2SUl3s8++2yGDh1aqpS8u1dYHr6se++9l8MPP5wmTZpw8MEHc+edd1bYtmPHjtxwww0c\nccQRtGzZkosuuqgkxsqOuytl6bMt0zWp/wAvAuvKXJOaFW7MFRGpMR555BGmT5/Of/7zH+bOnVvq\nj3yqK23Lli1897vf5cILL2TNmjUMGzaMJ554otR+MpWHL6t169ZMmzaNL774gkmTJnHllVfyzjvv\nVBjjlClTeOGFF1i8eDELFiwoVRYkybL02ZTxZt4w8CFP9aNEpKb78Y9/TOvWrWnWrBmDBg0qN1m8\n8cYbbNu2jVGjRpGXl8eZZ55J9+7dS7XJVB6+rIEDB9KhQwcATj75ZPr3788rr7xSYYyXX345bdq0\noVmzZowZM6ZUifsky9JnU5yBE58Dc81sOlBS1crdf5pYVCIi1Sy91HrDhg1ZtWrVTm1WrVq1U2n1\nsqPqMpWHL+u5557jmmuuYeHChWzfvp2vvvqKLl26VBhjfn5+yfOCgoJSJdsrK0s/evRo3n//fTZv\n3szmzZsZMmRIhcfJJXGmRXqWaJj4LEoXQBQRqVUOOOAAisvUvypbej6uzZs389///d/8/Oc/59NP\nP2Xt2rUMHDgw471k6ccqKiqiTZs2sY5VXWXpk1BpknL3e9IfwHNA08peJyJS0/Ts2ZO8vDxuu+02\ntm3bxlNPPVWqtHpVpM5oWrVqRZ06dXjuueeYPn16xtfcdtttFBcXs2bNGn77298ydOjQWMeqrCx9\nLotVlcvMWpjZJWb2IvA6UJBsWCIi1SfuPUap0up33303zZs3Z8qUKQwaNChjqfiK9t2oUSMmTJjA\nkCFDaNGiBQ899BCDB5ctElHa8OHD6d+/PwcffDCdOnVizJgxsY5bWVn6XJZpWqR9gDOA4cARRDOg\n/7e7l53FvFpoWiSRPVu55eP3gJt5K9OjRw8uu+wyLrjggkSP07FjR+655x769u2b6HGSVtVpkTIN\nnPiEaB698cBL7r7dzE7fXYF+E3G/7bRtX8CKoqXJBiMiuyzpBJKEl19+mUMOOYRWrVoxefJk3nvv\nPQYMGJDtsGqsTElqHNGMEzcDD5rZ3yhnNvTqdP3szZU3AkZ3rZ9wJCJSWy1YsICzzz6bL7/8kgMP\nPJDHHnus1MjApNTUaY8qE6d8fCdgGGGKJKIZJ55w9yXJh1cqDq9KklLXoEhu0SzoAgmUj3f3Re5+\njbsfDvQA9gMqLdMhIiKyq2KN7ktx93fc/Sp375hUQCIiIilVSlIiIiLVSUlKRERyVpy5+0REdllB\nQUGtHaEmOxQUVG0uiEqTlJkdBFwHHA7slVrv7p2rGpyI1F5Lly7NdgiyB4rT3XcvMAkwYCDwMPC3\nBGMSEREB4iWphu7+dwB3X+zuvyRKViIiIomKc01qk5nVARab2aVAMdA42bBERETiJakrgX2AK4iu\nTTUFLkwyKBEREYiRpNz9zfB0PXB+suGIiIjsEGd0X1dgNFENqZL27t41wbhERERidfc9SJSk3gO2\nJxuOiIjIDnGS1Gp3fzzxSERERMqIk6R+bWa3E818vim10t2fTiwqERER4iWpc4EuRMPOU919DihJ\niYhIouIkqR7ufkjikYiIiJQRZ8aJN81MSUpERKpdnDOpY4B3zexDomtSBriGoIuISNLiJKkzEo9C\nRESkHHFmnFgMYGYtSCvVISIikrRKr0mZ2WlmthBYAbwJLAdmJB2YiIhInIET1wEnAgvcvR0wAHgl\n0ahERESIl6S2uvunQB0zM3d/AeiecFwiIiKxBk58bmaNgNeA+83sE+CrZMMSERGJdyZ1BvA18GOg\nkKjo4aAEY9plDfLyMLPYjw75+dkOWUREyhFndN96M9sXOA5YCTwduv9iMbMBwC1ECfEed/9dOW16\nA38E6gGfunufuPsvz6Zt21g3fnzs9s2q0FZERKpPnNF9I4DZwHDgPOBfZnZBnJ2HsvO3At8GjgCG\nmdmhZdo0BW4DvuPuRwJDqvQORESkxopzTeoXQNfU2VM4q3oVuC/Ga7sDi9y9KLz2IWAwMD+tzXDg\nMXcvBnD31fHDFxGRmizONak1wLq05XVhXRxtie6rSlkR1qXrDLQwsxfN7C0zU4l6EREBMpxJmdkV\n4ekC4A0ze5KoRMcZwPu7OYauQF9gn3CsN9z9w914DBER2QNl6u7bN/xcHh4NwvLzVdh/MdA+bTk/\nrEu3gqj679fA12b2MnA0sFOS+sft15Q8P7BbLw7s1qsKoYiISK4oLCyksLCw0nbm7pU3MtsLICSS\n2Mwsj+hMrB+wCpgFDHP3eWltDgX+TDSTRQOiqZfOcfd/l9mXXz97c6zjju5av8qj++J8DiIikgwz\nw92t7PqM16TM7GIzWwJ8BHxkZovN7JK4B3X3bcAoYDrwAfCQu88zs5Gp/bj7fODvwLvATODOsglK\nRERqp0zXpEYDvYEB7r4wrOsM/MnMWrr79XEO4O7PA4eUWXdHmeWbgJuqFrqIiNR0mc6kvg+ckUpQ\nAOH5WcCIhOMSERHJmKTc3Xeao8/dvwS2JxeSiIhIJFOSWhWmKyrFzHoRXaMSERFJVKYh6FcAT5rZ\ni8DbYV03outUKikvIiKJq/BMyt3fA44kGjZ+aHjMAo4K20RERBKVce6+cE3qzmqKRUREpJQ4c/eJ\niIhkhZKUiIjkrMpmnMgzs/urKxgREZF0GZNUmNboQDOrV03xiIiIlIhT9HAx8IqZPQVsTK109wmJ\nRSUiIkK8JLUsPBqGh4iISLWoNEm5+1gAM9s7LO80VZKIiEgSKh3dZ2aHm9lbwCJgkZm9aWaHJR+a\niIjUdnGGoN8JXO3u+e6eD4wB7ko2LBERkXhJqrG7v5BacPd/AI2TC0lERCQSJ0ktNbPRZpYfHr8A\nliYcl4iISKwkdSHQDpgGTAXywzoREZFEZSoff6+7fx8Y5u4/rL6QREREIpnOpLqb2X7AxWbW2Mya\npD+qK0AREam9Mt0ndTfwGtAe+ACwtG0e1ouIiCQmU9HDm929E3C/u7d393ZpDyUoERFJXKUDJ9z9\n4uoIREREpCzVkxIRkZylJCUiIjkrztx9l5lZ0+oIJlsa1Aczi/XoULB/tsMVEak14pTqKABmm9mb\nwF/CtEg1yqbN4PPitbXDPk42GBERKRFn4MQvgE7AX4FLzWyRmV1jZh0Sjk1ERGq5WNek3H070Xx9\nS4HtwAHAU2Z2fWKRiYhIrVdpd5+Z/Qi4APgCuAcY4+6bzKwO8CEwOtkQRUSktopzTaoN0fx9i9NX\nuvt2Mzs9mbBERETidfe1LZugzOxeAHd/P4mgREREIF6S6pK+ELr5jksmHBERkR0qTFJmdpWZrQW6\nmNma8FgLrCaqLSUiIpKoTGdSNwL7An8MP/cFWrl7C3f/3+oITkREardMAycOdvdFZvYAcERqpVlU\nscPd3004NhERqeUyJanRRGXibytnmwOnJBKRiIhIUGGScvcLw8+Tqy8cERGRHSpMUpXdA+XuT8c5\ngJkNAG4huv51j7v/roJ2xwGvA+e4++Nx9i0iIjVbpu6+IRm2OVBpkgrD1W8F+gErgbfM7Cl3n19O\nuxuAv1casYiI1BqZuvvO3w377w4scvciADN7CBgMzC/T7nLgUXT/lYiIpMnU3TfM3R80syvK2+7u\nE2Lsvy2wPG15BVHiSj9OG+AMd+9jZqW2iYhI7Zapu695+LlvwjHcAlyVtmwJH09ERPYQmbr7Joaf\nY3dh/8VA+7Tl/LAuXTfgIYtuwGoFDDSzLeUNzPjH7deUPD+wWy8O7NZrF0ITEZFsKSwspLCwsNJ2\n5u6ZG0TFDf8I9AyrXgP+x92XVrpzszxgAdHAiVXALKIZ1cutg2tmk4BnyhvdZ2Z+/ezNlR0SgNFd\n67Nu/PhYbQGajR9fhcq8UNlnJiIiVWNmuPtOPWlxJph9kGgkX/vweCasq5S7bwNGAdOBD4CH3H2e\nmY00s0vKe0mc/YqISO0Qp57UPu4+KW35XjO7Mu4B3P154JAy6+6ooO2FcfcrIiI1X6bRfU3C02lm\n9jPgIaIznXOAqdUQm4iI1HKZzqQ+IEpKqT7CH6dtc+DqpIISERGBzKP72lVnICIiImXFuSaFmR0K\nHA7slVrn7lOSCkpERARiJCkz+yXQHziUaG69bwOvAkpSIiKSqDhD0M8B+gCrwnx+RwP7JBqViIgI\n8ZLUV+F+p61m1hj4CChINiwREZF416TmmFkz4C/Av4AviGaOEBERSVSlScrdR4ant5nZ34Em7j47\n2bBERETij+47HTiJ6P6oVwElKRERSVyl16TM7M9EN/IuAj4ErjCzOLWkREREdkmcM6lvAYd7mPrb\nzP4CvJ9oVCIiIsQb3fcfojpQKQcAi5MJR0REZIdME8w+QXQNai9gnpnNDJt6AG9WQ2wiIlLLZeru\nu7XaohARESlHpglm/5l6bmatiMq8A/zL3VcnHZiIiEic0X1nEQ05Px/4HvAvMzsz6cBERETijO77\nFXCcu38MYGaticrBP5FkYCIiInFG99VJJajgk5ivExER2SVxzqReMLOpwINheShRyQ4REZFExUlS\n/wMMIZoWCeA+4NHEIhIREQkyJikzywOed/f/Ah6unpBEREQiGa8thTpSeWbWpJriERERKRGnu+9z\nYK6ZTQc2pla6+08Ti0pERIR4SerZ8BAREalWlV2TOgr4DPjA3RdVT0giIiKRCq9JmdnVwJPAuUTD\n0C+stqhERETIfCZ1LtDF3Tea2b7ANOAv1ROWiIhI5tF9m9x9I4C7f1pJWxERkd0u05nUgWb2eHhu\nwEFpy7j7dxONTEREar1MSeqsMsuqLyUiItUqVj0pERGRbNB1JhERyVlKUiIikrOUpEREJGfFKR//\nvJk1S1tuHupLiYiIJCrOmVRrd1+XWnD3tUCb5EISERGJxElS280sP7VgZu0TjEdERKREnCT1K+A1\nM5tkZvcCLwNXxz2AmQ0ws/lmttDMripn+3Azmxser4ZJbUVERCov1eHuU82sO9AzrPq5u38SZ+dm\nVofoJuB+wErgLTN7yt3npzVbApzi7p+b2QDgLqBHVd6EiIjUTJlmQe8UfnYBWhMlkyXA/mFdHN2B\nRe5e5O5bgIeAwekN3H2mu38eFmcCbav2FkREpKbKdCb1C+Ai4LZytjlwSoz9twWWpy2vIEpcFfkB\n8FyM/YqISC2QaVqki8LPk6sjEDPrA4wATqqO44mISO6r9JqUmTUARhIlDwdeAe5y900x9l8MpI8G\nzA/ryh6jC3AnMCAMcS/XP26/puT5gd16cWC3XjFCEBGRXFNYWEhhYWGl7czdMzcwewjYBEwOq4YD\ne7v70Ep3bpYHLCAaOLEKmAUMc/d5aW3aA/8Eznf3mRn25dfP3lzZIQEY3bU+68aPj9UWoNn48eyI\nKDM7DCr7zEREpGrMDHe3susrPZMiqs57eNryC2b27zgHdfdtZjYKmE40SOMed59nZiOjzX4nMBZo\nAUw0MwO2uHum61YiIlJLxElSc83sOHd/C8DMjgXmxD2Auz8PHFJm3R1pzy8GLo67PxERqT3iJKmj\ngDfNbElY7gjMM7M5RGdDXROLTkREarU4SWpw5U1ERER2vzgzTiw2syOA1FD0V9z9g2TDEhERiVeq\nYxTwCNFQ8vbAw2b2w6QDExERidPddwnQ3d03AJjZb4HXgYlJBiYiIhJnFnQD0m9Q2hLWiYiIJKrC\nMykzq+vuW4EHiEb3PRY2nQncVx3BiYhI7Zapu28W0NXdbzSzQnbMqXdp6p4pERGRJGVKUiVdeu4+\niyhpiYiIVJtMSWpfM/tpRRvd/eYE4hERESmRKUnlAY3QIAkREcmSTElqlbtfk2G7VFGH/HyKineq\nVFKugrYBcrr/AAAPlElEQVRtWbpiRcIRiYjktljXpGT3KCoujl1CpPVvxxNNCh9PQfvWLC366BtG\nJiKSmzIlqX7VFoXsZNNmYte4ArDDPk4uGBGRLKnwZl53X1OdgYiIiJQVZ8YJySC/oANmFushIiJV\nE2fuPsmgeFkRVSlrLyIi8elMSkREcpaSlIiI5CwlKRERyVlKUiIikrOUpEREJGcpSYmISM5SkhIR\nkZylJCUiIjlLSUpERHKWkpSIiOQsJakaqCrzCZoZ+QUdYu+7Q8H+sffboWD/5N7kHhBXVWKq7s9L\nZE+huftqoKrMJwhVm1OwaNnHsUuIVGf5kFyMqyoxgcqtiJRHZ1IiIpKzlKREaplc7BoVqYi6+4QG\neXmqd1WL5GLXqEhFlKSETdu2sW78+Fhtm8VsJyKyO6i7T3JSh/z8Ko2ME5GaSWdSkpOKiotjn92B\nzvBqgg4F+1O0LF73YkH71iwt+ijhiHIzJsjduJKgJCVSBR3y8ykqLs52GDvJ1biqIhevleViTJCb\ncVUlcUL85KkkJdUmv6ADxcuKsh3GLqnKGV51nt3lalxSeyR1X6CSlFSbqtxkXJUbjCU31YSzO8m+\nxJOUmQ0AbiEapHGPu/+unDYTgIHARuD77v5O0nGJSLJy9bpiribPXI0r2xJNUmZWB7gV6AesBN4y\ns6fcfX5am4HAQe7eycyOB24HeiQZl0i6mtANKfHlatdoLsa1/7778vHq1dVyrIokfSbVHVjk7kUA\nZvYQMBiYn9ZmMHA/gLu/aWZNzay1u+suQqkW6oYUKd/Hq1dnPXEmfZ9UW2B52vKKsC5Tm+Jy2ojU\nKlWdyT4X4xLZHTRwQiQHJTmT/a7I1bPOXOyyzcWYIHfjqoi5e3I7N+sBjHf3AWH5F4CnD54ws9uB\nF939b2F5PtCrbHefmSUXqIiIZJ2773QKnvSZ1FvAwWZWAKwChgLDyrR5GvgR8LeQ1NaVdz2qvOBF\nRKRmSzRJufs2MxsFTGfHEPR5ZjYy2ux3uvs0MzvVzD4kGoI+IsmYRERkz5Fod5+IiMiuqHGzoJvZ\nADObb2YLzeyqbMcDYGb3mNnHZvZutmNJMbN8M5thZh+Y2XtmdkW2YwIwswZm9qaZzQlxjct2TClm\nVsfMZpvZ09mOJcXMlprZ3PB5zcp2PCnhVpJHzGxe+B07Pgdi6hw+p9nh5+e58HtvZlea2ftm9q6Z\n/dXMcuI+BzP7cfg/mNW/DzXqTCrcPLyQtJuHgaHpNw9nKa6TgA3A/e7eJZuxpJjZ/sD+7v6OmTUC\n3gYGZ/uzAjCzhu7+pZnlAa8BV7h71v8Am9mVwLFAE3c/PdvxAJjZEuBYd1+b7VjSmdm9wEvuPsnM\n6gIN3f2LLIdVIvytWAEc7+7LK2ufYBxtgFeBQ919s5n9DZjq7vdnK6YQ1xHAg8BxwFbgOeBSd19S\n3bHUtDOpkpuH3X0LkLp5OKvc/VUgp/6IuPtHqemn3H0DMI8cuT/N3b8MTxsQXTfN+jcpM8sHTgXu\nznYsZRg59v/YzJoAJ7v7JAB335pLCSr4FrA4mwkqTR6wTyqZE33BzrbDgDfdfZO7bwNeBr6bjUBy\n6pd7N4hz87CUYWYdgP8HvJndSCKhW20O8BHwgru/le2YgD8C/0sOJMwyHHjBzN4ys4uzHUzQEVht\nZpNC19qdZrZ3toMq4xyiM4WscveVwB+AZUQTGaxz939kNyoA3gdONrPmZtaQ6Atau2wEUtOSlFRR\n6Op7FPhxOKPKOnff7u7HAPnA8WZ2eDbjMbPTgI/DmaeFR6440d27Ev0R+VHoWs62ukBX4LYQ25fA\nL7Ib0g5mVg84HXgkB2JpRtTbUwC0ARqZ2fDsRgWh2/93wAvANGAOsC0bsdS0JFUMtE9bzg/rpByh\ne+FR4AF3fyrb8ZQVuoheBAZkOZQTgdPD9Z8HgT5mltVrBinuvir8/BR4gqjLO9tWAMvd/V9h+VGi\npJUrBgJvh88s274FLHH3NaFb7XHghCzHBIC7T3L3bu7eG1hHdL2/2tW0JFVy83AYITOU6GbhXJBr\n38AB/gL8293/lO1AUsyslZk1Dc/3Bv6L0hMSVzt3v9rd27v7gUS/UzPc/XvZjAmiASbhTBgz2wfo\nT9RNk1XhZvzlZtY5rOoH/DuLIZU1jBzo6guWAT3MbC+LJjzsR3R9OOvMbN/wsz1wJjAlG3HUqLn7\nKrp5OMthYWZTgN5ASzNbBoxLXVTOYkwnAucC74XrPw5c7e7PZzMu4ADgvjD6qg7wN3efluWYclVr\n4IkwZVhd4K/uPj3LMaVcAfw1dK0tIUdu0g/XV74FXJLtWADcfZaZPUrUnbYl/Lwzu1GVeMzMWhDF\n9cNsDX6pUUPQRUSkZqlp3X0iIlKDKEmJiEjOUpISEZGcpSQlIiI5S0lKRERylpKUiIjkLCWpWsTM\nWqSVKlhlZivSlqt0z1woP9JpF+NpaGYv7so+0vZ1ZVVLHJhZPzN7opz1F5nZH3dHXFWMp9LP1Mwe\nMLOdZmA3s45mds43OObNoRTDb8us72Nmu2X2Cos8b2ZrzezxMtsOtKg0y0Izmxxmvk9tm2hmi8zs\nHTOrtuoBZnZQuHewou0NzOylcPOtJExJqhYJU68cE+ZT+z/g5tSyu2+t4r4ucvdFuxjSD4CHd3Ef\nhD9sPwX2+gYvr+hGwWq/gXAXP9ODiGbDiC38kR3h7ke5+9VlNvcFen7DWErx6GbM3wEXlLP598AN\n7t4Z+Ar4fohtENDW3TsBPwIm7o5YypOeGNNU+O/v7puAQmBIUjHJDkpStVepb4Fm9vPwjfrdMGtH\n6hvl+2b2oJn928weMrMGYdsrqW+3Znaamb0dzsqeD+v6hm/As83sXxXMgn0u8FRo3ybsc3aIoUdY\nf15YftfMrgvr8sK38j+a2TtEs5PvB7xiZtNDm4Fm9no49oOp44dY55vZv8hcxqWDmRWa2QIzuzq8\n9joz+1HaZ3aDmV1W5nP8hZldGp7/2cz+Hp7/l0U1llKFOcuLLf0zHRmO/YaZ3WVmN6cdpq+ZvWZm\nH5pZ6j1cD/QOn9+oMjGZmf0h/PvONbNUyYVniSY0nZ22DjM7kOgLxM/Cth5m1sGiIpnvmNnfLaqD\nlDqzmxjey3wzK3eeRXd/EdhYJq46wCnAk2HVfcAZ4flg4P7w2teA1mbWsszrh5rZ78Lz/zGzBeF5\nJzMrDM/7h9/LuWZ2h4UeAzNbbmbXm9nbwBlm1i20mQ1cmnaMo8xsVvgc3rGoYgBEv7fnlvdeZTdz\ndz1q4QMYB/w0PO9ONB1LfaAR0TxrRxB9O98OHBfa3UdUgBDgFaAL0dQ8RUB+WN8s/JyW9rqGhNlN\n0o7fAFiRtvxz4H/DcwuvaQv8B2hOVHOnkGi277wQ1+C01y8DGofn+4a2e4Xlq4lm4d6bqJRLh7D+\nUeDxcj6bi0K7JiGOD8J7PQiYFdrUARYDTcu89kSi6YkgKmY3M7yfa4imBio3tjKfaT7RVEJNiKY7\neo3orBfggbT9HwXMC8/7lfdewraziQrpEf69lgGtwue4poLX/Cb1b5327zk0PL8YeCQtnqfD885h\n3/Uq2GepGEMs/05b7gDMDs+fA7qnbSsEupTZX1vgtfD8CaJSM/sCFwK/LuffezLR9D6E9T9J29f7\nQI/w/Oa0OCYCQ8LzekD98DyPaGb8rP9frukPnUkJwEnAY+6+2aNyHU8CJ4dtS3xHPafJoW26nkQT\nrq4AcPd1Yf1rwITwrb6ph//ZafYD1qQtvwX8wMzGAkd5VPjweOCf7r7WoxmipxB98wbY5KVnbk+f\nwPcE4HDgdYuuLQwn+gN4OLDA3ZeGdn/N8Jn83d2/CHE8CZzk7ouBLyyqWjqQqCjc52Ve9xZwnEWT\n5G4Iy8cSfZ6vVBBbQZl9pN73Fx51wz5aZvuTAO7+HlF5h8qcRJhQ1aPJX18BuoVtca+rHA/8LTy/\nn9K/Bw+HfS8kSlK7dK0yLncvBlpYNB/f/iGOXuz4rA+j9L/3/ez4/YHwfsIZ2l7uPjOsfyCtzevA\nWDP7X6C9u28Ox94GbE/1LEhylKSkqsrrq9/pD527X0f0jbsRMNPMDirT5CvSriF51B3UG1hFNMHs\nsIr2nfb6ihjwnEfX2o5x9yPd/dK0bXGUfZ+p5XuIzohGEM0iX7pR9EdsJfA9ojOpV4jOINq7+4cV\nxHZZ2f1UEuemmO0qkv6auNfeMrVL32ZV2OenQCuzkgEI6aV1iildZK+isjszic58PyD6rE8mSqiv\np8VTkY0ZtgHg7pOJuiA3Ac9b6Xpd9T26PiUJUpISiP5zn2nRqKVGRNcDXgnbOprZseH58LT1Ka8T\nXQtpD2BmzcPPA939fXe/AZgNHJL+IndfDeyddo2gPVH3yd3AvcAxRN03vS2qDlqXaGBAYdhF2T8+\nXxB1j6Vi6mVmHcO+G5rZwUTdmKlSLkZUsqEi/c2sSfiWPpjozBCiej+DgKO94gqqrwA/Iyq5/SrR\nhf9UbaWKYks3K7zvJhbNIp6pbHfqc1gPNM4Qz9Bwbao10dlcKp6K/oivZ8fnCVEyODs8Pz+8t5Qh\n4b10JkomFQ3+KFWuxt23h9hS7+8CwjVKohI73wv7PQn4yN0/K2efrxJ91i8R/Z59G1gfzoDnEf17\ndwhtz2PH70+JsN+vzOz4sKrkWpOZdXT3Je4+gegaXuqa4X6oVl21UJISQnfeg0R/uF4nqqj6Qdg8\nD/ipmf2bqI//rtTLwms/AS4DngrdV5PD9p+FC/XvEP3BK6+ExD/YUeCtH5C6cH0m8OfQnTOWHX+A\nXvcdpUTKflu/C/iHmU0PMf0A+Fs4/mtAJ3f/KsT6PFEiWJnhY3mL6A/lHKJrQO+G97uJ6A90pnpE\nrxB1Z870qDz45vCa1Od1UdnY0t+Tuy8nGvX2VnjdYuDz9DZpUstzgLphkMCoMm0eJarJ9S7Rv8OV\n4UtCeftLeQo426IBMT2IEu3IEPMQ4Mq0tsUWDUR5CrjYyxkpamavE3Wv9jezZWbWJ2z6OXCVmS0E\n9iH6ggLwDLDSzD4EbgvHL88rRInx5XDcFez4rL8i+qyfMLO5wNfA3RW87wuBO8PvX3oF2uEWDR6a\nQ/TvlPr97gNMrSAm2Y1UqkMqFLroHvWolHsS++8GXObuFyWx/ySEEWlziAZtLE3wOPu4+8ZwBvkU\nMNHdc+6Popk9QDSIIleKi1YLM3uSKNn/J9ux1HQ6k5LKJPYtxqPy4q8mtf/dzcyOBD4EpiWZoILf\nhG/1c4H5uZigglr3Ldeim8YfUYKqHjqTEhGRnKUzKRERyVlKUiIikrOUpEREJGcpSYmISM5SkhIR\nkZylJCUiIjnr/wNRBmUj3gTArwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "a = np.sort(tpm_low_alpha.predict(obama,output_type='probability')[0])[::-1]\n", + "b = np.sort(topic_model.predict(obama,output_type='probability')[0])[::-1]\n", + "c = np.sort(tpm_high_alpha.predict(obama,output_type='probability')[0])[::-1]\n", + "ind = np.arange(len(a))\n", + "width = 0.3\n", + "\n", + "def param_bar_plot(a,b,c,ind,width,ylim,param,xlab,ylab):\n", + " fig = plt.figure()\n", + " ax = fig.add_subplot(111)\n", + "\n", + " b1 = ax.bar(ind, a, width, color='lightskyblue')\n", + " b2 = ax.bar(ind+width, b, width, color='lightcoral')\n", + " b3 = ax.bar(ind+(2*width), c, width, color='gold')\n", + "\n", + " ax.set_xticks(ind+width)\n", + " ax.set_xticklabels(range(10))\n", + " ax.set_ylabel(ylab)\n", + " ax.set_xlabel(xlab)\n", + " ax.set_ylim(0,ylim)\n", + " ax.legend(handles = [b1,b2,b3],labels=['low '+param,'original model','high '+param])\n", + "\n", + " plt.tight_layout()\n", + " \n", + "param_bar_plot(a,b,c,ind,width,ylim=1.0,param='alpha',\n", + " xlab='Topics (sorted by weight of top 100 words)',ylab='Topic Probability for Obama Article')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we can clearly see the smoothing enforced by the alpha parameter - notice that when alpha is low most of the weight in the topic distribution for this article goes to a single topic, but when alpha is high the weight is much more evenly distributed across the topics.\n", + "\n", + "__Quiz Question:__ How many topics are assigned a weight greater than 0.3 or less than 0.05 for the article on Paul Krugman in the **low alpha** model? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+---------------------+-------------------------------+\n", + "| average predictions | topics |\n", + "+---------------------+-------------------------------+\n", + "| 0.629444444444 | art and publishing |\n", + "| 0.0176543209877 | Great Britain and Australia |\n", + "| 0.0161111111111 | international athletics |\n", + "| 0.0146913580247 | American college and politics |\n", + "| 0.0136419753086 | science and research |\n", + "| 0.0131481481481 | team sports |\n", + "| 0.0129012345679 | general politics |\n", + "| 0.010987654321 | Business |\n", + "+---------------------+-------------------------------+\n", + "[? rows x 2 columns]\n", + "Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.\n", + "You can use sf.materialize() to force materialization.\n", + "8\n" + ] + } + ], + "source": [ + "paul_krugman = gl.SArray([wiki_docs[int(np.where(wiki['name']=='Paul Krugman')[0])]])\n", + "paul_krugman_low_alpha = average_predictions(tpm_low_alpha, paul_krugman, 100)\n", + "print paul_krugman_low_alpha[(paul_krugman_low_alpha['average predictions'] > 0.3) |\n", + " (paul_krugman_low_alpha['average predictions'] < 0.05)\n", + " ]\n", + "print len(paul_krugman_low_alpha[(paul_krugman_low_alpha['average predictions'] > 0.3) |\n", + " (paul_krugman_low_alpha['average predictions'] < 0.05)\n", + " ])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ How many topics are assigned a weight greater than 0.3 or less than 0.05 for the article on Paul Krugman in the **high alpha** model? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+---------------------+--------------------+\n", + "| average predictions | topics |\n", + "+---------------------+--------------------+\n", + "| 0.629444444444 | art and publishing |\n", + "| 0.010987654321 | Business |\n", + "+---------------------+--------------------+\n", + "[? rows x 2 columns]\n", + "Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.\n", + "You can use sf.materialize() to force materialization.\n", + "2\n" + ] + } + ], + "source": [ + "paul_krugman_high_alpha = average_predictions(tpm_high_alpha, paul_krugman, 100)\n", + "print paul_krugman_low_alpha[(paul_krugman_high_alpha['average predictions'] > 0.3) |\n", + " (paul_krugman_high_alpha['average predictions'] < 0.05)\n", + " ]\n", + "print len(paul_krugman_low_alpha[(paul_krugman_high_alpha['average predictions'] > 0.3) |\n", + " (paul_krugman_high_alpha['average predictions'] < 0.05)\n", + " ])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Changing the hyperparameter gamma\n", + "\n", + "Just as we were able to see the effect of alpha by plotting topic weights for a document, we expect to be able to visualize the impact of changing gamma by plotting word weights for each topic. In this case, however, there are far too many words in our vocabulary to do this effectively. Instead, we'll plot the total weight of the top 100 words and bottom 1000 words for each topic. Below, we plot the (sorted) total weights of the top 100 words and bottom 1000 from each topic in the high, original, and low gamma models." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we will consider the following two models:\n", + " - tpm_low_gamma, a model trained with gamma = 0.02 and default alpha\n", + " - tpm_high_gamma, a model trained with gamma = 0.5 and default alpha" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "del tpm_low_alpha\n", + "del tpm_high_alpha\n", + "tpm_low_gamma = gl.load_model('topic_models/lda_low_gamma')\n", + "tpm_high_gamma = gl.load_model('topic_models/lda_high_gamma')" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAakAAAEbCAYAAABgLnslAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmclWX9//HXm1XJDaRUZmRAUVNxQ1JQFNQyxFzQHwiS\nW2VIkoUtZqZImaaZlanh9nXBfV/SzFzGDRQMxDQQl9hdQVQwZfv8/rjvGc6MM4d7YM6cw/B+Ph7n\nMfdynev+nDMz53Pu677u61JEYGZmVopaFDsAMzOz+jhJmZlZyXKSMjOzkuUkZWZmJctJyszMSpaT\nlJmZlayCJylJ/SVNlzRD0hn1lOknaYqkVyQ9WeiYzMxs3aBC3iclqQUwAzgImA9MAoZExPScMpsC\n44GDI2KepI4R8UHBgjIzs3VGoc+k9gJej4hZEbEMuA04olaZY4G7I2IegBOUmZlVKXSSKgPm5KzP\nTbfl2h7oIOlJSZMkHVfgmMzMbB3RqtgBkMTQAzgQ+BIwQdKEiHijuGGZmVmxFTpJzQM656yXp9ty\nzQU+iIjPgM8kPQ3sBtRIUpI8yKCZWTMWEaq9rdDNfZOAbpIqJLUBhgAP1CpzP9BHUktJ7YC9gWl1\nVRYRmR6jR4/OXLYpH45r3Y7Jca37MTmu0o2pPgU9k4qIFZJGAo+SJMRrI2KapOHJ7rgqIqZL+gfw\nMrACuCoi/lPIuMzMbN1Q8GtSEfEIsEOtbVfWWr8YuLjQsZiZ2bqlWY440a9fv2KHUCfHlV0pxgSO\nqyFKMSZwXA1RCjEV9GbexiQp1pVYzcysYSQRdXScKIUu6Ga2HujSpQuzZs0qdhhWZBUVFcycOTNz\neZ9JmVmTSL8pFzsMK7L6/g7qO5NqltekzMyseXCSMjOzkuUkZWZmJctJyszWe127duWJJ54odhhW\nBycpMyua8oouSCrYo7yiS7Ffoq0ld0E3s6KZN3sWF0xeWrD6z+zRpmB1W9PwmZSZWY6lS5fy4x//\nmLKyMsrLyxk1ahTLli0DkhEY7r33XgCee+45WrRowd///ncAnnjiCfbYY4866/zss8844YQT6NCh\nAzvvvDO///3v2Xrrrav3X3jhhXTr1o1NNtmE7t27c99991Xvu+GGG+jTpw+nn3467du3p1u3bkyY\nMIEbbriBzp07s+WWW3LjjTdWlz/ppJM49dRTGTBgABtvvDH77bcf7777LqNGjaJDhw7stNNOTJ06\nNdOxS4GTlJlZjvPOO4+JEyfy8ssvM3XqVCZOnMh5550HQN++famsrATg6aefZtttt+Xpp58G4Kmn\nnqp3GKFzzz2X2bNnM3PmTP75z39y0003Ia26Jahbt24899xzfPzxx4wePZpvf/vbvPvuu9X7J06c\nyO67787ChQsZOnQoQ4YM4cUXX+TNN99k3LhxjBw5kk8//bS6/J133sn555/PggULaNOmDb1796Zn\nz54sWLCAo48+mlGjRmU+drE5SZmZ5bjlllsYPXo0m2++OZtvvjmjR49m3LhxQJKknnrqKSBJUmee\neWb1+lNPPUXfvn3rrPPOO+/krLPOYpNNNqFTp06cdtppNfYfffTRbLHFFgAMGjSI7bbbjokTJ1bv\n79q1K8cffzySOOaYY5g7dy6jR4+mdevWfOMb36BNmza88caqKfgGDhzI7rvvTps2bRg4cCAbbrgh\nw4YNq37+Sy+9lPnYxeYkZWaWY/78+XTuvGqu1oqKCubPnw9A7969mTFjBu+99x5Tp07l+OOPZ86c\nOSxYsICJEyey//7711tneXl59XpuUx/AjTfeyB577EH79u1p3749r776Kh988EH1/qokArDhhhsC\n0LFjxxrbFi9eXG/52uu5ZVd37GJzkjIzy9GpU6caYwzOmjWLTp06AckH/J577smf//xnunfvTqtW\nrejduzeXXHIJ3bp1o0OHDvXWOXfu3Or12bNn11j+/ve/zxVXXMGHH37Ihx9+yM4779wkQ0gV89hZ\nOUmZmeUYOnQo5513Hh988AEffPABv/nNbzjuuOOq9++///5cdtll1U17/fr1q7Fel0GDBnHBBRew\naNEi5s2bx+WXX169b8mSJbRo0YKOHTuycuVKrrvuOl555ZW8Ma5tEql6/pocu6k5SZnZei+3E8Ov\nfvUrevbsya677spuu+1Gz549Oeuss6r39+3bl8WLF1c37VWt50tS55xzDmVlZXTt2pWDDz6YQYMG\n0bZtWwB23HFHfvKTn9CrVy+23HJLXn31Vfr06ZM53rrWs77eNTl2U/Mo6GbWJOoa/bq8ogvzZhdu\n+o6yzhXMnTWzYPWvqbFjx3L77bfz5JNPFjuUJudR0M1snTF31kwiomCPUklQ77zzDuPHjycieO21\n1/jDH/7AUUcdVeyw1gkeccLMrMCWLl3K8OHDmTlzJpttthlDhw5lxIgRxQ5rneDmPjNrEp700MDN\nfWZm1ow4SZmZWclykjIzs5LlJGVmZiXLScrMzEqWk5SZWQONGDGC3/72t41eNp9Zs2bRokULVq5c\nudZ1NdRJJ53EOeeck6ls165deeKJJxrt2L5PysyKpkt5ObPmzStY/RVlZczMGdi1sfz1r38tSNnV\naejwR82Bk5SZFc2sefNYdO65Bat/swLUvXLlSlq0cCNUU1ntOy3pcEkbp8s/lXSLpF0KH5qZWdOY\nPn06BxxwAO3bt2eXXXbhwQcfrN530kkn8YMf/IBDDz2UjTfemMrKyi80f1100UV06tSJ8vJyrr32\nWlq0aMFbb71V/fyqsk899RRbb701l1xyCVtssQVlZWVcf/311fU8/PDD9OjRg0033ZSKigrGjBmT\n+TV07dqViy++mN12242NN96Yk08+mffee48BAwawySabcPDBB/PRRx9Vl3/ggQfo3r07HTp04MAD\nD2T69OnV+6ZMmcKee+7JpptuypAhQ/jss89qHOtvf/tb9RxUffr04d///nfmOBsqy9eB8yLiE0m9\ngCOBu4Grsh5AUn9J0yXNkHRGHfv7SlokaXL6+FX28M3M1s7y5cs57LDD6N+/P++//z6XXnopw4YN\n4/XXX68uc+utt3L22WfzySefsO+++9Z4/iOPPMKf/vQnnnjiCd544w0qKyvzNsu98847fPLJJ8yf\nP59rrrmGU089tTp5bLTRRowbN46PPvqIhx56iLFjx/LAAw9kfi333HMPjz/+ODNmzOCBBx5gwIAB\n/O53v+ODDz5gxYoVXHrppQDMmDGDY489lksvvZT333+fQw45hMMOO4zly5ezbNkyBg4cyAknnMDC\nhQsZNGgQd999d/UxpkyZwne/+12uvvpqFi5cyPDhwzn88MNZtmxZ5jgbIkuSWpH+PAy4MiLuBtpm\nqVxSC+Ay4JvAzsBQSV+to+jTEdEjfZyXpW4zs8bw/PPPs2TJEs444wxatWrFAQccwLe+9S1uvfXW\n6jJHHHEEvXr1AqieYqPKnXfeyUknncRXv/pVNthgA85dTRNjmzZtOPvss2nZsiWHHHIIG220Ea+9\n9hqQzFW18847A9C9e3eGDBlSPT19Fj/84Q/p2LEjW221Ffvttx977703u+66a/U08lOmTAHgjjvu\n4Fvf+hYHHnggLVu25Kc//SmfffYZ48eP5/nnn2f58uWcdtpptGzZkqOPPpqvfe1r1ce4+uqrOeWU\nU+jZsyeSOO6442jbti3PP/985jgbIkuSelfSn4EhwEOS2gAtM9a/F/B6RMyKiGXAbcARdZRb/64G\nmllJmD9//hemc6+oqGBeToeO2vvzPX/rrbfOO0bh5ptvXuOaVrt27aqnc3/hhRc48MAD+cpXvsJm\nm23GlVde2aCp3LNOGz9//nwqKiqq90mivLycefPmMX/+fMrKymrUm1t21qxZ/OEPf6BDhw506NCB\n9u3bM3fuXObPn585zobIkqQGAS8Ah0fEQuDLwFn5n1KtDJiTsz433VZbb0kvSXpI0k4Z6zYzW2ud\nOnVizpw5NbbNnj27xgd1vua7rbba6gtTw69pL7xhw4Zx5JFHMm/ePBYtWsTw4cMLMihvp06dmDWr\n5jxec+bMoays7AuvB2pOd7/11ltz1llnsXDhQhYuXMiHH37I4sWLOeaYYxo9TsiTpCS1k9SOpLnv\nPuC/6fqHQON1god/AZ0jYneSpsH7GrFuM7O89t57b9q1a8dFF13E8uXLqays5G9/+xtDhw7N9PzB\ngwdz3XXXMX36dD799FPOO2/Nr1gsXryY9u3b07p1ayZOnMgtt9xSY39jJazBgwfz0EMP8eSTT7J8\n+XIuvvhiNthgA/bZZx969+5N69at+ctf/sLy5cu55557mDhxYvVzTz75ZMaOHVu9bcmSJTz88MMs\nWbKkUWKrLV8X9DeBqndkC+BTkma5DYF3gU4Z6p8HdM5ZL0+3VYuIxTnLf5d0haQO6VlbDbltvf36\n9aNfv34ZQjAzq1/r1q158MEHGTFiBOeffz7l5eWMGzeO7bbbDqj7LCp3W//+/TnttNM44IADaNmy\nJWeffTbjxo37wrWr+uTWdcUVV3D66aczcuRI+vbtyzHHHMOiRYvqLJuvntWV3X777bnpppsYOXIk\n8+fPZ/fdd+fBBx+kVaskJdxzzz1873vf41e/+hUDBgzg6KOPrn7unnvuydVXX83IkSN544032HDD\nDenTpw99+/Zd7XFzVVZWUllZudpyq51PStIVwGMRcU+6PhA4KCJGrrZyqSXwGnAQ8DYwERgaEdNy\nymwREe+my3sBd0RElzrq8nxSZuuwuuYRWldv5s1n+vTp7LLLLnz++ee+n6oODZ1PKkuS+ndE7FJr\n28sRsWvGgPoDfyZpWrw2In4naTgQEXGVpFOBEcAy4H/AqIh4oY56nKTM1mHNedLD++67jwEDBrBk\nyRJOPPFEWrVqVaPbtq1SiCT1GPAIcFO6aRgwICIOWvtws3OSMlu3NeckdcghhzBhwgRatWpFv379\nuPzyy2v0rLNVCpGkvgycB+xPco3qaeCciHivUSLOyEnKbN3WnJOUZdfQJJV37L70mtKPI2J444Vo\nZmaWTd6rehGxAvh6E8ViZmZWQ5ZR0F+UdCdwB1DdET4iHi5YVGZmZmRLUh2A5cBROdsCcJIyM7OC\nWm3HiVLhjhNm6zZ3nDBoeMeJLPNJbSnpVklz08fNkrZspHjNzIou35Tnzz77LDvuuGOmeqrmi7LG\nk+V26OtJup1vnz6eSbeZma2VLhVbIqlgjy4Va/99uk+fPkybNm31BVPr4xTvhZTlmtSWEfHXnPWx\nkk4pVEBmtv6YNftdIvvnf4Npx3cLV7k1iSxnUosk/b+qFUlHA4vylDczW+dMmTKF3Xbbjfbt2zN0\n6FCWLl0KfLEJb/LkydVTvA8ePJghQ4bUmEo+IuqdHr62mTNn0rdvXzbddFMOPvhgRo4cyXHHHVe9\nf/DgwWy11Va0b9+efv368Z///Kd630knncSpp57KgAED2Hjjjdlvv/149913GTVqFB06dGCnnXZi\n6tSp1eUbOr18vmM3pSxJ6jvA9yUtlLQAODndZmbWbNx55508+uij/Pe//2Xq1Kk1kktVE96yZcs4\n6qij+M53vsPChQsZOnQo9957b4168k0PX9uxxx5Lr169WLBgAaNHj2bcuHE1mgsHDBjAm2++yXvv\nvUePHj0YNmzYF2I+//zzWbBgAW3atKF379707NmTBQsWcPTRRzNq1Kga5bNOL5/l2E0l33xSXwKI\niLci4uCI6BARm0dE/4h4q+lCNDMrvB/96EdsscUWbLbZZhx22GG89NJLXygzYcIEVqxYwciRI2nZ\nsiUDBw5kr732qlEm3/TwuebMmcOLL77ImDFjaNWqFfvuuy+HH354jTInnngi7dq1o3Xr1pxzzjlM\nnTqVTz75pHr/wIED2X333aunh99www0ZNmwYkjjmmGO+8BqyTi+f5dhNJd+Z1BxJkyX9RdJQSe6y\nYmbNVu6AsLlTuud6++23vzC1eu3efPmmh881f/58OnTowAYbbFBnXStXruQXv/gF3bp1Y7PNNqNr\n165IqjGdfNbp4htaPsuxm0q+JLU5cCLwKnAI8KSkOZJul3RaUwRnZlZKttpqK+bVmv+q9tTzDalr\n4cKFfPbZZ3XWdfPNN/Pggw/yxBNPsGjRImbOnElENMm9ZsU8dm31JqlIvBwRYyPieJJR0C8E9gQu\nbqoAzcxKRe/evWnZsiWXX345K1as4P77768xtXpDdO7cmZ49e3LuueeybNkyJkyYwIMPPli9f/Hi\nxbRt25b27duzZMkSzjzzzAZ3b1/TpNIYx24s+a5J9ZA0UtItkiYCVwAbAd8D2jdVgGZmhZb1A7h1\n69bcc889XHPNNbRv355bbrmFww47LO9U8fnqvvnmmxk/fjwdO3bknHPOYciQIdV1HX/88XTu3Jmy\nsjK6d+/OPvvs07AXVevYDZlevjGO3VjqHRZJ0kpgMvBHkindlzVlYHXE42GRzNZhdU4fX7Els2YX\n7l6mis5bMHPWOwWrH6BXr16MGDGCE044Ya3rGjJkCDvuuCOjR49uhMhKU6NNeiipC7BP+tiTZHr3\nicAEYEJEzG+0qDNwkjJbtzWXsfuefvppdthhBzp27MhNN93ED37wA9566601mon3xRdfpEOHDnTt\n2pV//OMfHHXUUUyYMIHddtutAJGXhkab9DAiZgIzgVvSCjYi6UhxIdAVaNkYAZuZrUtee+01Bg8e\nzKeffso222zD3XffvcZTxb/zzjscddRRLFy4kPLycsaOHdusE9SayHcmtSGwN6vOpvYC3gbGA89F\nxE1NFWQaj8+kzNZhzeVMytZOYzb3fQhMImneew54PiI+btxws3OSMlu3OUkZNGJzH7B5RKxszODW\nVtYeOGWdK5g7a2ZhgzEzs4LLd02qpBIUwAWTl2Yqd2aPNgWOxMzMmkKWAWbNzMyKIst8UmZma62i\nosITAhoVFRUNKp83SUnqCxwJVI2oOA+4PyIq1yQ4M1t/zZw5s9gh2Doo37BIFwHnAFOBq9LHVOBs\nSb9vmvBKX3lFl8xTWZdXdCl2uGZm65R8Z1JHRsT2tTdKugGYAfysYFGtQ+bNnlVyHTrKK7owb/as\nzOXdG9LMSlW+JLVU0m4RMbXW9l2BzwsYk62lhiROcG9IMytd+ZLU94AbJAVQNclJZyDSfWZmZgWV\n7z6p54Hd04FmqztOpGP6mZmZFVyW+6Q2rfVoEEn9JU2XNEPSGXnKfU3SMklHNfQYZmbWPNV7JiXp\nAGAsMJ+k6zlAuaStgFMi4snVVS6pBXAZcFBazyRJ90fE9DrK/Q74xxq9CjMza5byXZO6DDg0It7I\n3ShpO+B+YKcM9e8FvB4Rs9Ln3gYcAUyvVe6HwF3A1zLGbeughvQ6dI9DM4P8SaoN8FYd2/+b7sui\njFWdLgDmkiSuapI6kXR3P0BSjX3WvJRid30zK235ktRNwARJt7Aq0WwNHJvuayx/AnKvVdU7bspj\nY39dvbxNz75s07NvI4Zh6yuf4Zk1vcrKSiorK1dbLl/vvjGSHiBpnuuebp4HjIiIyRnjmEfSbb1K\nOauub1XpCdymZFCvjsAhkpZFxAO1K/v6KedkPKxZdj7DM2t6/fr1o1+/ftXrY8aMqbNc3rH7ImIK\nMGUt4pgEdJNUQTKr7xBgaK1jbFO1LOk64MG6EpSZma1/1miqDkn3ZSkXESuAkcCjwKvAbRExTdJw\nSd+v6ylrEo+ZmTVP+bqg19d7T9Tq/JBPRDwC7FBr25X1lP1O1nrNzKz5y9fc92/gBeruyNC+MOGY\nmZmtki9JvQYcFxFv1t4haU4d5c2skXgke7NEviT1G+q/H8rTdJgVkEeyN0vk64J+a559txUmHDMz\ns1XWqHefmZlZU3CSMjOzkuUkZWZmJSvviBMAkloDJwN9SG62fRa4JiKWFTg2MzNbz602SQHXkySn\ncen6UGA/koFmzczMCiZLkuoRETvmrP9d0rRCBWRmZlYlyzWplyXtUbUiaXfgpcKFZGZmlshyJvVV\nkmnfq2bo3Q54RdIkICKi5CYqbNuyJcnMH9lUlJUxc+7cAkZkZmZrIkuSGlzwKBrZ5ytWsOjcczOX\n36wBZddGQ5KnE6eZWYYkFRGvSfoqSe8+gGcjYnphw2qeGpI8mypxmpmVstVek5I0AriPpJlve+Be\nScMLHZiZmVmW5r4RwNci4hMASb8BngPqnBPKzMyssWTp3Sfg85z1z6l7jikzM7NGleVM6hZgvKS7\n0vWjgZsKF5I1NXfoMLNSlW/6eEXiAklPsarjxGkRMaFpwrOm4A4dZlaq8p1J/QvoARAR44HxTRKR\nGb7XzcwS+ZKUrztZ0ZTqvW5m1rTyJakvSzqtvp0RcWkB4jEzM6uWr3dfS6Aj8OV6Hma2nimv6IKk\nTI/yii7FDteagXxnUm9HxDlNFomZlbx5s2dxweSlmcqe2aNNgaOx9UG+MylfkzIzs6LKl6QObrIo\nzMzM6lBvkoqI95syELN1QVXX+CyPLuXlxQ7XbJ2X72beVhGxvCmDMSt1pXrjs0cNseYqX8eJiUAP\nSddHxIlNFI+ZrYFSTZ5maytfkmojaTCwn6TDa++MiAcKF5aZWXblFV2YN3tWprJlnSuYO2tmYQOy\nRpMvSZ0KfBvYDBhUa18AmZKUpP7An0iuf10bERfW2n848BtgJbAMGBURz2WKvpG0bUP2ppLOWzBz\n1jsFjsjMGsJd45uvepNURDwFPCXpxYhYo7mjJLUALgMOAuYDkyTdX2tm38eqzsok7QLcAey4Jsdb\nU58vhZiWrax2fLewwZg1Ex5/0RpDlqk6/k/SD4D90/WngKszdqrYC3g9ImYBSLoNOAKoTlIR8WlO\n+Y1IzqjMbB3n8RetMWRJUpcBXwL+L13/NrAH8P0Mzy0D5uSszyVJXDVIOhK4gGS4pUMz1GtmZuuB\nLEmqV0TslrP+qKSpjRlERNwH3CepD3Ae8I26yj029tfVy9v07Ms2Pfs2Zhhmth5wd/3SUFlZSWVl\n5WrLZUlSKyV1iYiZAJK6kL1Jbh7QOWe9PN1Wp4h4VtI2kjpExMLa+79+iocSNLO14+76paFfv370\n69even3MmDF1lss3LFKVM4BnJD0m6XGSa1I/yxjHJKCbpApJbYAh1OoVKGnbnOUeQJu6EpSZ2bqk\nISPGe9T4+q32TCoiHpW0Pat63E2LiP9lqTwiVkgaCTzKqi7o0yQNT3bHVcDRko4HlgL/AwavyQsx\nMyslDekWD+4aX58szX2kSWnymhwgIh4Bdqi17cqc5YuAi9ak7uasIfduge/fMrPmKVOSsqbXkHu3\nwPdvmVnzlOWalJmZWVGsNklJul3SN9WQtidrtqqaITNNVVGx5Xofl5mtnSzNfdcB3wEuk3Q7cH1E\nvFHYsKxUleoQUqUal5mtndWeSUXEIxFxDMlIEe8AT0p6WtJxknxNy8zMCibTNSlJ7YFjgeOAl4Er\ngX2ARwoXmpmZre9WeyYk6U5gF+Bm4OiIqBoj5GZJUwoZnJmZrd+yNNddRTKdRlRtqJpaPiL2KFxo\nZma2vsvS3HdhboJKTSxEMGZmZrnqPZOS9BVgK2DDdDLCqi7omwDtmiA2MzNbz+Vr7juUpOt5OXBF\nzvZPgLMLGZSZmRnknz7+OuA6SYMj4o4mjMmsWfD4i2ZrL19z39CIuBXYStJptfdHxKUFjcxsHefx\nF60hPBlj3fI197VPf3ZsikDMzNZnTTUZY3lFF+bNnpWpbFnnCubOmrnGx2oM+Zr7rkh/+vqTWTPS\nkGZIN0E2Pw2Z56oU5rjK19x3Sb4nRsTpjR+OmRWaxzm0dUm+5r5XmywKM1vv+QzP6pKvue/apgzE\nzNZvPsOzuuRr7vtDRPxE0r1A7REniIijChqZmZmt9/I1992e/rysKQIxMzOrLV9z38T05+OSWgPb\nkZxRvR4Ry5soPjOzovF1suLLMlVHf5KR0GeTjN9XLunkiHi00MGZmRWTr5MVX5apOv4EfD0iZgBI\n2h64H9ixkIGZmZllmapjcVWCAkiXlxQuJDMzy6eqGTLLo0vFlsUOd63k6913eLo4UdIDwB0k16QG\nAS80QWxmZlaH9akZMl9z36Cc5Y+Ab6bLnwAbFywiMzOzVL7efcc1ZSBmZma1Zend1xY4EdgZ2KBq\ne0R8v3BhmZmZZes4cSPQBfgWybWobYHPChiTmZkZkC1JbR8RZ5L08rsW6A/sVdiwzMzMsiWpZenP\nRZJ2JOk08ZWsB5DUX9J0STMknVHH/mMlTU0fz0raJWvdZmbWvGW5mfdaSe2B0cA/gHbAOVkql9SC\nZOy/g4D5wCRJ90fE9JxibwH7R8RH6egWVwO9GvAazMysmVptkoqIK9PFJ4HODax/L5Kx/mYBSLoN\nOAKoTlIR8XxO+eeBsgYew8zMmqnVNvdJai/pj5ImSnpB0sXpmVUWZcCcnPW55E9C3wP+nrFuMzNr\n5rJck7oN+BgYBnyb5Gbe2/M+Yw1IOgA4CfjCdSszM1s/ZbkmVRYRo3PWx0h6JWP986jZRFiebqtB\n0q4kI633j4gP66vssbG/rl7epmdftunZN2MYZmbWUG1btsw+VUlZGTPnzs1cd2VlJZWVlastlyVJ\nPS7p/0XEXQCSjgL+mTGOSUA3SRXA28AQYGhuAUmdgbuB4yLizXyVff2UTP01zMysEXy+YgWLzj03\nU9ktzj83c0KDL86/NWbMmDrL5Rtg9kOSAWUF/FDS8pznLAJGrS6IiFghaSTwKEnT4rURMU3S8GR3\nXAWcDXQArlDyCpdFhO/DMjNbhzRk0FvIPvBtvjOpjtkPV7+IeATYoda2K3OWTwZOboxjmZlZ85Jv\ngNkVVcuSBgD7p6uVaeIxMzMrqCxd0H8L/Jzkptu3gJ9LOq/QgZmZmWXpOHEYsEfVmZWk/wMmA78q\nZGBmZmZZ7pMC2CRn2RMemplZk8hyJnURMFnS4yQ9/fqR9MgzMzMrqLxJKu0S/jjJuH17p5vPiYgv\n3JBrZmbW2PImqYgISf+MiO7APU0Uk5mZGZDtmtRLkvYoeCRmZma1ZLkmtQfJPFBvAktIrktFRPQo\naGRmZraCEh0KAAARKElEQVTey5KkDi94FGZmZnXIN3ZfW5LhiroB/wauzx2FwszMrNDyXZO6HugD\nvA4cCVzcFAGZmZlVydfc1z0idgGQdBXwQtOEZGZmlsh3JrWsaiEiluUpZ2ZmVhD5zqR2k7QwXRaw\ncbpe1buvQ8GjMzOz9Vq+JNWmyaIwMzOrQ6b5pMzMzIoh6yjoZmZmTc5JyszMSpaTlJmZlax8I058\nCERdu3DvPjMzawL5evd1bLIozMzM6pC5d5+kDsAGOZvmFyooMzMzyHBNStKhkmYAc0mGRpoLPFHo\nwMzMzLJ0nPgtsC/wWkRsDXwTeKagUZmZmZEtSS2PiPeBFpIUEf8E9ipwXGZmZpkmPfxI0kbAs8CN\nkt4D/lfYsMzMzLKdSR1JkpR+DFQC84BvFTAmMzMzIFuSOjMiVkTEsoi4NiIuAU4vdGBmZmZZklT/\nOrYd2tiBmJmZ1VZvkpI0XNIUYAdJk3MerwPTsh5AUn9J0yXNkHRGHft3kDRe0meSfIZmZmbV8nWc\nuAN4HLgA+EXO9k8i4r0slUtqAVwGHERy8+8kSfdHxPScYguAH5Jc+zIzM6tW75lURHwYEW9ExCCS\nkSa+kT6+3ID69wJej4hZ6RT0twFH1DrOBxHxL2B5g6M3M7NmLcuIE6cCdwKd08cdkn6Qsf4yYE7O\n+tx0m5mZ2WpluU9qOLBXRCwGkHQ+MB64opCBmZmZZUlSApbmrC9Lt2Uxj+Tsq0p5um2NPDb219XL\n2/TsyzY9+65pVWZmVmTnnnvuasvkm0+qVUQsB8YBL0i6O901ELghYwyTgG6SKoC3gSHA0Dzl8ya/\nr59yTsbDmplZqctNUmPGjKmzTL4zqYlAj4i4SFIl0CfdfkpETMoSQESskDQSeJTk+te1ETFN0vBk\nd1wlaQvgRWBjYKWkHwE7VTUvmpnZ+itfkqo+q4mIiSRJq8Ei4hFgh1rbrsxZfhfYek3qNjOz5i1f\nkvpyvptr0+GRzMzMCiZfkmoJbET2ThJmZmaNKl+Sejsifp1nv5mZWUHlu5nXZ1BmZlZU+ZLUQU0W\nhZmZWR3yjd23sCkDMTMzqy3LfFJmZmZF4SRlZmYly0nKzMxKlpOUmZmVLCcpMzMrWU5SZmZWspyk\nzMysZDlJmZlZyXKSMjOzkuUkZWZmJctJyszMSpaTlJmZlSwnKTMzK1lOUmZmVrKcpMzMrGQ5SZmZ\nWclykjIzs5LlJGVmZiXLScrMzEqWk5SZmZUsJykzMytZTlJmZlaynKTMzKxkOUmZmVnJKniSktRf\n0nRJMySdUU+ZSyW9LuklSbsXOiYzM1s3FDRJSWoBXAZ8E9gZGCrpq7XKHAJsGxHbAcOBsYWMyczM\n1h2FPpPaC3g9ImZFxDLgNuCIWmWOAG4EiIgXgE0lbVHguMzMbB1Q6CRVBszJWZ+bbstXZl4dZczM\nbD3kjhNmZlayFBGFq1zqBZwbEf3T9V8AEREX5pQZCzwZEben69OBvhHxbq26CheomZkVXUSo9rZW\nBT7mJKCbpArgbWAIMLRWmQeAU4Hb06S2qHaCgrqDNzOz5q2gSSoiVkgaCTxK0rR4bURMkzQ82R1X\nRcTDkgZIegNYApxUyJjMzGzdUdDmPjMzs7XR7DpOZLl5uKlJulbSu5JeLnYsVSSVS3pC0quS/i3p\ntGLHBCCpraQXJE1J4xpd7JiqSGohabKkB4odSxVJMyVNTd+vicWOp4qkTSXdKWla+je2dwnEtH36\nPk1Of35UCn/3kkZJekXSy5JultSm2DEBSPpR+j9Y1M+HZnUmld48PAM4CJhPck1sSERML3JcfYDF\nwI0RsWsxY6kiaUtgy4h4SdJGwL+AI4r9XgFIahcRn0pqCTwHnBYRRf8AljQK2BPYJCIOL3Y8AJLe\nAvaMiA+LHUsuSdcDT0XEdZJaAe0i4uMih1Ut/ayYC+wdEXNWV76AcXQCngW+GhFLJd0OPBQRNxYr\npjSunYFbga8By4G/A6dExFtNHUtzO5PKcvNwk4uIZ4GS+hCJiHci4qV0eTEwjRK5Py0iPk0X25Jc\nNy36NylJ5cAA4Jpix1KLKLH/Y0mbAPtFxHUAEbG8lBJU6uvAm8VMUDlaAl+qSuYkX7CLbUfghYj4\nPCJWAE8DRxUjkJL6424EWW4etlokdQF2B14obiSJtFltCvAO8M+ImFTsmIA/Aj+jBBJmLQH8U9Ik\nSScXO5hUV+ADSdelTWtXSdqw2EHVcgzJmUJRRcR84A/AbJKBDBZFxGPFjQqAV4D9JLWX1I7kC9rW\nxQikuSUpa6C0qe8u4EfpGVXRRcTKiNgDKAf2lrRTMeORdCjwbnrmqfRRKvaNiB4kHyKnpk3LxdYK\n6AFcnsb2KfCL4oa0iqTWwOHAnSUQy2YkrT0VQCdgI0nHFjcqSJv9LwT+CTwMTAFWFCOW5pak5gGd\nc9bL021Wh7R54S5gXETcX+x4akubiJ4E+hc5lH2Bw9PrP7cCB0gq6jWDKhHxdvrzfeBekibvYpsL\nzImIF9P1u0iSVqk4BPhX+p4V29eBtyJiYdqsdg+wT5FjAiAirouInhHRD1hEcr2/yTW3JFV983Da\nQ2YIyc3CpaDUvoED/B/wn4j4c7EDqSKpo6RN0+UNgW8ARe3MERG/jIjOEbENyd/UExFxfDFjgqSD\nSXomjKQvAQeTNNMUVXoz/hxJ26ebDgL+U8SQahtKCTT1pWYDvSRtIEkk79W0IscEgKQvpz87AwOB\nW4oRR6FHnGhS9d08XOSwkHQL0A/YXNJsYHTVReUixrQvMAz4d3r9J4BfRsQjxYwL2Aq4Ie191QK4\nPSIeLnJMpWoL4N50yLBWwM0R8WiRY6pyGnBz2rT2FiVyk356feXrwPeLHQtAREyUdBdJc9qy9OdV\nxY2q2t2SOpDE9YNidX5pVl3QzcyseWluzX1mZtaMOEmZmVnJcpIyM7OS5SRlZmYly0nKzMxKlpOU\nmZmVLCep9YikDjlTFbwtaW7OeoPumUunH9luLeNpJ+nJtakjp65RDZ3iQNJBku6tY/t3Jf2xMeJq\nYDyrfU8ljZP0hRHYJXWVdMwaHPOSdCqG82ttP0BSo4xeocQjkj6UdE+tfdsomZplhqSb0pHvq/Zd\nIel1SS9JarLZAyRtm947WN/+tpKeSm++tQJzklqPpEOv7JGOp/ZX4JKq9YhY3sC6vhsRr69lSN8D\n7ljLOkg/2E4HNliDp9d3o2CT30C4lu/ptiSjYWSWfsieFBG7RMQva+0+EOi9hrHUEMnNmBcCJ9Sx\n+/fA7yJie+B/wIlpbIcBZRGxHXAqcEVjxFKX3MSYo97ff0R8DlQCgwoVk63iJLX+qvEtUNLP02/U\nL6ejdlR9o3xF0q2S/iPpNklt033PVH27lXSopH+lZ2WPpNsOTL8BT5b0Yj2jYA8D7k/Ld0rrnJzG\n0Cvd/u10/WVJv023tUy/lf9R0ksko5N/BXhG0qNpmUMkjU+PfWvV8dNYp0t6kfzTuHSRVCnpNUm/\nTJ/7W0mn5rxnv5M0otb7+AtJp6TLf5H0j3T5G0rmWKqamLOu2HLf0+HpsSdIulrSJTmHOVDSc5Le\nkFT1Gi4A+qXv38haMUnSH9Lf71RJVVMu/I1kQNPJOduQtA3JF4ifpvt6SeqiZJLMlyT9Q8k8SFVn\ndlekr2W6pDrHWYyIJ4ElteJqAewP3JduugE4Ml0+Argxfe5zwBaSNq/1/CGSLkyXfyLptXR5O0mV\n6fLB6d/lVElXKm0xkDRH0gWS/gUcKalnWmYycErOMXaRNDF9H15SMmMAJH+3w+p6rdbIIsKP9fAB\njAZOT5f3IhmOpQ2wEck4azuTfDtfCXwtLXcDyQSEAM8Au5IMzTMLKE+3b5b+fDjnee1IRzfJOX5b\nYG7O+s+Bn6XLSp9TBvwXaE8y504lyWjfLdO4jsh5/mxg43T5y2nZDdL1X5KMwr0hyVQuXdLtdwH3\n1PHefDctt0kax6vpa90WmJiWaQG8CWxa67n7kgxPBMlkds+nr+fXJEMD1Rlbrfe0nGQooU1Ihjt6\njuSsF2BcTv27ANPS5YPqei3pvsEkE+mR/r5mAx3T93FhPc/5TdXvOuf3OSRdPhm4MyeeB9Ll7dO6\nW9dTZ40Y01j+k7PeBZicLv8d2CtnXyWwa636yoDn0uV7Saaa+TLwHWBMHb/vm0iG9yHd/uOcul4B\neqXLl+TEcQUwKF1uDbRJl1uSjIxf9P/l5v7wmZQB9AHujoilkUzXcR+wX7rvrVg1n9NNadlcvUkG\nXJ0LEBGL0u3PAZem3+o3jfQ/O8dXgIU565OA70k6G9glkokP9wYej4gPIxkh+haSb94An0fNkdtz\nB/DdB9gJGK/k2sKxJB+AOwGvRcTMtNzNed6Tf0TEx2kc9wF9IuJN4GMls5YeQjIp3Ee1njcJ+JqS\nQXIXp+t7kryfz9QTW0WtOqpe98eRNMPeVWv/fQAR8W+S6R1Wpw/pgKqRDP76DNAz3Zf1usrewO3p\n8o3U/Du4I617BkmSWqtrlVlFxDygg5Lx+LZM4+jLqvd6R2r+vm9k1d8PpK8nPUPbICKeT7ePyykz\nHjhb0s+AzhGxND32CmBlVcuCFY6TlDVUXW31X/igi4jfknzj3gh4XtK2tYr8j5xrSJE0B/UD3iYZ\nYHZofXXnPL8+Av4eybW2PSKie0SckrMvi9qvs2r9WpIzopNIRpGvWSj5EJsPHE9yJvUMyRlE54h4\no57YRtSuZzVxfp6xXH1yn5P12lu+crn71IA63wc6StUdEHKn1plHzUn26pt253mSM99XSd7r/UgS\n6viceOqzJM8+ACLiJpImyM+BR1Rzvq42kVyfsgJykjJI/rkHKum1tBHJ9YBn0n1dJe2ZLh+bs73K\neJJrIZ0BJLVPf24TEa9ExO+AycAOuU+KiA+ADXOuEXQmaT65Brge2IOk+aafktlBW5F0DKhMq6j9\n4fMxSfNYVUx9JXVN624nqRtJM2bVVC4imbKhPgdL2iT9ln4EyZkhJPP9HAbsFvXPoPoM8FOSKbef\nJbnwXzW3Un2x5ZqYvu5NlIwinm/a7qr34RNg4zzxDEmvTW1BcjZXFU99H+KfsOr9hCQZDE6Xj0tf\nW5VB6WvZniSZ1Nf5o8Z0NRGxMo2t6vWdQHqNkmSKnePTevsA70TEgjrqfJbkvX6K5O/sm8An6Rnw\nNJLfd5e07LdZ9fdTLa33f5L2TjdVX2uS1DUi3oqIS0mu4VVdM/wKnquuSThJGWlz3q0kH1zjSWZU\nfTXdPQ04XdJ/SNr4r656Wvrc94ARwP1p89VN6f6fphfqXyL5wKtrConHWDXB20FA1YXrgcBf0uac\ns1n1ATQ+Vk0lUvvb+tXAY5IeTWP6HnB7evzngO0i4n9prI+QJIL5ed6WSSQflFNIrgG9nL7ez0k+\noPPNR/QMSXPm85FMD740fU7V+/Xd2rHlvqaImEPS621S+rw3gY9yy+SoWp8CtEo7CYysVeYukjm5\nXib5PYxKvyTUVV+V+4HBSjrE9CJJtMPTmAcBo3LKzlPSEeV+4OSoo6eopPEkzasHS5ot6YB018+B\nMyTNAL5E8gUF4EFgvqQ3gMvT49flGZLE+HR63Lmseq//R/Je3ytpKvAZcE09r/s7wFXp31/uDLTH\nKuk8NIXk91T1930A8FA9MVkj8lQdVq+0ie6uSKZyL0T9PYEREfHdQtRfCGmPtCkknTZmFvA4X4qI\nJekZ5P3AFRFRch+KksaRdKIolclFm4Sk+0iS/X+LHUtz5zMpW52CfYuJZHrxZwtVf2OT1B14A3i4\nkAkq9Zv0W/1UYHopJqjUevctV8lN43c6QTUNn0mZmVnJ8pmUmZmVLCcpMzMrWU5SZmZWspykzMys\nZDlJmZlZyXKSMjOzkvX/AbawIRe6AqQgAAAAAElFTkSuQmCC\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAakAAAEbCAYAAABgLnslAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcFNW5//HPw66RZRAFYWQgIorghqigGEdNVHBBNCKI\nG/ozuHBNSHKvGhfwRmM0anKNWzReRVzirqBEURE1bqMBcUVwYdgEBEQFr7I9vz/q9NDTzPTUDFPd\nDfN9v179mq6qc6qe7pnpp+vUqXPM3RERESlEjfIdgIiISHWUpEREpGApSYmISMFSkhIRkYKlJCUi\nIgVLSUpERApW4knKzI40s5lmNsvMLqymzI1mNtvM3jGzvWqqa2ZFZjbZzD42s2fNrHVY/1Mze9vM\nZpjZW2Z2SFqd3mb2btjXX5J8zSIiUj8STVJm1gi4CTgC6AkMM7NdM8oMAHZy952BkcBtMepeBDzv\n7rsAU4CLw/ovgaPdfU/gDGB82qFuBc5y9+5AdzM7op5froiI1LOkz6T2A2a7e7m7rwH+AQzKKDMI\nuAfA3d8EWptZ+xrqDgLGhefjgONC/Rnuvig8/wBoYWZNzawD0NLd3wp17knVERGRwpV0kuoEzEtb\nnh/WxSmTrW57d18MEJLS9pkHNrOfA9NCgusU6meLQ0RECkyTfAdQBatDnUpjO5lZT+Bq4Gf1EpGI\niORF0klqAdA5bbk4rMsss2MVZZplqbvIzNq7++LQlLckVcjMioHHgFPdfU4Nx9iImWkwQxGRHHP3\nKk9Qkm7uewvoZmYlZtYMGApMyCgzATgNwMz6AitCU162uhOIOkYAnA48Geq3AZ4CLnT3N1IHCE2C\nX5vZfmZm4XhPVhe0u9f4GDNmTKxyuXgoFsWyOcejWBRLNomeSbn7OjMbBUwmSoh3uvtHZjYy2uy3\nu/skMxtoZp8Aq4AR2eqGXV8DPGRmZwLlwJCw/nxgJ+ByMxtD1Ax4uLsvDdvuBloAk9z9mSRfu4iI\nbLrEr0mFZLBLxrq/ZSyPils3rF8O/LSK9VcBV1Wzr38Du8cOXERE8k4jTtRRaWlpvkOooFiqpliq\nV0jxKJaqKZaI1dQe2NCYmes9ERHJHTPDq+k4UYhd0EVkC9SlSxfKy8vzHYbkUUlJCXPmzKlVHZ1J\nZdCZlEgywrflfIcheVTd30C2MyldkxIRkYKlJCUiIgVLSUpERAqWkpSINHhdu3ZlypQp+Q5DqqAk\nJSJ5U1zSBTNL7FFc0iXfL1E2kbqgi0jeLJhbztXTVie2/4t7N0ts35IbOpMSEUmzevVqfvWrX9Gp\nUyeKi4sZPXo0a9asAaKRFx5//HEAXn31VRo1asQ///lPAKZMmcLee+9d5T6///57Tj/9dNq2bUvP\nnj3505/+xI47bpiY4ZprrqFbt260atWKXr168cQTT1RsGzduHP379+fXv/41RUVFdOvWjddff51x\n48bRuXNnOnTowD333FNRfsSIEZx//vkMHDiQli1bctBBB7F48WJGjx5N27Zt2W233ZgxY0asYxcC\nJSkRkTRXXnklZWVlvPvuu8yYMYOysjKuvPJKAA4++GCmTp0KwMsvv8xOO+3Eyy+/DMBLL71U7fBB\nY8eOZe7cucyZM4fnnnuOe++9l2hChki3bt149dVX+eabbxgzZgynnHIKixcvrtheVlbGXnvtxfLl\nyxk2bBhDhw7l7bff5tNPP2X8+PGMGjWK7777rqL8ww8/zB/+8AeWLVtGs2bN6NevH3369GHZsmWc\ncMIJjB49Ovax801JSkQkzf3338+YMWPYdttt2XbbbRkzZgzjx48HoiT10ksvAVGSuvjiiyuWX3rp\nJQ4++OAq9/nwww9zySWX0KpVKzp27MgFF1xQafsJJ5xA+/btATjxxBPZeeedKSsrq9jetWtXTjvt\nNMyMk046ifnz5zNmzBiaNm3Kz372M5o1a8Ynn3xSUX7w4MHstddeNGvWjMGDB7PVVlsxfPjwivrv\nvPNO7GPnm5KUiEiahQsX0rnzhvlWS0pKWLhwIQD9+vVj1qxZLFmyhBkzZnDaaacxb948li1bRllZ\nGT/5yU+q3WdxcXHFcnpTH8A999zD3nvvTVFREUVFRXzwwQcsXbq0YnsqiQBstdVWALRr167SupUr\nV1ZbPnM5vWxNx843Jakc6FLSIVZPpC4lHfIdqkiD17Fjx0pjDJaXl9OxY0cg+oDfZ599+J//+R96\n9epFkyZN6NevHzfccAPdunWjbdu21e5z/vz5Fctz586t9PwXv/gFt9xyC1999RVfffUVPXv2zMkQ\nUvk8dlxKUjlQPncx/hE1PsrnFk47sEhDNWzYMK688kqWLl3K0qVL+f3vf8+pp55asf0nP/kJN910\nU0XTXmlpaaXlqpx44olcffXVrFixggULFnDzzTdXbFu1ahWNGjWiXbt2rF+/nrvuuov3338/a4yb\nmkRS9ety7FyrVZKyyI+SCkZEJB/SOzFceuml9OnThz322IM999yTPn36cMkll1RsP/jgg1m5cmVF\n015qOVuSuvzyy+nUqRNdu3bl8MMP58QTT6R58+YA9OjRg9/85jf07duXDh068MEHH9C/f//Y8Va1\nHPf11uXYuVbjKOhmdg8wClgLlAHbAn9y9xuSDy/3khgF3cyomPg+W7kem/4NSaRQVTUCdnFJFxbM\nTW76jk6dS5hfPiex/dfVbbfdxoMPPsiLL76Y71ByKqlR0Pdw92+A44DngBLgjE2IU0QEgPnlc3D3\nxB6FkqAWLVrEa6+9hrvz8ccfc/3113P88cfnO6zNQpwRJ5qaWRNgEHCru682s/UJxyUissVYvXo1\nI0eOZM6cObRp04Zhw4Zx7rnn5juszUKcJPV3YC7wPvCSmXUGVmavIiIiKZ07d+a9997LdxibpVrP\nzGvRFbem7p7cgFt5pGtSIsnQzLxSl2tS1Z5JmdkF1W0LbqxdeCIiIrWTrblvu/BzZ2A/YGJYPhp4\nEyUpERFJWJwu6C8DR4cefphZK2Ciu1d/U8BmTM19IslQc58k1QW9PfB92vIPgMbvERGRxMVJUvcB\nb5rZpWZ2KfAaMD7ZsERECte5557LVVddVe9lsykvL6dRo0asX5/7O4BGjBjB5ZdfHqts165dmTJl\nSr0du8Yu6O7+32b2TyA1vO857v5WvUUgIg1Wl+JiyhcsSGz/JZ06MSdtYNf6cuuttyZStia1Hf5o\nS5A1SZlZY+Bdd+8JKDGJSL0qX7CAFWPHJrb/Ngnse/369TRqpLG5cyXrO+3u64DPzKxTjuIREcm5\nmTNncsghh1BUVMTuu+/OxIkTK7aNGDGC8847j6OOOoqWLVsyderUjZq/rr32Wjp27EhxcTF33nkn\njRo14rPPPquonyr70ksvseOOO3LDDTfQvn17OnXqxN13312xn0mTJtG7d29at25NSUkJV1xxRezX\n0LVrV6677jr23HNPWrZsydlnn82SJUsYOHAgrVq14vDDD+frr7+uKD9hwgR69epF27ZtOfTQQ5k5\nc2bFtunTp7PPPvvQunVrhg4dyvfff1/pWE899VTFHFT9+/dP9EblOF8HtgE+MrNnzeyx1COxiERE\ncmjt2rUcc8wxHHnkkXz55ZfceOONDB8+nNmzZ1eUeeCBB7jsssv49ttvOfDAAyvVf+aZZ/jLX/7C\nlClT+OSTT5g6dWrWZrlFixbx7bffsnDhQv7+979z/vnnVySPbbbZhvHjx/P111/z9NNPc9tttzFh\nwoTYr+Wxxx7jhRdeYNasWUyYMIGBAwfyxz/+kaVLl7Ju3TpuvDG6c2jWrFmcfPLJ3HjjjXz55ZcM\nGDCAY445hrVr17JmzRoGDx7M6aefzvLlyznxxBN59NFHK44xffp0zjrrLO644w6WL1/OyJEjOfbY\nY1mzZk3sOGsjTpK6EhgMXAvcnPYQEdnsvfHGG6xatYoLL7yQJk2acMghh3D00UfzwAMPVJQZNGgQ\nffv2BaiYYiPl4YcfZsSIEey66660aNGCsTU0MTZr1ozLLruMxo0bM2DAALbZZhs+/vhjIJqrqmfP\nngD06tWLoUOHVkxPH8d//Md/0K5dO3bYYQcOOugg9t9/f/bYY4+KaeSnT58OwEMPPcTRRx/NoYce\nSuPGjfntb3/L999/z2uvvcYbb7zB2rVrueCCC2jcuDEnnHAC++67b8Ux7rjjDs455xz69OmDmXHq\nqafSvHlz3njjjdhx1kaNScrdXwBmAE3DY0ZYJyKy2Vu4cOFG07mXlJSwIK1DR+b2bPV33HHHrPeD\nbbvttpWuaW299dYV07m/+eabHHrooWy//fa0adOGv/3tb7Wayj3utPELFy6kpKSkYpuZUVxczIIF\nC1i4cCGdOlW+wpNetry8nOuvv562bdvStm1bioqKmD9/PgsXLowdZ23UmKTM7ARgGnAqcBrwtpkN\nTiQaEZEc69ixI/Pmzau0bu7cuZU+qLM13+2www4bTQ1f1154w4cP57jjjmPBggWsWLGCkSNHJnID\ndMeOHSkvrzyP17x58+jUqdNGrwcqT3e/4447cskll7B8+XKWL1/OV199xcqVKznppJPqPU6I19x3\nObCvuw9395OB/YGxiUQjIpJj+++/P1tvvTXXXnsta9euZerUqTz11FMMGzYsVv0hQ4Zw1113MXPm\nTL777juuvPLKOseycuVKioqKaNq0KWVlZdx///2VttdXwhoyZAhPP/00L774ImvXruW6666jRYsW\nHHDAAfTr14+mTZvy17/+lbVr1/LYY49RVlZWUffss8/mtttuq1i3atUqJk2axKpVq+oltkxxklQj\nd1+ctrwkZj0RkYLXtGlTJk6cyKRJk2jXrh2jRo1i/Pjx7LzzzkDVZ1Hp64488kguuOACDjnkELp3\n706/fv2Aja9dVSd9X7fccguXXXYZrVu35sorr9zo7CTbGVptppTv3r079957L6NGjWK77bbj6aef\nZuLEiTRp0oSmTZvy2GOPcdddd7Htttvy8MMPc8IJJ1TU3WeffbjjjjsYNWoUbdu2pXv37owbNy7W\ncesizth91wO7AqmriEOBme7+23qNpEBo7D6RZFQ1btvmejNvNjNnzmT33Xfnhx9+0P1UGeoydl+c\nJGXAECDV7/IV4JF6/yQvEEpSIsnYkgeYfeKJJxg4cCCrVq3ijDPOoEmTJpW6bUukXpOUmY0iGqfv\nHXdvMNPFK0mJJGNLTlIDBgzg9ddfp0mTJpSWlnLzzTdX6lknkfpOUn8BDgC6AdOBV4mS1uvu/nWV\nlbYASlIiydiSk5TEk1RzXwuiSQ8PAPoR9e5b4u57bHLEBUhJSiQZSlJSr9PHZ5RpBjQPj4XAB5sQ\np4iISCzVJikzuwXYA/iOaAT0N4Bb3P3LHMUmIiINXLb+kd2BFsBc4FPgEyUoERHJpazXpMysEdHZ\n1AHh0YPoZt7X3P33OYkwx3RNSiQZuiYldbkmVdN8Uuvd/R3gUeBxYCrRGdZ/bXK0IiIFItuU5//6\n17/o0aNHrP2k5ouS+lNtkjKz88zsXjP7HCgDfg7MIbqxt01uwhORLVmXkg6YWWKPLiUdNjnG/v37\n89FHMZpCgoY4xXuSsvXu2xWYCFzs7vOylBMRqZPyuYtjNYXXlfVYXHMhKWjVnkm5+wXu/qASlIg0\nBNOnT2fPPfekqKiIYcOGsXr1amDjJrxp06ZVTPE+ZMgQhg4dWmkqeXevdnr4THPmzOHggw+mdevW\nHH744YwaNYpTTz21YvuQIUPYYYcdKCoqorS0lA8//LBi24gRIzj//PMZOHAgLVu25KCDDmLx4sWM\nHj2atm3bsttuuzFjxoyK8rWdXj7bsXNJox+KiBDNsDt58mQ+//xzZsyYUSm5pJrw1qxZw/HHH8+Z\nZ57J8uXLGTZsGI8//nil/WSbHj7TySefTN++fVm2bBljxoxh/PjxlZoLBw4cyKeffsqSJUvo3bs3\nw4cP3yjmP/zhDyxbtoxmzZrRr18/+vTpw7JlyzjhhBMYPXp0pfJxp5ePc+xcSTxJmdmRZjbTzGaZ\n2YXVlLnRzGab2TtmtldNdc2syMwmm9nHZvasmbUO69ua2RQz+9bMbsw4xothX9PNbJqZtUvqNYvI\n5ueXv/wl7du3p02bNhxzzDG88847G5V5/fXXWbduHaNGjaJx48YMHjyY/fbbr1KZbNPDp5s3bx5v\nv/02V1xxBU2aNOHAAw/k2GOPrVTmjDPOYOutt6Zp06ZcfvnlzJgxg2+//bZi++DBg9lrr70qpoff\naqutGD58OGbGSSedtNFriDu9fJxj50qsJGVmrcysVW13Hrqw3wQcAfQEhpnZrhllBgA7ufvOwEjg\nthh1LwKed/ddgCnAxWH998ClwG+qCWmYu+/t7r3dPf6czCKyxUsfEDZ9Svd0X3zxxUZTq2f25ss2\nPXy6hQsX0rZtW1q0aFHlvtavX89FF11Et27daNOmDV27dsXMKk0nH3e6+NqWj3PsXMnWu6849O77\nEpgBvGtmS8K6zjH3vx8w293L3X0N8A9gUEaZQcA9AO7+JtDazNrXUHcQkJplaxxwXKj/nbu/BvxQ\n29crIlKTHXbYgQUZ819lTj1fm30tX76c77//vsp93XfffUycOJEpU6awYsUK5syZg7vn5F6zfB47\nU7YP7QeBfwId3b2ru3cBOgHPECWMODoB6b/B+WFdnDLZ6rZPzRbs7ouA7WPGc3do6rs0ZnkRkQr9\n+vWjcePG3Hzzzaxbt44nn3yy0tTqtdG5c2f69OnD2LFjWbNmDa+//joTJ06s2L5y5UqaN29OUVER\nq1at4uKLL6519/a6JpX6OHZ9yZaktnf3+8JZDADuvsbd7wW2SzCmurwTcX4TJ7v77sBBwEFmdkod\njiMiW6C4H8CpqdX//ve/U1RUxP33388xxxyTdar4bPu+7777eO2112jXrh2XX345Q4cOrdjXaaed\nRufOnenUqRO9evXigAMOqN2Lyjh2baaXr49j15ds80k9DHxB1JyWOqPZETgD2MHdf17jzs36AmPd\n/ciwfBHg7n5NWpnbgBfd/cGwPBM4GOhaXV0z+wgodffFZtYh1O+Rts/TgX3c/YJq4qp2u5n5mDFj\nKpZLS0spLS2t6aXW9D5oWCRp8KoaEqdLSQfK5yZ3L1NJ5/bMKV+U2P4B+vbty7nnnsvpp5++yfsa\nOnQoPXr0IP0zaEuS+huYOnUqU6dOrVh/xRVX1GnSw+bAL4iu/6Sa2RYAE4Db3f37KitW3kdj4GPg\nMKKEV0bUeeGjtDIDgfPd/aiQ1P7i7n2z1TWza4DlIWFdCBS5+0Vp+zwd6OPu/5EWRxt3X2ZmTYH7\ngefc/fYqYtbYfSIJ2FLG7nv55ZfZZZddaNeuHffeey/nnXcen332WZ1m4n377bdp27YtXbt25dln\nn+X444/n9ddfZ88990wg8vyr1/mk3P0H4K/hUSfuvs6iaegnEzUt3hmSzMhos9/u7pPMbKCZfQKs\nAkZkqxt2fQ3wkJmdCZQTDdWUerGfAy2BZmY2CDicaCT3Z82sCdAYeB64o66vS0Qaro8//pghQ4bw\n3Xff8eMf/5hHH320zlPFL1q0iOOPP57ly5dTXFzMbbfdtsUmqLqqaRT0w4h6zqWfST3p7s/nILa8\n0JmUSDK2lDMpqbt6PZMys+uBXsB4op51AMXAf5rZQHf/9aaHLCIiUr1s16RmuXv3KtYbMCvcfLvF\n0ZmUSDJ0JiV1OZPK1gX9BzPrXcX63lR/s6yIiEi9yTZVx5nA7aGXX6oLemeioYfOTDowERGRbL37\n3gL6mFkxaR0n3H1+dXVERKpTUlKiCQEbuJKSklrXyXYmldI+PADWsqEThYhIbHPmzMl3CLIZyta7\n7zDgVqL7kFIjKhaHwWXPdfcXchCfiIg0YNnOpP4KHOnun6WvNLOdgKeAHlXWEhERqSfZevc1JTqL\nyjQ3bBMREUlUtjOpccCbZvYAlQeYHQbcnXBcIiIiNQ6LtDtVDDDr7u/mILa80M28IiK5VadhkQDc\n/T3gvUSiEhERqUGdplM3s4k1lxIREdk02bqg71HdJqBPMuGIiIhskK25bzrwKlVP594mmXBEREQ2\nyJakZgJnuvsnmRvMbF4V5UVEROpVtmtSV1B9EhudQCwiIiKVZO2C3hCpC7qISG7VdT4pERGRvFKS\nEhGRgqUkJSIiBavG+aTMrBFwJNAlvby735hcWCIiIvEmPXwScKLhkdYnG46IiMgGcZJUF3ffPfFI\nREREMsS5JvWsmR2aeCQiIiIZ4iSpV4CJZrbSzJab2VdmtjzpwCQZXUo6YGY1PrqUdMh3qCIiNd/M\na2afAyeQcU3K3dclG1p+bOk38xZSLCIisAnzSQXzgen1/sktIiJSgzhJ6hNgiplNAn5IrVQXdBER\nSVrcM6n5QKuEYxEREamkxiTl7pcBmNlWYfn/kg5KREQEYvTuM7PdzOwtYDYw28zeNLMeyYcmIiIN\nXZwu6LcDv3P3YncvBi4B7kg2LBERkXhJqqW7P5dacPfngZbJhSQiIhKJk6TmmNnFZlYcHhcBcxKO\nS0REJFaSOhPYEZgEPA0UAyOSDEoaBo1+ISI1idMF/WB3Py99hZkdDzyWTEjSUJTPXRxz9IvFyQcj\nIgUpzpnUpVWsu6S+AxHJJ53ViRSmas+kzOwIoskOO5nZDWmbWqF5pWQLo7M6kcKUrblvCfA+8D3w\nQdr6b4GLkgxKREQEsiQpd58OTDezH7n7nenbzGwUcFPSwYmISMMW55rUGVWsO6ue4xAREdlItmtS\nJwFDga5mlt6TrxWwIunAREREsl2TKgOWEd0XdXPa+m+B6UkGJSIiAtmvSX0OfA48b2btgD5h02fu\nviYXwYmISMMWZxT044FpwKnAacDbZjY46cBERETijDgxBtjX3RcDmFl7YDLweJKBiYiIxOnd1yiV\noIIlMeuJiIhskjhnUpPN7GnggbA8FHg2uZBEREQicZLUb4ETgf5heRzwSGIRbSa6FBdTvmBBvsMQ\nEdmi1Zik3N2Bh4CHzKyNu9fqHikzOxL4C1ET4Z3ufk0VZW4EBgCrgDPc/Z1sdc2sCHgQKCGa22qI\nu39tZm2JEui+wF3ufkHaMXoDdwMtgEnu/qvavI5M5QsWsGLs2Fhl28QsJyIilVV7bcnM9jOz583s\nITPb08zeBT4xs8VmdnicnZtZI6Lhk44AegLDzGzXjDIDgJ3cfWdgJHBbjLoXAc+7+y7AFODisP57\nolHbf1NFOLcCZ7l7d6B7GEB3i9CluDjWCN5mlu9QRURqJduZ1M1EPftaAy8Cx7j7q2bWExhP1MOv\nJvsBs929HMDM/gEMAmamlRkE3APg7m+aWevQg7BrlrqDgIND/XHAVOAid/8OeM3Mdk4Pwsw6AC3d\n/a2w6h7gOLaQa2uFdFanZlARqU/ZklQTd58EYGaXu/urAO7+gcX/St4JmJe2PJ8ocdVUplMNddun\nehy6+yIz2z5GHPOrOIbUs0JKmCKy+cuWpDzt+f9l2Vbf6tImVa/xjE378CwtLaW0tLQ+dy8i0qBN\nnTqVqVOnxiqbLUntaWbLiZJGy/CcsLxNzFgWAJ3TlovDuswyO1ZRplmWuovMrL27Lw5NeUtixFHV\nMao0Vt/wRUQSk/nl/4orrqi2bLabcpsB2wHtgObheWq5RcxY3gK6mVmJmTUjusdqQkaZCUTDLWFm\nfYEVoSkvW90JbJhC5HTgySqOXXFG5u6LgK9DZxALx6uqjmxB1KFEZPOXbYDZdZu6c3dfFyZInMyG\nbuQfmdnIaLPf7u6TzGygmX1C1AV9RLa6YdfXEHWJPxMoB4akjmlmnwMtgWZmNgg43N1nAudTuQv6\nM5v6+qSw6fqYyOYvzs28myQkg10y1v0tY3lU3Lph/XLgp9XU6VrN+n8Du8eLWkRECkG2+6QST2Ai\nIiLZZLsmVQZgZnfnJhQREZHKsp0tNTOzIcBBZnZs5kZ3z+wAISIiUq+yJanzgVOANkQDzKZzNu6l\nJyIiUq+y9e57CXjJzN7O7OggIiKSC3EmL/xfMzvPzP4RHueqU4VI7dTmnq0uxcX5DlekYMRJNjcB\nPwL+NyyfAuwN/CKpoES2NLpnS6Ru4iSpvu6+Z9ryZDObkVRAIiIiKXGa+9abWZfUQni+PplwRERE\nNoiTpC4EXgkTIL4AvAT8Z7JhiUiS4l4j0/Uxybc408dPNrPuQI+w6iN3z5y6Q0Q2I3GvkeXi+ljc\niTJLOnVizvz5NZaTLUusXnohKU1LOBYRaYAKKWFK4YnT3CciIpIXSlIiIoGu1RWeGpv7zOxBonuk\nJrt7ktPGiwjQvBmxJ2Is6dyeOeWLEo6o4VDTY+GJc03qLuBM4KaQsO5290+SDUuk4fphNVRM71kD\n67E40ViUMCXf4vTuewZ4xsyKgOHAi2H22zuAB9x9bcIxikieFFLClIYp1jWpkKBOBk4F3gX+BhwA\naAp2ERFJTJxrUg8TTbt+H3CCu6duVLjPzKYnGZyIiDRsca5J3Q48n95pwsyauPtad987udBERKSh\ni9Pcd00VvfrKkghGREQi6g4fqfZMysy2B3YAtjKz3YFUF59WwNY5iE1EpMFSd/hItua+o4i6nhcD\nt6St/xa4LMmgREREIPv08XcBd5nZEHd/KIcxiYiIANmb+4a5+wPADmZ2QeZ2d78x0chERKTBy9Zx\noij8bAdsV8VDRCRnUqNfxOpMUNKhwcSypcvW3HdL+KnrTyKSd4U0+kUhxbKly9bcd0O2iu7+6/oP\nR0REamNLH18xW+++D3IWhYiI1MmWflaXrbnvzlwGIiIihalLcTHlCxbUWK6kUyfmzJ9fY7nayNbc\nd727/8bMHgc2mkfK3Y+v10hERKQg5fPG4mzNfQ+GnzfV+1FFRERiyNbcVxZ+vmBmTYGdic6oZmsO\nKRERyYU4U3UcSTQS+lyi8fuKzexsd5+cdHAiItKwxZmq4y/AT919FoCZdQeeBHokGZiIiEicqTpW\nphIUQHi+KrmQREREItl69x0bnpaZ2QTgIaJrUicCb+YgNhERaeCyNfedmPb8a+CI8PxboGViEYmI\niATZevdkHn/QAAAUxUlEQVSdmstAREREMsXp3dccOAPoCbRIrXf3XyQXloiISLyOE/cAXYCjia5F\n7QR8n2BMIiKyGUpiCpM4XdC7u/tJZnaUu99pZvcAr2zSKxERkS1OEoPdxjmTWhN+rjCzHkSdJraP\nF4aIiEjdxTmTutPMioAxwLPA1sDliUYlIiJCjCTl7n8LT18EOicbjoiIyAY1NveZWZGZ/dnMyszs\nTTO7LpxZiYiIJCrONal/AN8Aw4FTiG7mfTBrDRERkXoQ55pUJ3cfk7Z8hZm9n1RAIiIiKXHOpF4w\ns5+nFszseOC55EISERGJVJukzOwrM1sOnAY8ZGarzWw18AhwetwDmNmRZjbTzGaZ2YXVlLnRzGab\n2TtmtldNdcN1sslm9rGZPWtmrdO2XRz29ZGZHZ62/sWwr+lmNs3M2sV9DSIikh/ZzqTaAduFn02B\nrcKjaVhfIzNrRDT9/BFEwyoNM7NdM8oMAHZy952BkcBtMepeBDzv7rsAU4CLQ53dgCFEc10NAG4x\nM0s73DB339vde7v70jivQURE8qfaJOXu61IPokRxVXj8LKyLYz+i6ebL3X0NUSeMQRllBhENvYS7\nvwm0NrP2NdQdBIwLz8cBx4XnxwL/cPe17j4HmB32U+PrFRGRwhOnC/pVwH8Bn4XHf5nZlTH33wmY\nl7Y8P6yLUyZb3fbuvhjA3RexYQSMzDoLMo53d2jquzRm/CIikkdxevcdA+ydOnsys/8FpgFJfdBb\nzUU24jHKnOzuX5jZj4DHzOwUd7+3DscSEZEciZOkAFoBX4XntZnwcAGVR6koDusyy+xYRZlmWeou\nMrP27r7YzDoAS2rYF+7+Rfi5yszuJ2oGrDJJjR07tuJ5aWkppaWl2V6jiIjUwtSy6Gf6Z2114iSp\na4FpZvYC0VlOKXBZzFjeArqZWQnwBTAUGJZRZgJwPvCgmfUFVoTkszRL3QlEc1xdQ9TT8Mm09feZ\n2Z+Jmvm6AWVm1hho4+7LzKwp0bQj1Xajj/PGiYhI3ZSGngKpz9orrrii2rJZk1ToGfcC0bh9+4fV\nl7t75tlQldx9nZmNAiYTXf+6090/MrOR0Wa/3d0nmdlAM/sEWAWMyFY37Poaom7xZwLlRD36cPcP\nzewh4EOi0dvPc3cPEzc+a2ZNgMbA88AdcV6DiIjkT9YkFT7gn3P3XsBjdTmAuz8D7JKx7m8Zy6Pi\n1g3rlwM/rabO1cDVGeu+A/rUKnAREcm7OF2y3zGzvROPREREJEOca1J7A2+Z2adEzXFGdJLVO9HI\nRESkwYuTpI5NPAoREZEqVJukQmeDs4l6yL0H3F2LkSZEREQ2WbZrUncD/YmGFjoOuC4XAYmIiKRk\na+7r5e67A5jZ7cCbuQlJREQkku1Mak3qSRjgVUREJKeynUntGeaTgqhHX8uwnOrd1zbx6EREpEHL\nlqSa5SwKERGRKlSbpNSTT0RE8k2TAIqISMFSkhIRkYKlJCUiIgUr24gTX1H1jLfq3SciIjmRrXdf\nu5xFISIiUoXYvfvMrC3QIm3VwqSCEhERgRjXpMzsKDObBcwnGhppPjAl6cBERETidJy4CjgQ+Njd\ndwSOAF5JNCoRERHiJam17v4l0MjMzN2fA/ZLOC4REZFYkx5+bWbbAP8C7jGzJcD/JRuWiIhIvDOp\n44iS0q+AqcAC4OgEYxIREQHiJamL3X2du69x9zvd/Qbg10kHJiIiEidJHVnFuqPqOxAREZFM2Uac\nGAmcA3Q3s2lpm1oC/046MBERkWwdJx4CXgCuBi5KW/+tuy9JNCoRERGyjzjxFfAVcKKZ9QQOCpte\nAZSkREQkcXFGnDgfeBjoHB4Pmdl5SQcmIiIS5z6pkcB+7r4SwMz+ALwG3JJkYCIiInF69xmwOm15\nTVgnIiKSqGy9+5q4+1pgPPCmmT0aNg0GxuUiOBERadiynUmVAbj7tURNft+Fxznufl0OYssbM6vx\nkQvFJV0KJhYRkXzIdk2q4tPP3csISashuHra6hrLXNy7WeJxLJhbXjCxiIjkQ7YktZ2ZVTv8URge\nSRqI4pIuLJhbnu8wRKSByZakGgPboE4SQmGd1RVSwlQsVSukWGTzli1JfeHu/52zSERiKqSEuTnG\nAsnHU0ixKGFu3mJdkxIR2VwpYVatkGLJJluSOixnUYiINACFlDALKZZsqu2C7u7LcxmIiIhIpjgj\nToiIiOSFkpSIiBQsJSkRESlYSlIiIlKwlKRERKRgKUmJiEjBUpISEZGCpSQlIiIFS0lKREQKlpKU\niIgULCUpEREpWEpSIiJSsBJPUmZ2pJnNNLNZZnZhNWVuNLPZZvaOme1VU10zKzKzyWb2sZk9a2at\n07ZdHPb1kZkdnra+t5m9G/b1l6Rer4iI1J9Ek5SZNQJuAo4AegLDzGzXjDIDgJ3cfWdgJHBbjLoX\nAc+7+y7AFODiUGc3YAjQAxgA3GJmqXmxbgXOcvfuQHczOyKZVy0iIvUl6TOp/YDZ7l7u7muAfwCD\nMsoMAu4BcPc3gdZm1r6GuoOAceH5OOC48PxY4B/uvtbd5wCzgf3MrAPQ0t3fCuXuSasjIiIFKukk\n1QmYl7Y8P6yLUyZb3fbuvhjA3RcB21ezrwVp+5pfQxwiIlJgCrHjRF2mrfd6j0JERPLP3RN7AH2B\nZ9KWLwIuzChzG3BS2vJMoH22usBHRGdTAB2Aj6raP/AMsH96mbB+KHBrNTG7HnrooYceuX1Ul0ea\nkKy3gG5mVgJ8QZQchmWUmQCcDzxoZn2BFe6+2MyWZqk7ATgDuAY4HXgybf19ZvZnoua8bkCZu7uZ\nfW1m+4WYTgNurCpgd6/LmZyIiCQg0STl7uvMbBQwmahp8U53/8jMRkab/XZ3n2RmA83sE2AVMCJb\n3bDra4CHzOxMoJyoRx/u/qGZPQR8CKwBzvNwekSUCO8GWgCT3P2ZJF+7iIhsOtvwGS4iIlJYCrHj\nRMGLc4NyjuK408wWm9m7+YohLZZiM5tiZh+Y2XtmdkEeY2luZm+a2fQQy5h8xZIWUyMzm2ZmE/Ic\nxxwzmxHem7I8x9LazB4ON95/YGb75ymO7uH9mBZ+fp3nv9/RZvZ+GHzgPjNrlsdYfhn+h/L2P60z\nqVoKNxnPAg4DFhJd4xrq7jPzEEt/YCVwj7vvkevjZ8TSAejg7u+Y2TbAv4FB+XhfQjxbu/t3ZtYY\neBW4wN3z9qFsZqOBfYBW7n5sHuP4DNjH3b/KVwxpsdwNvOTud5lZE2Brd/8mzzE1IrpFZX93n1dT\n+QSO3xH4F7Cru682sweBp939njzE0hN4ANgXWAv8EzjH3T/LZRw6k6q9ODco54S7/wvI+4cNgLsv\ncvd3wvOVRD0w83Yvmrt/F542J7r2mrdvY2ZWDAwE/p6vGNIYBfB/b2atgIPc/S6AcAN+XhNU8FPg\n03wkqDSNgR+lEjfRl+F86AG86e4/uPs64GXg+FwHkfc/1s1QnBuUGzQz6wLsBbyZxxgamdl0YBHw\nXNpoI/nwZ+A/yWOiTOPAc2b2lpmdncc4ugJLzeyu0Mx2u5ltlcd4Uk4iOnvIC3dfCFwPzCUajGCF\nuz+fp3DeBw4KY6VuTfRFa8dcB6EkJfUqNPU9AvwynFHlhbuvd/e9gWJg/zCuY86Z2VHA4nCWadTt\nZvX6dKC79yb6wDk/NBnnQxOgN3BziOc7ovsc88bMmhINrfZwHmNoQ9QyUwJ0BLYxs5PzEUtoqr8G\neA6YBEwH1uU6DiWp2lsAdE5bLg7rGrzQPPEIMN7dn6ypfC6EJqQXgSPzFMKBwLHhWtADwCFmlvPr\nCynu/kX4+SXwOFHzdT7MB+a5+9th+RGipJVPA4B/h/cmX34KfObuy0MT22PAAfkKxt3vcvc+7l4K\nrCC6Hp9TSlK1V3GDcuh1M5ToJuJ8KYRv5yn/C3zo7v+TzyDMrF1q+pbQhPQzopFMcs7df+fund39\nx0R/K1Pc/bR8xGJmW4czXczsR8DhRE06ORfG3pxnZt3DqsOI7m/Mp2HksakvmAv0NbMWYQaHw4iu\n7+aFmW0XfnYGBgP35zqGpEec2OLUcJNxTpnZ/UApsK2ZzQXGpC5E5yGWA4HhwHvhWpADv8vTTdM7\nAONCT61GwIPuPikPcRSa9sDjZuZE//v3ufvkPMZzAdEIMU2Bzwg38udDuObyU+AX+YoBwN3LzOwR\noqa1NeHn7XkM6VEza8uGwRFy3rlFXdBFRKRgqblPREQKlpKUiIgULCUpEREpWEpSIiJSsJSkRESk\nYClJiYhIwVKSakDMrG3alARfmNn8tOVa3TMXpgnZeRPj2drMXtyUfaTta3RtpzQws8PM7PEq1p9l\n0ezOORXnPTWz8Wa20SjqZtbVzE6qwzFvCNMw/CFj/e9rMzVD5vHNbG8zO6K28cQ81pAwtcc6M9sj\nY9ulZjbbzD40s8PS1u8bXucsM7s+bX1zi6YLmW1mr4bBgHOiur+/tO3tzezpXMVTqJSkGpAw1Mre\nYay0W4EbUsvuvraW+zrL3WdvYkj/D3hoE/dBmI7j10SzLtdWdTcK5vwGwk18T3ciGtEitjCiwQh3\n393df1fH41Z3/N4kNxTVu0Tj272avtLMdgeOA3YFjib6G0+5FTjd3bsDvdIS2C+AL9x9Z+AW4I8J\nxZyaBiRTtX9nYVSOpWa2b1IxbQ6UpBquSkMpmdl/hW+a74YRNTCznSyafO2B8M30H2bWPGx7JfUt\n1syOMrN/h7OyZ8K6Q83snXCW9nY1I1wPB54M5TuGfU4LMfQN608Jy++a2VVhXWMz+8rM/mxm7xCN\nML498IqZTQ5lBpjZa+HYD6SOH2KdaWZvk32KlS5mNtXMPjaz34W6V5nZ+Wnv2R/N7NyM9/EiMzsn\nPP+rmT0bnv/MovmTUpNmVhVb+ns6Mhz7dTO7w8xuSDvMoeFb/ydmlnoNVwOl4f0blRGTmdn14fc7\nw8xS0y08RTSA6bS0del6h+N/bGYjathX+vEvAC4HTk7t28y2NbMnQ51/WRjw16IztrvCa//czAaZ\n2XVh/xOr+mB395nu/gkbDwc2CHjA3deFOY/KzWyfcHbU3N2nhXLjiZJZqs648PwhYKOzvxp+p3eF\n53H+Tvet7u8vy//Lk8ApVfxuGg5316MBPoAxwK/D8/2Ihl9pBmxDNIZaT6Jvx+uBfUO5cUSTBwK8\nAuxBNNROOVAc1rcJPyel1duaMLpJ2vGbA/PTlv8L+M/w3EKdTsDnQBHRHDtTiUbvbhziGpRWfy7Q\nMjzfLpRtEZZ/RzTC9lZE06x0CesfAR6r4r05K5RrFeL4ILzWnYCyUKYR8CnQOqPugUTDDUE0ed0b\n4fX8N9GwP1XGlvGeFhMNE9SKaPiiV4nOeiH6gE3tf3fgo/D8sKpeS9g2hGjiPMLvay7QLryPy6up\n83vgbaBpiHle+FndviodP7yHN6Qt3wJcHJ7/DHgr7TgvhveoN7AKODRsmwAMzPI3/AqwR9ryrcCQ\ntOW7iUY13x+YlLa+NBUr0bh426dt+5xoYsra/E5j/Z2S5e+Pav5fiAaznpbvz4t8PnQmJQD9gUfd\nfbVH02s8ARwUtn3mG+ZiujeUTdePaNDU+QDuviKsfxW4MXyrb+3hPy7N9sDytOW3gP9nZpcBu3s0\naeH+wAvu/pVHI0LfD/wklP/BK4+0nj7Q7gHAbsBrFo0jeDLQJaz72N3nhHL3ZXlPnnX3b0IcTwD9\n3f1T4BuLZiwdQDQh3NcZ9d4i+sbcmmjW5LeIZuQ9iOhDtarYSjL2kXrd33jUDPtIxvYnANz9PaLp\nHGrSnzBwqkdNSK8AfcK2bIMTP+HuazwaFfwloi8z2fZVUwzjQ73ngB3SzhYmhb+P96LNPiWsf4/o\n95ZLVb0fNf1O4/6dZvv7q+7/ZQnRWJQNlpKU1FZVbegb/WO7+1XA2URnZm+Y2U4ZRf6PtGtI7v4i\n0TfcL4gGhx1W3b7T6lfHgH96dK1tb3fv5e7n1LC/jV5CNct3En17HkE06nvlQu6riWZSPY3oW/cr\nRGcZnX1DE1VmbOdm7qeGOH+IWa466XWyXXtL32ZEZwXZ9pVNtuOkXs96YHXa+vXUbhDsBVSelC81\njU626XUq6lg00O1GU9jH+J1C/L/TKstl+X9pUcU+GhQlKYHon26wRT2dtiFqK38lbOtqZvuE5yen\nrU95jehaRGcAMysKP3/s7u+7+x+BacAu6ZXcfSmwlYVehaH+Ynf/O1Ezzd5EM/uWWjQzaBOiC/NT\nwy4y/9m/IWoeS8V0sJl1Dfve2sy6ETVjpqZZMaKpGapzuJm1smh07PSL9I8BxwB7evUzpr4C/JZo\nuu1/AecTNZ1liy1dWXjdrcIHZ7Ypu1Pvw7dAyyzxDA3Xk9oTnc2l4smWZI4zs6YWTdfQP9Spbl/f\nsuH9p4rlVwjXVszsp8ACd6/qw7e2STe9/ARgWIh5J6Ik8u9wlv99uD5lwKmEa6Ghzhnh+UlEsxtU\nJdvvNO7fabV/f1n+X7qTp+lUCoWSlBCa8x4g+qd7jWi21A/C5o+AX5vZh0Rt6nekqoW6S4BzgSdD\n89W9Yftvw8Xvd4g+sKr653+eDRO6HQbMMLNpRPPW/NXdFwCXETU1TQNe8w1Tf2R+M78DeN7MJoeY\n/h/wYDj+q8DO4UPxXOAZokSwMMvb8hbRB9h0ousR74bX+wPRB1W2eYdeIWrOfMOj6cBXhzqp9+us\nzNjSX5O7zwP+FGJ4meja19fpZdKklqcDTSzqvDIqo8wjRPNpvUv0exgdviRUtb9077PhQ/ny0OxX\n3b6mA43Tjj8F2NOiDjXHE3Wk6GdmM4CxbEgMmWrsVWlmPzezeUTNjM+Y2USA8Dt6guhv9imi33XK\neUTXVGcBH6R9wbidqOlxdihTXS/HbL/TWH+n4e/vHKr++6vu/+UQoEF3Q9dUHVKt8G30EY+mYU9i\n/32Ac939rCT2n4TQ22w60cXwOQke50fuvip8M38SuMXdG/SHVUNkZi8DR7n7t/mOJV90JiU1Sexb\njEdTh/8rqf3XNzPrBXxCdKF/TsKH+304q5wBzFSCanjMbHvg2oacoEBnUiIiUsB0JiUiIgVLSUpE\nRAqWkpSIiBQsJSkRESlYSlIiIlKwlKRERKRg/X9CuYVdWIuo+QAAAABJRU5ErkJggg==\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "a_top = np.sort([sum(tpm_low_gamma.get_topics(topic_ids=[i], num_words=100)['score']) for i in range(10)])[::-1]\n", + "b_top = np.sort([sum(topic_model.get_topics(topic_ids=[i], num_words=100)['score']) for i in range(10)])[::-1]\n", + "c_top = np.sort([sum(tpm_high_gamma.get_topics(topic_ids=[i], num_words=100)['score']) for i in range(10)])[::-1]\n", + "\n", + "a_bot = np.sort([sum(tpm_low_gamma.get_topics(topic_ids=[i], num_words=547462)[-1000:]['score']) for i in range(10)])[::-1]\n", + "b_bot = np.sort([sum(topic_model.get_topics(topic_ids=[i], num_words=547462)[-1000:]['score']) for i in range(10)])[::-1]\n", + "c_bot = np.sort([sum(tpm_high_gamma.get_topics(topic_ids=[i], num_words=547462)[-1000:]['score']) for i in range(10)])[::-1]\n", + "\n", + "ind = np.arange(len(a))\n", + "width = 0.3\n", + " \n", + "param_bar_plot(a_top, b_top, c_top, ind, width, ylim=0.6, param='gamma',\n", + " xlab='Topics (sorted by weight of top 100 words)', \n", + " ylab='Total Probability of Top 100 Words')\n", + "\n", + "param_bar_plot(a_bot, b_bot, c_bot, ind, width, ylim=0.0002, param='gamma',\n", + " xlab='Topics (sorted by weight of bottom 1000 words)',\n", + " ylab='Total Probability of Bottom 1000 Words')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "From these two plots we can see that the low gamma model results in higher weight placed on the top words and lower weight placed on the bottom words for each topic, while the high gamma model places relatively less weight on the top words and more weight on the bottom words. Thus increasing gamma results in topics that have a smoother distribution of weight across all the words in the vocabulary." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ For each topic of the **low gamma model**, compute the number of words required to make a list with total probability 0.5. What is the average number of words required across all topics? (HINT: use the get_topics() function from GraphLab Create with the _cdf_\\__cutoff_ argument)." + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def calculate_avg_words(model, num_words=547462, cdf_cutoff=0.5, num_topics=10):\n", + " avg_num_of_words = []\n", + " for i in range(num_topics):\n", + " avg_num_of_words.append(len(model.get_topics(topic_ids=[i], num_words=547462, cdf_cutoff=.5)))\n", + " avg_num_of_words = np.mean(avg_num_of_words)\n", + " return avg_num_of_words" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "252.40000000000001" + ] + }, + "execution_count": 73, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "calculate_avg_words(tpm_low_gamma)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ For each topic of the **high gamma model**, compute the number of words required to make a list with total probability 0.5. What is the average number of words required across all topics? (HINT: use the get_topics() function from GraphLab Create with the _cdf_\\__cutoff_ argument)." + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "576.20000000000005" + ] + }, + "execution_count": 74, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "calculate_avg_words(tpm_high_gamma)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have now seen how the hyperparameters alpha and gamma influence the characteristics of our LDA topic model, but we haven't said anything about what settings of alpha or gamma are best. We know that these parameters are responsible for controlling the smoothness of the topic distributions for documents and word distributions for topics, but there's no simple conversion between smoothness of these distributions and quality of the topic model. In reality, there is no universally \"best\" choice for these parameters. Instead, finding a good topic model requires that we be able to both explore the output (as we did by looking at the topics and checking some topic predictions for documents) and understand the impact of hyperparameter settings (as we have in this section)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + }, + "toc": { + "toc_cell": false, + "toc_number_sections": false, + "toc_threshold": "8", + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/.ipynb_checkpoints/quiz-week5-assignment-checkpoint.ipynb b/machine_learning/4_clustering_and_retrieval/assigment/week5/.ipynb_checkpoints/quiz-week5-assignment-checkpoint.ipynb new file mode 100644 index 0000000..1bead40 --- /dev/null +++ b/machine_learning/4_clustering_and_retrieval/assigment/week5/.ipynb_checkpoints/quiz-week5-assignment-checkpoint.ipynb @@ -0,0 +1,194 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Modeling text topics with Latent Dirichlet Allocation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 1\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 2\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 3\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 4\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 5\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 6\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 7\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 8\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 9\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 10\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 11\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 12\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + }, + "toc": { + "toc_cell": false, + "toc_number_sections": false, + "toc_threshold": "8", + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/.ipynb_checkpoints/quiz-week5-assignment1-checkpoint.ipynb b/machine_learning/4_clustering_and_retrieval/assigment/week5/.ipynb_checkpoints/quiz-week5-assignment1-checkpoint.ipynb deleted file mode 100644 index 3aa5bc4..0000000 --- a/machine_learning/4_clustering_and_retrieval/assigment/week5/.ipynb_checkpoints/quiz-week5-assignment1-checkpoint.ipynb +++ /dev/null @@ -1,214 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Exploring Ensemble Methods" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": true - }, - "source": [ - "# Question 1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-classification/exam/FgzAt/exploring-ensemble-methods)*\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Question 2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-classification/exam/FgzAt/exploring-ensemble-methods)*\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Question 3" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-classification/exam/FgzAt/exploring-ensemble-methods)*\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Question 4" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-classification/exam/FgzAt/exploring-ensemble-methods)*\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Question 5" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-classification/exam/FgzAt/exploring-ensemble-methods)*\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Question 6" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-classification/exam/FgzAt/exploring-ensemble-methods)*\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Question 7" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-classification/exam/FgzAt/exploring-ensemble-methods)*\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Question 8" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-classification/exam/FgzAt/exploring-ensemble-methods)*\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Question 9" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-classification/exam/FgzAt/exploring-ensemble-methods)*\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Question 10" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-classification/exam/FgzAt/exploring-ensemble-methods)*\n", - "\n", - "" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 2", - "language": "python", - "name": "python2" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 2 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.10" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/5_lda_blank.ipynb b/machine_learning/4_clustering_and_retrieval/assigment/week5/5_lda_blank.ipynb new file mode 100644 index 0000000..2387075 --- /dev/null +++ b/machine_learning/4_clustering_and_retrieval/assigment/week5/5_lda_blank.ipynb @@ -0,0 +1,723 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Latent Dirichlet Allocation for Text Data\n", + "\n", + "In this assignment you will\n", + "\n", + "* apply standard preprocessing techniques on Wikipedia text data\n", + "* use GraphLab Create to fit a Latent Dirichlet allocation (LDA) model\n", + "* explore and interpret the results, including topic keywords and topic assignments for documents\n", + "\n", + "Recall that a major feature distinguishing the LDA model from our previously explored methods is the notion of *mixed membership*. Throughout the course so far, our models have assumed that each data point belongs to a single cluster. k-means determines membership simply by shortest distance to the cluster center, and Gaussian mixture models suppose that each data point is drawn from one of their component mixture distributions. In many cases, though, it is more realistic to think of data as genuinely belonging to more than one cluster or category - for example, if we have a model for text data that includes both \"Politics\" and \"World News\" categories, then an article about a recent meeting of the United Nations should have membership in both categories rather than being forced into just one.\n", + "\n", + "With this in mind, we will use GraphLab Create tools to fit an LDA model to a corpus of Wikipedia articles and examine the results to analyze the impact of a mixed membership approach. In particular, we want to identify the topics discovered by the model in terms of their most important words, and we want to use the model to predict the topic membership distribution for a given document. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note to Amazon EC2 users**: To conserve memory, make sure to stop all the other notebooks before running this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Text Data Preprocessing\n", + "We'll start by importing our familiar Wikipedia dataset.\n", + "\n", + "The following code block will check if you have the correct version of GraphLab Create. Any version later than 1.8.5 will do. To upgrade, read [this page](https://turi.com/download/upgrade-graphlab-create.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "import graphlab as gl\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt \n", + "\n", + "%matplotlib inline\n", + "\n", + "'''Check GraphLab Create version'''\n", + "from distutils.version import StrictVersion\n", + "assert (StrictVersion(gl.version) >= StrictVersion('1.8.5')), 'GraphLab Create must be version 1.8.5 or later.'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "# import wiki data\n", + "wiki = gl.SFrame('people_wiki.gl/')\n", + "wiki" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the original data, each Wikipedia article is represented by a URI, a name, and a string containing the entire text of the article. Recall from the video lectures that LDA requires documents to be represented as a _bag of words_, which ignores word ordering in the document but retains information on how many times each word appears. As we have seen in our previous encounters with text data, words such as 'the', 'a', or 'and' are by far the most frequent, but they appear so commonly in the English language that they tell us almost nothing about how similar or dissimilar two documents might be. \n", + "\n", + "Therefore, before we train our LDA model, we will preprocess the Wikipedia data in two steps: first, we will create a bag of words representation for each article, and then we will remove the common words that don't help us to distinguish between documents. For both of these tasks we can use pre-implemented tools from GraphLab Create:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "wiki_docs = gl.text_analytics.count_words(wiki['text'])\n", + "wiki_docs = wiki_docs.dict_trim_by_keys(gl.text_analytics.stopwords(), exclude=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Model fitting and interpretation\n", + "In the video lectures we saw that Gibbs sampling can be used to perform inference in the LDA model. In this assignment we will use a GraphLab Create method to learn the topic model for our Wikipedia data, and our main emphasis will be on interpreting the results. We'll begin by creating the topic model using create() from GraphLab Create's topic_model module.\n", + "\n", + "Note: This may take several minutes to run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "topic_model = gl.topic_model.create(wiki_docs, num_topics=10, num_iterations=200)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "GraphLab provides a useful summary of the model we have fitted, including the hyperparameter settings for alpha, gamma (note that GraphLab Create calls this parameter beta), and K (the number of topics); the structure of the output data; and some useful methods for understanding the results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "topic_model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is certainly useful to have pre-implemented methods available for LDA, but as with our previous methods for clustering and retrieval, implementing and fitting the model gets us only halfway towards our objective. We now need to analyze the fitted model to understand what it has done with our data and whether it will be useful as a document classification system. This can be a challenging task in itself, particularly when the model that we use is complex. We will begin by outlining a sequence of objectives that will help us understand our model in detail. In particular, we will\n", + "\n", + "* get the top words in each topic and use these to identify topic themes\n", + "* predict topic distributions for some example documents\n", + "* compare the quality of LDA \"nearest neighbors\" to the NN output from the first assignment\n", + "* understand the role of model hyperparameters alpha and gamma" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load a fitted topic model\n", + "The method used to fit the LDA model is a _randomized algorithm_, which means that it involves steps that are random; in this case, the randomness comes from Gibbs sampling, as discussed in the LDA video lectures. Because of these random steps, the algorithm will be expected to yield slighty different output for different runs on the same data - note that this is different from previously seen algorithms such as k-means or EM, which will always produce the same results given the same input and initialization.\n", + "\n", + "It is important to understand that variation in the results is a fundamental feature of randomized methods. However, in the context of this assignment this variation makes it difficult to evaluate the correctness of your analysis, so we will load and analyze a pre-trained model. \n", + "\n", + "We recommend that you spend some time exploring your own fitted topic model and compare our analysis of the pre-trained model to the same analysis applied to the model you trained above." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "scrolled": true + }, + "outputs": [], + "source": [ + "topic_model = gl.load_model('lda_assignment_topic_model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Identifying topic themes by top words\n", + "\n", + "We'll start by trying to identify the topics learned by our model with some major themes. As a preliminary check on the results of applying this method, it is reasonable to hope that the model has been able to learn topics that correspond to recognizable categories. In order to do this, we must first recall what exactly a 'topic' is in the context of LDA. \n", + "\n", + "In the video lectures on LDA we learned that a topic is a probability distribution over words in the vocabulary; that is, each topic assigns a particular probability to every one of the unique words that appears in our data. Different topics will assign different probabilities to the same word: for instance, a topic that ends up describing science and technology articles might place more probability on the word 'university' than a topic that describes sports or politics. Looking at the highest probability words in each topic will thus give us a sense of its major themes. Ideally we would find that each topic is identifiable with some clear theme _and_ that all the topics are relatively distinct.\n", + "\n", + "We can use the GraphLab Create function get_topics() to view the top words (along with their associated probabilities) from each topic.\n", + "\n", + "__Quiz Question:__ Identify the top 3 most probable words for the first topic. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__ Quiz Question:__ What is the sum of the probabilities assigned to the top 50 words in the 3rd topic?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's look at the top 10 words for each topic to see if we can identify any themes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "[x['words'] for x in topic_model.get_topics(output_type='topic_words', num_words=10)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We propose the following themes for each topic:\n", + "\n", + "- topic 0: Science and research\n", + "- topic 2: Team sports\n", + "- topic 3: Music, TV, and film\n", + "- topic 4: American college and politics\n", + "- topic 5: General politics\n", + "- topic 6: Art and publishing\n", + "- topic 7: Business\n", + "- topic 8: International athletics\n", + "- topic 9: Great Britain and Australia\n", + "- topic 10: International music\n", + "\n", + "We'll save these themes for later:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "themes = ['science and research','team sports','music, TV, and film','American college and politics','general politics', \\\n", + " 'art and publishing','Business','international athletics','Great Britain and Australia','international music']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Measuring the importance of top words\n", + "\n", + "We can learn more about topics by exploring how they place probability mass (which we can think of as a weight) on each of their top words.\n", + "\n", + "We'll do this with two visualizations of the weights for the top words in each topic:\n", + " - the weights of the top 100 words, sorted by the size\n", + " - the total weight of the top 10 words\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's a plot for the top 100 words by weight in each topic:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "for i in range(10):\n", + " plt.plot(range(100), topic_model.get_topics(topic_ids=[i], num_words=100)['score'])\n", + "plt.xlabel('Word rank')\n", + "plt.ylabel('Probability')\n", + "plt.title('Probabilities of Top 100 Words in each Topic')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the above plot, each line corresponds to one of our ten topics. Notice how for each topic, the weights drop off sharply as we move down the ranked list of most important words. This shows that the top 10-20 words in each topic are assigned a much greater weight than the remaining words - and remember from the summary of our topic model that our vocabulary has 547462 words in total!\n", + "\n", + "\n", + "Next we plot the total weight assigned by each topic to its top 10 words: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "top_probs = [sum(topic_model.get_topics(topic_ids=[i], num_words=10)['score']) for i in range(10)]\n", + "\n", + "ind = np.arange(10)\n", + "width = 0.5\n", + "\n", + "fig, ax = plt.subplots()\n", + "\n", + "ax.bar(ind-(width/2),top_probs,width)\n", + "ax.set_xticks(ind)\n", + "\n", + "plt.xlabel('Topic')\n", + "plt.ylabel('Probability')\n", + "plt.title('Total Probability of Top 10 Words in each Topic')\n", + "plt.xlim(-0.5,9.5)\n", + "plt.ylim(0,0.15)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we see that, for our topic model, the top 10 words only account for a small fraction (in this case, between 5% and 13%) of their topic's total probability mass. So while we can use the top words to identify broad themes for each topic, we should keep in mind that in reality these topics are more complex than a simple 10-word summary.\n", + "\n", + "Finally, we observe that some 'junk' words appear highly rated in some topics despite our efforts to remove unhelpful words before fitting the model; for example, the word 'born' appears as a top 10 word in three different topics, but it doesn't help us describe these topics at all." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Topic distributions for some example documents\n", + "\n", + "As we noted in the introduction to this assignment, LDA allows for mixed membership, which means that each document can partially belong to several different topics. For each document, topic membership is expressed as a vector of weights that sum to one; the magnitude of each weight indicates the degree to which the document represents that particular topic.\n", + "\n", + "We'll explore this in our fitted model by looking at the topic distributions for a few example Wikipedia articles from our data set. We should find that these articles have the highest weights on the topics whose themes are most relevant to the subject of the article - for example, we'd expect an article on a politician to place relatively high weight on topics related to government, while an article about an athlete should place higher weight on topics related to sports or competition." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Topic distributions for documents can be obtained using GraphLab Create's predict() function. GraphLab Create uses a collapsed Gibbs sampler similar to the one described in the video lectures, where only the word assignments variables are sampled. To get a document-specific topic proportion vector post-facto, predict() draws this vector from the conditional distribution given the sampled word assignments in the document. Notice that, since these are draws from a _distribution_ over topics that the model has learned, we will get slightly different predictions each time we call this function on a document - we can see this below, where we predict the topic distribution for the article on Barack Obama:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "obama = gl.SArray([wiki_docs[int(np.where(wiki['name']=='Barack Obama')[0])]])\n", + "pred1 = topic_model.predict(obama, output_type='probability')\n", + "pred2 = topic_model.predict(obama, output_type='probability')\n", + "print(gl.SFrame({'topics':themes, 'predictions (first draw)':pred1[0], 'predictions (second draw)':pred2[0]}))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get a more robust estimate of the topics for each document, we can average a large number of predictions for the same document:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "def average_predictions(model, test_document, num_trials=100):\n", + " avg_preds = np.zeros((model.num_topics))\n", + " for i in range(num_trials):\n", + " avg_preds += model.predict(test_document, output_type='probability')[0]\n", + " avg_preds = avg_preds/num_trials\n", + " result = gl.SFrame({'topics':themes, 'average predictions':avg_preds})\n", + " result = result.sort('average predictions', ascending=False)\n", + " return result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print average_predictions(topic_model, obama, 100)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ What is the topic most closely associated with the article about former US President George W. Bush? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ What are the top 3 topics corresponding to the article about English football (soccer) player Steven Gerrard? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Comparing LDA to nearest neighbors for document retrieval\n", + "\n", + "So far we have found that our topic model has learned some coherent topics, we have explored these topics as probability distributions over a vocabulary, and we have seen how individual documents in our Wikipedia data set are assigned to these topics in a way that corresponds with our expectations. \n", + "\n", + "In this section, we will use the predicted topic distribution as a representation of each document, similar to how we have previously represented documents by word count or TF-IDF. This gives us a way of computing distances between documents, so that we can run a nearest neighbors search for a given document based on its membership in the topics that we learned from LDA. We can contrast the results with those obtained by running nearest neighbors under the usual TF-IDF representation, an approach that we explored in a previous assignment. \n", + "\n", + "We'll start by creating the LDA topic distribution representation for each document:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "wiki['lda'] = topic_model.predict(wiki_docs, output_type='probability')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we add the TF-IDF document representations:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "wiki['word_count'] = gl.text_analytics.count_words(wiki['text'])\n", + "wiki['tf_idf'] = gl.text_analytics.tf_idf(wiki['word_count'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For each of our two different document representations, we can use GraphLab Create to compute a brute-force nearest neighbors model:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "model_tf_idf = gl.nearest_neighbors.create(wiki, label='name', features=['tf_idf'],\n", + " method='brute_force', distance='cosine')\n", + "model_lda_rep = gl.nearest_neighbors.create(wiki, label='name', features=['lda'],\n", + " method='brute_force', distance='cosine')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's compare these nearest neighbor models by finding the nearest neighbors under each representation on an example document. For this example we'll use Paul Krugman, an American economist:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "model_tf_idf.query(wiki[wiki['name'] == 'Paul Krugman'], label='name', k=10)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "model_lda_rep.query(wiki[wiki['name'] == 'Paul Krugman'], label='name', k=10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that that there is no overlap between the two sets of top 10 nearest neighbors. This doesn't necessarily mean that one representation is better or worse than the other, but rather that they are picking out different features of the documents. \n", + "\n", + "With TF-IDF, documents are distinguished by the frequency of uncommon words. Since similarity is defined based on the specific words used in the document, documents that are \"close\" under TF-IDF tend to be similar in terms of specific details. This is what we see in the example: the top 10 nearest neighbors are all economists from the US, UK, or Canada. \n", + "\n", + "Our LDA representation, on the other hand, defines similarity between documents in terms of their topic distributions. This means that documents can be \"close\" if they share similar themes, even though they may not share many of the same keywords. For the article on Paul Krugman, we expect the most important topics to be 'American college and politics' and 'science and research'. As a result, we see that the top 10 nearest neighbors are academics from a wide variety of fields, including literature, anthropology, and religious studies.\n", + "\n", + "\n", + "__Quiz Question:__ Using the TF-IDF representation, compute the 5000 nearest neighbors for American baseball player Alex Rodriguez. For what value of k is Mariano Rivera the k-th nearest neighbor to Alex Rodriguez? (Hint: Once you have a list of the nearest neighbors, you can use `mylist.index(value)` to find the index of the first instance of `value` in `mylist`.)\n", + "\n", + "__Quiz Question:__ Using the LDA representation, compute the 5000 nearest neighbors for American baseball player Alex Rodriguez. For what value of k is Mariano Rivera the k-th nearest neighbor to Alex Rodriguez? (Hint: Once you have a list of the nearest neighbors, you can use `mylist.index(value)` to find the index of the first instance of `value` in `mylist`.)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Understanding the role of LDA model hyperparameters\n", + "\n", + "Finally, we'll take a look at the effect of the LDA model hyperparameters alpha and gamma on the characteristics of our fitted model. Recall that alpha is a parameter of the prior distribution over topic weights in each document, while gamma is a parameter of the prior distribution over word weights in each topic. \n", + "\n", + "In the video lectures, we saw that alpha and gamma can be thought of as smoothing parameters when we compute how much each document \"likes\" a topic (in the case of alpha) or how much each topic \"likes\" a word (in the case of gamma). In both cases, these parameters serve to reduce the differences across topics or words in terms of these calculated preferences; alpha makes the document preferences \"smoother\" over topics, and gamma makes the topic preferences \"smoother\" over words.\n", + "\n", + "Our goal in this section will be to understand how changing these parameter values affects the characteristics of the resulting topic model.\n", + "\n", + "__Quiz Question:__ What was the value of alpha used to fit our original topic model? " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ What was the value of gamma used to fit our original topic model? Remember that GraphLab Create uses \"beta\" instead of \"gamma\" to refer to the hyperparameter that influences topic distributions over words." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll start by loading some topic models that have been trained using different settings of alpha and gamma. Specifically, we will start by comparing the following two models to our original topic model:\n", + " - tpm_low_alpha, a model trained with alpha = 1 and default gamma\n", + " - tpm_high_alpha, a model trained with alpha = 50 and default gamma" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "tpm_low_alpha = gl.load_model('lda_low_alpha')\n", + "tpm_high_alpha = gl.load_model('lda_high_alpha')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Changing the hyperparameter alpha\n", + "\n", + "Since alpha is responsible for smoothing document preferences over topics, the impact of changing its value should be visible when we plot the distribution of topic weights for the same document under models fit with different alpha values. In the code below, we plot the (sorted) topic weights for the Wikipedia article on Barack Obama under models fit with high, original, and low settings of alpha." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "a = np.sort(tpm_low_alpha.predict(obama,output_type='probability')[0])[::-1]\n", + "b = np.sort(topic_model.predict(obama,output_type='probability')[0])[::-1]\n", + "c = np.sort(tpm_high_alpha.predict(obama,output_type='probability')[0])[::-1]\n", + "ind = np.arange(len(a))\n", + "width = 0.3\n", + "\n", + "def param_bar_plot(a,b,c,ind,width,ylim,param,xlab,ylab):\n", + " fig = plt.figure()\n", + " ax = fig.add_subplot(111)\n", + "\n", + " b1 = ax.bar(ind, a, width, color='lightskyblue')\n", + " b2 = ax.bar(ind+width, b, width, color='lightcoral')\n", + " b3 = ax.bar(ind+(2*width), c, width, color='gold')\n", + "\n", + " ax.set_xticks(ind+width)\n", + " ax.set_xticklabels(range(10))\n", + " ax.set_ylabel(ylab)\n", + " ax.set_xlabel(xlab)\n", + " ax.set_ylim(0,ylim)\n", + " ax.legend(handles = [b1,b2,b3],labels=['low '+param,'original model','high '+param])\n", + "\n", + " plt.tight_layout()\n", + " \n", + "param_bar_plot(a,b,c,ind,width,ylim=1.0,param='alpha',\n", + " xlab='Topics (sorted by weight of top 100 words)',ylab='Topic Probability for Obama Article')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we can clearly see the smoothing enforced by the alpha parameter - notice that when alpha is low most of the weight in the topic distribution for this article goes to a single topic, but when alpha is high the weight is much more evenly distributed across the topics.\n", + "\n", + "__Quiz Question:__ How many topics are assigned a weight greater than 0.3 or less than 0.05 for the article on Paul Krugman in the **low alpha** model? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ How many topics are assigned a weight greater than 0.3 or less than 0.05 for the article on Paul Krugman in the **high alpha** model? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Changing the hyperparameter gamma\n", + "\n", + "Just as we were able to see the effect of alpha by plotting topic weights for a document, we expect to be able to visualize the impact of changing gamma by plotting word weights for each topic. In this case, however, there are far too many words in our vocabulary to do this effectively. Instead, we'll plot the total weight of the top 100 words and bottom 1000 words for each topic. Below, we plot the (sorted) total weights of the top 100 words and bottom 1000 from each topic in the high, original, and low gamma models." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we will consider the following two models:\n", + " - tpm_low_gamma, a model trained with gamma = 0.02 and default alpha\n", + " - tpm_high_gamma, a model trained with gamma = 0.5 and default alpha" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "del tpm_low_alpha\n", + "del tpm_high_alpha\n", + "tpm_low_gamma = gl.load_model('lda_low_gamma')\n", + "tpm_high_gamma = gl.load_model('lda_high_gamma')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "a_top = np.sort([sum(tpm_low_gamma.get_topics(topic_ids=[i], num_words=100)['score']) for i in range(10)])[::-1]\n", + "b_top = np.sort([sum(topic_model.get_topics(topic_ids=[i], num_words=100)['score']) for i in range(10)])[::-1]\n", + "c_top = np.sort([sum(tpm_high_gamma.get_topics(topic_ids=[i], num_words=100)['score']) for i in range(10)])[::-1]\n", + "\n", + "a_bot = np.sort([sum(tpm_low_gamma.get_topics(topic_ids=[i], num_words=547462)[-1000:]['score']) for i in range(10)])[::-1]\n", + "b_bot = np.sort([sum(topic_model.get_topics(topic_ids=[i], num_words=547462)[-1000:]['score']) for i in range(10)])[::-1]\n", + "c_bot = np.sort([sum(tpm_high_gamma.get_topics(topic_ids=[i], num_words=547462)[-1000:]['score']) for i in range(10)])[::-1]\n", + "\n", + "ind = np.arange(len(a))\n", + "width = 0.3\n", + " \n", + "param_bar_plot(a_top, b_top, c_top, ind, width, ylim=0.6, param='gamma',\n", + " xlab='Topics (sorted by weight of top 100 words)', \n", + " ylab='Total Probability of Top 100 Words')\n", + "\n", + "param_bar_plot(a_bot, b_bot, c_bot, ind, width, ylim=0.0002, param='gamma',\n", + " xlab='Topics (sorted by weight of bottom 1000 words)',\n", + " ylab='Total Probability of Bottom 1000 Words')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "From these two plots we can see that the low gamma model results in higher weight placed on the top words and lower weight placed on the bottom words for each topic, while the high gamma model places relatively less weight on the top words and more weight on the bottom words. Thus increasing gamma results in topics that have a smoother distribution of weight across all the words in the vocabulary." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ For each topic of the **low gamma model**, compute the number of words required to make a list with total probability 0.5. What is the average number of words required across all topics? (HINT: use the get_topics() function from GraphLab Create with the _cdf_\\__cutoff_ argument)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ For each topic of the **high gamma model**, compute the number of words required to make a list with total probability 0.5. What is the average number of words required across all topics? (HINT: use the get_topics() function from GraphLab Create with the _cdf_\\__cutoff_ argument)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have now seen how the hyperparameters alpha and gamma influence the characteristics of our LDA topic model, but we haven't said anything about what settings of alpha or gamma are best. We know that these parameters are responsible for controlling the smoothness of the topic distributions for documents and word distributions for topics, but there's no simple conversion between smoothness of these distributions and quality of the topic model. In reality, there is no universally \"best\" choice for these parameters. Instead, finding a good topic model requires that we be able to both explore the output (as we did by looking at the topics and checking some topic predictions for documents) and understand the impact of hyperparameter settings (as we have in this section)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.11" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/5_lda_graphlab.ipynb b/machine_learning/4_clustering_and_retrieval/assigment/week5/5_lda_graphlab.ipynb new file mode 100644 index 0000000..14273eb --- /dev/null +++ b/machine_learning/4_clustering_and_retrieval/assigment/week5/5_lda_graphlab.ipynb @@ -0,0 +1,2406 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Latent Dirichlet Allocation for Text Data\n", + "\n", + "In this assignment you will\n", + "\n", + "* apply standard preprocessing techniques on Wikipedia text data\n", + "* use GraphLab Create to fit a Latent Dirichlet allocation (LDA) model\n", + "* explore and interpret the results, including topic keywords and topic assignments for documents\n", + "\n", + "Recall that a major feature distinguishing the LDA model from our previously explored methods is the notion of *mixed membership*. Throughout the course so far, our models have assumed that each data point belongs to a single cluster. k-means determines membership simply by shortest distance to the cluster center, and Gaussian mixture models suppose that each data point is drawn from one of their component mixture distributions. In many cases, though, it is more realistic to think of data as genuinely belonging to more than one cluster or category - for example, if we have a model for text data that includes both \"Politics\" and \"World News\" categories, then an article about a recent meeting of the United Nations should have membership in both categories rather than being forced into just one.\n", + "\n", + "With this in mind, we will use GraphLab Create tools to fit an LDA model to a corpus of Wikipedia articles and examine the results to analyze the impact of a mixed membership approach. In particular, we want to identify the topics discovered by the model in terms of their most important words, and we want to use the model to predict the topic membership distribution for a given document. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note to Amazon EC2 users**: To conserve memory, make sure to stop all the other notebooks before running this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Text Data Preprocessing\n", + "We'll start by importing our familiar Wikipedia dataset.\n", + "\n", + "The following code block will check if you have the correct version of GraphLab Create. Any version later than 1.8.5 will do. To upgrade, read [this page](https://turi.com/download/upgrade-graphlab-create.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "import graphlab as gl\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt \n", + "\n", + "%matplotlib inline\n", + "\n", + "'''Check GraphLab Create version'''\n", + "from distutils.version import StrictVersion\n", + "assert (StrictVersion(gl.version) >= StrictVersion('1.8.5')), 'GraphLab Create must be version 1.8.5 or later.'" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
URInametext
<http://dbpedia.org/resou
rce/Digby_Morrell> ...
Digby Morrelldigby morrell born 10
october 1979 is a former ...
<http://dbpedia.org/resou
rce/Alfred_J._Lewy> ...
Alfred J. Lewyalfred j lewy aka sandy
lewy graduated from ...
<http://dbpedia.org/resou
rce/Harpdog_Brown> ...
Harpdog Brownharpdog brown is a singer
and harmonica player who ...
<http://dbpedia.org/resou
rce/Franz_Rottensteiner> ...
Franz Rottensteinerfranz rottensteiner born
in waidmannsfeld lower ...
<http://dbpedia.org/resou
rce/G-Enka> ...
G-Enkahenry krvits born 30
december 1974 in tallinn ...
<http://dbpedia.org/resou
rce/Sam_Henderson> ...
Sam Hendersonsam henderson born
october 18 1969 is an ...
<http://dbpedia.org/resou
rce/Aaron_LaCrate> ...
Aaron LaCrateaaron lacrate is an
american music producer ...
<http://dbpedia.org/resou
rce/Trevor_Ferguson> ...
Trevor Fergusontrevor ferguson aka john
farrow born 11 november ...
<http://dbpedia.org/resou
rce/Grant_Nelson> ...
Grant Nelsongrant nelson born 27
april 1971 in london ...
<http://dbpedia.org/resou
rce/Cathy_Caruth> ...
Cathy Caruthcathy caruth born 1955 is
frank h t rhodes ...
\n", + "[59071 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.\n", + "
" + ], + "text/plain": [ + "Columns:\n", + "\tURI\tstr\n", + "\tname\tstr\n", + "\ttext\tstr\n", + "\n", + "Rows: 59071\n", + "\n", + "Data:\n", + "+-------------------------------+---------------------+\n", + "| URI | name |\n", + "+-------------------------------+---------------------+\n", + "| Learning a topic model" + ], + "text/plain": [ + "Learning a topic model" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
       Number of documents     59071
" + ], + "text/plain": [ + " Number of documents 59071" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
           Vocabulary size    547462
" + ], + "text/plain": [ + " Vocabulary size 547462" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
   Running collapsed Gibbs sampling
" + ], + "text/plain": [ + " Running collapsed Gibbs sampling" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+-----------+---------------+----------------+-----------------+
" + ], + "text/plain": [ + "+-----------+---------------+----------------+-----------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Iteration | Elapsed Time  | Tokens/Second  | Est. Perplexity |
" + ], + "text/plain": [ + "| Iteration | Elapsed Time | Tokens/Second | Est. Perplexity |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+-----------+---------------+----------------+-----------------+
" + ], + "text/plain": [ + "+-----------+---------------+----------------+-----------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 10        | 6.24s         | 1.4602e+07     | 0               |
" + ], + "text/plain": [ + "| 10 | 6.24s | 1.4602e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 20        | 11.64s        | 1.47799e+07    | 0               |
" + ], + "text/plain": [ + "| 20 | 11.64s | 1.47799e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 30        | 16.98s        | 1.50132e+07    | 0               |
" + ], + "text/plain": [ + "| 30 | 16.98s | 1.50132e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 40        | 22.37s        | 1.53911e+07    | 0               |
" + ], + "text/plain": [ + "| 40 | 22.37s | 1.53911e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 50        | 27.87s        | 1.49653e+07    | 0               |
" + ], + "text/plain": [ + "| 50 | 27.87s | 1.49653e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 60        | 34.00s        | 1.38525e+07    | 0               |
" + ], + "text/plain": [ + "| 60 | 34.00s | 1.38525e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 70        | 39.98s        | 1.46159e+07    | 0               |
" + ], + "text/plain": [ + "| 70 | 39.98s | 1.46159e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 80        | 45.59s        | 1.33147e+07    | 0               |
" + ], + "text/plain": [ + "| 80 | 45.59s | 1.33147e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 90        | 51.41s        | 1.1672e+07     | 0               |
" + ], + "text/plain": [ + "| 90 | 51.41s | 1.1672e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 100       | 57.79s        | 1.27921e+07    | 0               |
" + ], + "text/plain": [ + "| 100 | 57.79s | 1.27921e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 110       | 1m 4s         | 1.36326e+07    | 0               |
" + ], + "text/plain": [ + "| 110 | 1m 4s | 1.36326e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 120       | 1m 10s        | 1.29229e+07    | 0               |
" + ], + "text/plain": [ + "| 120 | 1m 10s | 1.29229e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 130       | 1m 17s        | 1.16539e+07    | 0               |
" + ], + "text/plain": [ + "| 130 | 1m 17s | 1.16539e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 140       | 1m 23s        | 1.16374e+07    | 0               |
" + ], + "text/plain": [ + "| 140 | 1m 23s | 1.16374e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 150       | 1m 29s        | 1.37665e+07    | 0               |
" + ], + "text/plain": [ + "| 150 | 1m 29s | 1.37665e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 160       | 1m 35s        | 1.41662e+07    | 0               |
" + ], + "text/plain": [ + "| 160 | 1m 35s | 1.41662e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 170       | 1m 41s        | 1.08636e+07    | 0               |
" + ], + "text/plain": [ + "| 170 | 1m 41s | 1.08636e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 180       | 1m 49s        | 1.142e+07      | 0               |
" + ], + "text/plain": [ + "| 180 | 1m 49s | 1.142e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 190       | 1m 55s        | 1.24898e+07    | 0               |
" + ], + "text/plain": [ + "| 190 | 1m 55s | 1.24898e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 200       | 2m 2s         | 1.34335e+07    | 0               |
" + ], + "text/plain": [ + "| 200 | 2m 2s | 1.34335e+07 | 0 |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+-----------+---------------+----------------+-----------------+
" + ], + "text/plain": [ + "+-----------+---------------+----------------+-----------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "topic_model = gl.topic_model.create(wiki_docs, num_topics=10, num_iterations=200)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "GraphLab provides a useful summary of the model we have fitted, including the hyperparameter settings for alpha, gamma (note that GraphLab Create calls this parameter beta), and K (the number of topics); the structure of the output data; and some useful methods for understanding the results." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "Class : TopicModel\n", + "\n", + "Schema\n", + "------\n", + "Vocabulary Size : 547462\n", + "\n", + "Settings\n", + "--------\n", + "Number of Topics : 10\n", + "alpha : 5.0\n", + "beta : 0.1\n", + "Iterations : 200\n", + "Training time : 123.0197\n", + "Verbose : False\n", + "\n", + "Accessible fields : \n", + "m['topics'] : An SFrame containing the topics.\n", + "m['vocabulary'] : An SArray containing the words in the vocabulary.\n", + "Useful methods : \n", + "m.get_topics() : Get the most probable words per topic.\n", + "m.predict(new_docs) : Make predictions for new documents." + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topic_model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is certainly useful to have pre-implemented methods available for LDA, but as with our previous methods for clustering and retrieval, implementing and fitting the model gets us only halfway towards our objective. We now need to analyze the fitted model to understand what it has done with our data and whether it will be useful as a document classification system. This can be a challenging task in itself, particularly when the model that we use is complex. We will begin by outlining a sequence of objectives that will help us understand our model in detail. In particular, we will\n", + "\n", + "* get the top words in each topic and use these to identify topic themes\n", + "* predict topic distributions for some example documents\n", + "* compare the quality of LDA \"nearest neighbors\" to the NN output from the first assignment\n", + "* understand the role of model hyperparameters alpha and gamma" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load a fitted topic model\n", + "The method used to fit the LDA model is a _randomized algorithm_, which means that it involves steps that are random; in this case, the randomness comes from Gibbs sampling, as discussed in the LDA video lectures. Because of these random steps, the algorithm will be expected to yield slighty different output for different runs on the same data - note that this is different from previously seen algorithms such as k-means or EM, which will always produce the same results given the same input and initialization.\n", + "\n", + "It is important to understand that variation in the results is a fundamental feature of randomized methods. However, in the context of this assignment this variation makes it difficult to evaluate the correctness of your analysis, so we will load and analyze a pre-trained model. \n", + "\n", + "We recommend that you spend some time exploring your own fitted topic model and compare our analysis of the pre-trained model to the same analysis applied to the model you trained above." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "collapsed": false, + "scrolled": true + }, + "outputs": [], + "source": [ + "topic_model = gl.load_model('topic_models/lda_assignment_topic_model')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Identifying topic themes by top words\n", + "\n", + "We'll start by trying to identify the topics learned by our model with some major themes. As a preliminary check on the results of applying this method, it is reasonable to hope that the model has been able to learn topics that correspond to recognizable categories. In order to do this, we must first recall what exactly a 'topic' is in the context of LDA. \n", + "\n", + "In the video lectures on LDA we learned that a topic is a probability distribution over words in the vocabulary; that is, each topic assigns a particular probability to every one of the unique words that appears in our data. Different topics will assign different probabilities to the same word: for instance, a topic that ends up describing science and technology articles might place more probability on the word 'university' than a topic that describes sports or politics. Looking at the highest probability words in each topic will thus give us a sense of its major themes. Ideally we would find that each topic is identifiable with some clear theme _and_ that all the topics are relatively distinct.\n", + "\n", + "We can use the GraphLab Create function get_topics() to view the top words (along with their associated probabilities) from each topic.\n", + "\n", + "__Quiz Question:__ Identify the top 3 most probable words for the first topic. " + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
topicwordscore
0university0.0337723780773
0research0.0120334992502
0professor0.0118011432268
\n", + "[3 rows x 3 columns]
\n", + "
" + ], + "text/plain": [ + "Columns:\n", + "\ttopic\tint\n", + "\tword\tstr\n", + "\tscore\tfloat\n", + "\n", + "Rows: 3\n", + "\n", + "Data:\n", + "+-------+------------+-----------------+\n", + "| topic | word | score |\n", + "+-------+------------+-----------------+\n", + "| 0 | university | 0.0337723780773 |\n", + "| 0 | research | 0.0120334992502 |\n", + "| 0 | professor | 0.0118011432268 |\n", + "+-------+------------+-----------------+\n", + "[3 rows x 3 columns]" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topic_model.get_topics([0], num_words=3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__ Quiz Question:__ What is the sum of the probabilities assigned to the top 50 words in the 3rd topic?" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.21034366078939654" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(topic_model.get_topics([2], num_words=50)['score'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's look at the top 10 words for each topic to see if we can identify any themes:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[['university',\n", + " 'research',\n", + " 'professor',\n", + " 'international',\n", + " 'institute',\n", + " 'science',\n", + " 'society',\n", + " 'studies',\n", + " 'director',\n", + " 'national'],\n", + " ['played',\n", + " 'season',\n", + " 'league',\n", + " 'team',\n", + " 'career',\n", + " 'football',\n", + " 'games',\n", + " 'player',\n", + " 'coach',\n", + " 'game'],\n", + " ['film',\n", + " 'music',\n", + " 'album',\n", + " 'released',\n", + " 'band',\n", + " 'television',\n", + " 'series',\n", + " 'show',\n", + " 'award',\n", + " 'appeared'],\n", + " ['university',\n", + " 'school',\n", + " 'served',\n", + " 'college',\n", + " 'state',\n", + " 'american',\n", + " 'states',\n", + " 'united',\n", + " 'born',\n", + " 'law'],\n", + " ['member',\n", + " 'party',\n", + " 'election',\n", + " 'minister',\n", + " 'government',\n", + " 'elected',\n", + " 'served',\n", + " 'president',\n", + " 'general',\n", + " 'committee'],\n", + " ['work',\n", + " 'art',\n", + " 'book',\n", + " 'published',\n", + " 'york',\n", + " 'magazine',\n", + " 'radio',\n", + " 'books',\n", + " 'award',\n", + " 'arts'],\n", + " ['company',\n", + " 'business',\n", + " 'years',\n", + " 'group',\n", + " 'time',\n", + " 'family',\n", + " 'people',\n", + " 'india',\n", + " 'million',\n", + " 'indian'],\n", + " ['world',\n", + " 'won',\n", + " 'born',\n", + " 'time',\n", + " 'year',\n", + " 'team',\n", + " 'championship',\n", + " 'tour',\n", + " 'championships',\n", + " 'title'],\n", + " ['born',\n", + " 'british',\n", + " 'london',\n", + " 'australian',\n", + " 'south',\n", + " 'joined',\n", + " 'years',\n", + " 'made',\n", + " 'england',\n", + " 'australia'],\n", + " ['music',\n", + " 'de',\n", + " 'born',\n", + " 'international',\n", + " 'la',\n", + " 'orchestra',\n", + " 'opera',\n", + " 'studied',\n", + " 'french',\n", + " 'festival']]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[x['words'] for x in topic_model.get_topics(output_type='topic_words', num_words=10)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We propose the following themes for each topic:\n", + "\n", + "- topic 0: Science and research\n", + "- topic 2: Team sports\n", + "- topic 3: Music, TV, and film\n", + "- topic 4: American college and politics\n", + "- topic 5: General politics\n", + "- topic 6: Art and publishing\n", + "- topic 7: Business\n", + "- topic 8: International athletics\n", + "- topic 9: Great Britain and Australia\n", + "- topic 10: International music\n", + "\n", + "We'll save these themes for later:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "themes = ['science and research','team sports','music, TV, and film','American college and politics','general politics', \\\n", + " 'art and publishing','Business','international athletics','Great Britain and Australia','international music']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Measuring the importance of top words\n", + "\n", + "We can learn more about topics by exploring how they place probability mass (which we can think of as a weight) on each of their top words.\n", + "\n", + "We'll do this with two visualizations of the weights for the top words in each topic:\n", + " - the weights of the top 100 words, sorted by the size\n", + " - the total weight of the top 10 words\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's a plot for the top 100 words by weight in each topic:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZYAAAEZCAYAAAC0HgObAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xl8FfX1+P/Xubm5udkTlgQSIKyCgggogqIQRQVXrEtr\n3a1atLX2a3+tW1Vwq1qXun3qVhe0VbB1t+6VoIgiCijKjgSyQCBk32+S8/tjJnCJ2ZB7A4TzfDzu\ngzsz75k5c3O5Z97LzIiqYowxxoSKZ08HYIwxpmuxxGKMMSakLLEYY4wJKUssxhhjQsoSizHGmJCy\nxGKMMSakLLHsY0Rkhoi88BPXvUhEPm1j+TsickFLZUWkXET6t7HudyIy8afE9VOIiF9E3hKREhGZ\n01n73Z+IyBQRWfMT1ntWRP4Yjph+ChGZLiIf7gVxvCgiV+3pODqDJZZOICLZIlIlImUissn9jxez\nG5vcnYuPWl1XVU9S1RdaKquq8aqaDdt/OG5rtu4IVf1kN+LaVWcBPYFkVf1F8AIRecxNhGUiUisi\nde77MhH5byiDEJG+boLbJCKNIpLSbLlfRJ4XkVIRyRWR3zZbPlZElohIhYh8ISLDW9lPpogUNpv3\nQgvznheRB0J1fPyE75qqXqKq94UwhlBo9zhExBf0vSkXkQYRqQya97PdCkD1XFV9dHe2sa+wxNI5\nFDhZVROAMcBhwE0tFRQR6czA9mEZwGpt4QpfVb3STYQJwF+A2aqa4L5ODnEcDcBbwNm0/ON1F9Ab\n6AOcCMxoqtmJiB94HXgcSAb+A7wmIi39v/wCiG2WeI4CtjSbNxGYt6sHISIRu7pOV6OqdU3fG1WN\nBwqA44LmvbanY9xXWGLpPAKgqpuAd4ERACIyV0TuEJH5IlIJDBCR3iLyhohsE5HVInJZs21Fi8hs\n9yzqKxEZuX0nIteJyFp32XcicnqzdT0i8ojbhLRcRI4NWneuiPyqxeCds/GBInI5cB5wrbuPN9zl\n65u2JY7r3Ti2urEmucuims60RaRYRBaKSM9W9jnMjalYRJaJyKnu/JnALcA5bgyXdOQP0GzbZ4rI\n9yJSJCIfiMjgoGWbRORPIrLCjfNxEYlsaTuqmq+qTwJLcP/GzVwAzFTVclVdBjwLXOwuOwGoVtUn\nVDUA3A/E4ySM5vupAb7GSRyISF+gGngjaF4/oC8wv6mMiPzX/R6tFJELg47xLhH5l/u3KQV+ISIx\n7rxiEfkGGN3sM7tZRPLd2tf3IjKhlc/2JRG50X0/RUTWiMgNIrJFRHJE5NyW1nPLJ4vILPdvsEFE\nbglaNtT9PmwTkQIReU5EYoOWZ4jI6+53bouI3Bu0aY+IPOQe2xoRmdxaDMHh0Oxv6n5GT4jIZhHZ\nKCJ3Np0IiMg093t6l/u9WiMipwWt+5qI/CFo+lwR+db9Dq9o7fPcF1li6WTuD8JJwOKg2ecDl+H8\nqGwEZrv/9sI5E/6LiGQGlT8NmINzlvsS8LrsOONcC0xwz9ZvBf4pIqlB644D1gDdgZnAq00/+u1Q\nAFV9CvgX8Ff3LG5aC2WvdmM8GkgDioG/u8suAhKAdKAbcAXOD+RORMSLUxN4D6fJ62rgXyIyRFVn\nsnNN5NkOxB+87YNxfuCvAFKAT4A3ZOeawjlAJjAUp5b5p13Zh7ufXjh/o2+DZn8DNNUwDnKnAXBr\nX8uCljf3CW4Scf/9BCeJTHLnHQ2sUNVt7vS/gRVAKs7JwN9E5Iig7Z0BPKuqicCrOJ9pCtAP5+93\ncdCxjHSnR7rlTwZy2/kImmTgfH96A78DHpfWm4L/hfN96Q8cDkwTt9/Pdasb48HAAcCf3fi8OCds\n3+Mk177AK0HrTQS+xPnO/R/wjw7G3tzdwEB334fj/F/+fdDyYUCVG+PVwIsi0rv5RkTkeOAh4Ar3\n/+oUYNNPjGmvY4ml87wuIkU4PwZzcZpImjynqitVtREnmRwJXKeqAVX9Buc/wYVB5b9W1ddUtQF4\nAPAD4wFU9RVVLXDf/xsniRwetG6Bqj6sqg2q+jKwCudHoj270kQ3Hfizqm5yz8RvA85yf7gDOEnt\nAHUsUdWKFrYxHohV1XtUtV5V5wJvA7/chTha8wvgVVX9VFXrcX5Qe+I0UTZ5UFUL3B/pu37ifuMA\nVLU8aF4ZzglE0/LSZusEL29uHk7ywP33U2ABMCFo3jwAERkCjMT5O9Sr6tfALJwa1Pbtqer7bow1\nOCcxt7m1qw04P8BN6nG+ZyNEJEJVs90yHVGpqne737nXcZLM4OaF3BrX0cD/p6q17vf4EdzPXlVX\nqWqWu50tOD/MwUk1XlX/rKo17vpfBG1+par+y03es4B+IpLQwfiDnQvcpKplqroZ57sR/JlWAn9x\nP/N3gc9wEnhzlwIPq+oC99g2quoPPyGevZIlls4zTVW7qeoAVf2dqtYGLcsJep8GFKlqVdC8DThn\n+D8q7/5HyXXXQ0QuFKczuFhEinHOfnsErZvXLK4NTeuGUAZOX0GRm0yX4ySUVOAF4H1gtjid2XdL\ny+37aez8uTTFmt5C2V2V5m4LADeh5zXbdvDZ+E/9jCoARCQuaF4iUB60vPmPW/Dy5j4DUtxmu4nA\np6paDBQHzWsaQNEb2Nrse9bq90hEBOekpvlxA6Cqy4HrgTuBAnGaM3caqNCGrc2mq3CTbjMZQDSw\n1f3uFAMP4iR9xGkiftn93pTgnHA1fbf7AOvbiGFzs/1LKzG0SkR8ODWejUGzm3+mBe4JX/Dylr47\nfYF1u7L/fYklls7T1hl/cKdvPtAtuO0Yp2kiOCH03b5R5wehD5DvnvE9CfxGVZNVNRmnaSB4381/\nmPu5+9wV7Y2w2Qic6CbSbm4ssW4Npl5Vb1fV4Tg1s1PZuTbWJJ+g4wyKtXli/CnycX7EAHBrUuns\n/KMavO8MOvYZ7fS5uGe0RcAhQbMPwfmb4P67fZn7txwRtHznjTs1u6XAmUCMqjb9wH3qzhvKjsSS\nD/QUkaigTTT//IJH/SnOj2/z4w7e/wuqOgGnKSgauL2lOHdDDlDe7HuTpKpj3eX34iTjg1Q1Caf5\nWILW7R/ieHaiqnXANnb+XDLY+TNNbXai1Nr/rxxgUMiD3EtYYtnLqGouTvPGXeJ0dI/EqTYHDwM+\nVEROd7/A1wA1uKOGgEagUEQ84nRqj2i2i1QR+Z2IeEXkbJw24V0dgluA8+PSmidw+oX6AYhIz6ZO\nTHGGzY5wf8wrcGoyjS1sYyFQJSLXurFmAqfg9CntrjnAz0TkKLdt/gagEKdzvMnVItJLRHoA1+H0\ne7XI/fH24/zI+d0z2yb/BG4RkYSgfoqmPqEPcQZi/Npd5484tZX5bcT+Kc7fPLjMZ+68H9zBIajq\nWpz+mjvEGUY7BieBt3UN1L+BP7uxZgBXBh3jgSIy0Y2zFqdfrKW/20+mznD2L0TkryISJ47BQZ3a\n8TjfmQr3u/WHoNXnA+UicruIRIszzPsIQm82cKuIJLl9J9ex82caC9zgfmen4gzEeLWF7fwDuEpE\njgSnGVBEukyiscTSOdo6w29p2S+BAThnOq8AN7t9DE3ewOknKMbplP2Z2+68Amdk0Rc4Z5/D+fGP\n1BfAEJwf0tuBM1W1ZBfjfBoY7jZXvNrC8ofcGD8QZ8TRAnb08/TCGVZbinNmPpcWfuzcvplTcTpH\nC4FHgQtUdZcv2Gth28twkvWTwBbgGJymyuAfytlubKtwOtjvbb4d2J5Uqt3tKJANlAQVuREnEefi\ndC7PUNVP3ThqgGk4P+DFONfmnN4sjubm4TQNBV/o+qk7r/kw47NxvgObcRLyH1X18za2fRPOGflG\nnIETs4KWReN8t7binKHHAje3sp32arRtLf8lkASsdGOZjdMRDs5IwKNxPt9XcL5HzgadvrKTgFE4\nn/UGoPmIyF2JsbUy1+P8jVfiDAZ4H3g4aPkKIAbn+/AocG5TsmfnGuJHOCcDT4pIOc4gleBBNvs0\n0TA/6MvN2g/iJLGnVfWeFso8jDPGvxK4WFWXuv9hPwF8gBf4j6re6pafAVyO88cDuFFV3wvrgZj9\nhohswkm4C/Z0LGbfISLTgNtVdWS7hbs4bzg37jZ3PApMxjn7XiQib6jqyqAyJwKDVHWIiIzDuVhs\nvKrWisgxqlrlNvl8JiLvquqX7qoPqGoorzA2xhgTAuFuCjscWKOqG9ymjdk4Vf9g04DnAVR1IZDY\ndN1F0MioKJwkGFy9sivUTbjY87qN2Q3hTizp7DxkNJcfj0pqXmb7sE+3A3oJThvxh6q6KKjcVSKy\nVET+ISKJoQ/d7K9UNc2awcyuUtU3rBnMsVd33qtqo6qOxhlOO05EDnIX/R0YqKqjcJKONYkZY8xe\nIqx9LDi1j35B03348XUIeew8dv5HZVS1TETmAlOB5aoafMHVUzgjWH5ERKxJwxhjfgJV/cndDeGu\nsSwCBotzczgfzv2X3mxW5k3cC+REZDxQoqoFItKjqYlLRKKB43GG+DXdg6nJGcB3rQWgqvZSZcaM\nGXs8hr3lZZ+FfRb2WbT92l1hrbGoaoM4D7b5gB3DjVeIyHRnsT6pqu+IyEkishZnuHHTnWp7A7Pc\nkWUeYI6qvuMu+6uIjMK5QCsb595Uxhhj9gLhbgpDnetLhjab90Sz6R89VU2di9jGtLLNlm4BYowx\nZi+wV3fem9DJzMzc0yHsNeyz2ME+ix3sswidsF95vyeJiHbl4zPGmHAQEXQv7rw3xhizn7HEYowx\nJqQssRhjjAkpSyzGGGNCyhKLMcaYkLLEYowxJqQssRhjjAkpSyzGGGNCyhKLMcaYkLLEYowxJqQs\nsRhjjAkpSyzGGGNCyhKLMcaYkLLEYowxJqQssRhjjAkpSyzGGGNCyhKLMcaYkLLEYowxJqQssRhj\njAkpSyzGGGNCyhKLMcaYkAp7YhGRqSKyUkRWi8h1rZR5WETWiMhSERnlzosSkYUiskRElonIjKDy\nySLygYisEpH3RSQx3MdhjDGmY8KaWETEAzwKTAGGA78UkWHNypwIDFLVIcB04HEAVa0FjlHV0cAo\n4EQROdxd7XrgI1UdCnwM3BDO4zDGGNNx4a6xHA6sUdUNqhoAZgPTmpWZBjwPoKoLgUQRSXWnq9wy\nUYAX0KB1ZrnvZwGnh+0IjDHG7JJwJ5Z0ICdoOted11aZvKYyIuIRkSXAZuBDVV3klklR1QIAVd0M\npLQWQGPjbsVvjDFmF3n3dABtUdVGYLSIJACvi8hBqrq8paKtbeOWW2bidY8yMzOTzMzMcIRqjDH7\nrKysLLKyskK2PVFt9Td59zcuMh6YqapT3enrAVXVe4LKPA7MVdU57vRKYFJTjSSo3M1Apao+ICIr\ngExVLRCRXu76B7awfy0tVRISwnaIxhjT5YgIqio/df1wN4UtAgaLSIaI+IBzgDeblXkTuBC2J6IS\nN2H0aBrtJSLRwPHAyqB1LnbfXwS80VoAdXUhOhJjjDEdEtamMFVtEJGrgA9wktjTqrpCRKY7i/VJ\nVX1HRE4SkbVAJXCJu3pvYJY7sswDzFHVd9xl9wAvi8ivgA3Az1uLwRKLMcZ0rrA2he1pIqLr1yv9\n++/pSIwxZt+xtzeF7XFWYzHGmM7V5RNLbe2ejsAYY/YvXT6xWI3FGGM6lyUWY4wxIWWJxRhjTEhZ\nYjHGGBNSXT6xWOe9McZ0ri6fWKzGYowxncsSizHGmJCyxGKMMSakLLEYY4wJKUssxhhjQqrLJxYb\nFWaMMZ2ryycWq7EYY0znssRijDEmpCyxGGOMCSlLLMYYY0KqyycW67w3xpjO1eUTi9VYjDGmc1li\nMcYYE1KWWIwxxoSUJRZjjDEhZYnFGGNMSIU9sYjIVBFZKSKrReS6Vso8LCJrRGSpiIxy5/URkY9F\n5HsRWSYiVweVnyEiuSKy2H1NbW3/NirMGGM6lzecGxcRD/AoMBnIBxaJyBuqujKozInAIFUdIiLj\ngMeB8UA98AdVXSoiccDXIvJB0LoPqOoD7cVgNRZjjOlc4a6xHA6sUdUNqhoAZgPTmpWZBjwPoKoL\ngUQRSVXVzaq61J1fAawA0oPWk44EYInFGGM6V7gTSzqQEzSdy87JoaUyec3LiEh/YBSwMGj2VW7T\n2T9EJLG1ACyxGGNM5wprU1gouM1g/wF+79ZcAP4O3KaqKiJ3AA8Al7a0fnb2TGbOdN5nZmaSmZkZ\n7pCNMWafkpWVRVZWVsi2J6oaso39aOMi44GZqjrVnb4eUFW9J6jM48BcVZ3jTq8EJqlqgYh4gbeB\nd1X1oVb2kQG8paojW1imw4cr330X8kMzxpguS0RQ1Q51N7Qk3E1hi4DBIpIhIj7gHODNZmXeBC6E\n7YmoRFUL3GXPAMubJxUR6RU0eQbQauqwpjBjjOlcYW0KU9UGEbkK+AAniT2tqitEZLqzWJ9U1XdE\n5CQRWQtUAhcDiMgE4DxgmYgsARS4UVXfA/7qDktuBLKB6a3FYInFGGM6V1ibwvY0EdHevZX8/D0d\niTHG7Dv29qawPc5qLMYY07kssRhjjAmpLp9Y7JYuxhjTubp8Yqmrgy7cjWSMMXudLp9YvF6or9/T\nURhjzP6jyycWn8/6WYwxpjNZYjHGGBNSlliMMcaE1H6RWGxkmDHGdJ4un1iioqzGYowxnanLJxZr\nCjPGmM5licUYY0xIWWIxxhgTUvtFYrHOe2OM6TxdPrFY570xxnSuLp9YrCnMGGM6lyUWY4wxIWWJ\nxRhjTEhZYjHGGBNSHUosIvKqiJwsIvtcIrJRYcYY07k6mij+DpwLrBGRu0VkaBhjCikbFWaMMZ2r\nQ4lFVT9S1fOAMUA28JGILBCRS0QkMpwB7i5VSyzGGNOZOty0JSLdgYuBy4AlwEM4iebDsEQWIlu3\nWmIxxpjO1NE+lteAT4EY4FRVPU1V56jq74C4dtadKiIrRWS1iFzXSpmHRWSNiCwVkVHuvD4i8rGI\nfC8iy0Tk6qDyySLygYisEpH3RSSxtf3n51tiMcaYztTRGstTqnqQqt6lqpsARCQKQFUPa20lt7P/\nUWAKMBz4pYgMa1bmRGCQqg4BpgOPu4vqgT+o6nDgCOC3QeteD3ykqkOBj4EbWoshJ8c6740xpjN1\nNLHc0cK8zzuw3uHAGlXdoKoBYDYwrVmZacDzAKq6EEgUkVRV3ayqS935FcAKID1onVnu+1nA6a0F\nUFgIZWUdiNQYY0xIeNtaKCK9cH7Mo0VkNCDuogScZrH2pAM5QdO5OMmmrTJ57ryCoDj6A6OAL9xZ\nKapaAKCqm0UkpbUA+vWD7OwORGqMMSYk2kwsOE1YFwN9gAeC5pcDN4Yppp2ISBzwH+D3qlrZSjFt\nfQsz+eILmDkTMjMzyczMDHmMxhizL8vKyiIrKytk2xPVNn6TmwqJnKmqr+zyxkXGAzNVdao7fT2g\nqnpPUJnHgbmqOsedXglMUtUCEfECbwPvqupDQeusADLdMr3c9Q9sYf/6xz8qs2bBli27Gr0xxuyf\nRARVlfZLtqzNPhYROd99219E/tD81YHtLwIGi0iGiPiAc4A3m5V5E7jQ3d94oKSpmQt4BlgenFSC\n1rnYfX8R8EZrAQwZAsXFUFragWiNMcbstvY672Pdf+OA+BZebVLVBuAq4APge2C2qq4Qkeki8mu3\nzDvAehFZCzwBXAkgIhOA84BjRWSJiCwWkanupu8BjheRVcBk4O5WDyAWuneH+fPbi9YYY0wodKgp\nbF8lIjr73w3cNsPDSSfBvffu6YiMMWbvt7tNYe2NCnu4reWqenVby/cG/4vNJzGxDwsW7OlIjDFm\n/9DeqLCvOyWKMJrj38CwmBTK8nx7OhRjjNkvtJlYVHVWW8v3BZO1B58dkYPvuUF7OhRjjNkvtNcU\n9qCq/j8ReYsWrhVR1dPCFlmIJG1+m6KDjiK6fCA7ru80xhgTLu01hb3g/ntfuAMJl1ez7yI1/xPy\nK51b6IvlFmOMCas2hxur6tfuv/Nw7g1WDBQBn7vz9nq/GPhbSiMfQEXtLsfGGNMJOnrb/JOBdcDD\nOHcrXuvelXivd8Xw66mJ+gyiG6io2NPRGGNM19deU1iT+4FjVHUtgIgMAv4LvBuuwEIlISaabp/f\nwxZ/Peu2VNK9e9KeDskYY7q0jt42v7wpqbh+wLkR5V7P5wPfD6fhia7nd/+9k0Zt3NMhGWNMl9be\nvcLOEJEzgK9E5B0RuVhELgLewrkP2F7P53Me9OWPE4oro7g169Y9HZIxxnRp7TWFnRr0vgCY5L7f\nCkSHJaIQi4pyHk3sj1XOHPobnlk6jjG9xzBtWPPnjRljjAmF9i6QvKSzAgkXn89JLN1jobLaz3/O\n/g+nvHQKE/pNoEdMjz0dnjHGdDkd6rwXET9wKc5z6/1N81X1V2GKK2SaEktcHGwta2Rcn3EM7jaY\nVYWr6NHPEosxxoRaRzvvXwB64TxRch7OEyX3ic77iAhobIT4ONhW7tw8ID0+ndyy3D0cmTHGdE0d\nTSyDVfVmoNK9f9jJwLjwhRU6Ik6tJTFWKC5zRoT1SehDXnneHo7MGGO6po4mloD7b4mIjAASgZTw\nhBR6TmLxUFKxo8aSV2aJxRhjwqGjieVJEUkGbsZ5LPBynKc47hOioiApxkO5e+V9ekI6ueXWFGaM\nMeHQoc57Vf2H+3YeMDB84YSHzwfJMR4q3cTSJ6GP1ViMMSZMOnqvsO4i8oj73PmvReRBEeke7uBC\nxeeDbn4v1RWCqjpNYdbHYowxYdHRprDZwBbgTOAsoBCYE66gQs3ng3i/B62OoLKhgbT4NPLL8+32\nLsYYEwYdTSy9VfV2VV3vvu4AUsMZWCj5fBAZCb5aL1sDAaIjo4n3xVNYVbinQzPGmC6no4nlAxE5\nR0Q87uvnwPvhDCyUoqKcxBJR42VLwBnglp5g17IYY0w4tHcTynIRKQMuB14E6tzXbODX4Q8vNJpq\nLFIdwVb3aV/WgW+MMeHR3hMk41U1wf3Xo6pe9+VR1YSO7EBEporIShFZLSLXtVLmYRFZIyJLRWR0\n0PynRaRARL5tVn6GiOS6gwkWi8jUtmLw+cDrBaojdtRYrAPfGGPCoqMP+kJETgMmupNZqvp2B9bx\n4DxxcjKQDywSkTdUdWVQmROBQao6RETGAY8B493FzwKPAM+3sPkHVPWBdgNvbMTn8xARAQ1VEWwN\nSizWFGaMMaHX0eHGdwO/x7kwcjnwexG5qwOrHg6sUdUNqhrAaUJrfr/6abiJQ1UXAokikupOzweK\nWwurI7GzdSs+H3g8EKgStgQ3hVmNxRhjQq6jnfcnAcer6jOq+gwwFed+Ye1JB3KCpnPdeW2VyWuh\nTEuucpvO/iEiia2WysnB53PeNgSEguodnffWx2KMMaHX4aYwIAkoct+3/kPeOf4O3KaqKiJ3AA/g\n3Nb/R2beey+rVx/ISy+BL2oim0p7Ak6NxZrCjDEGsrKyyMrKCtn2OppY7gKWiMhcnCaoicD1HVgv\nD+gXNN3Hnde8TN92yuxEVbcGTT6F86jkFs2cMIH1/qs59lj49PNGCsqWAdZ5b4wxTTIzM8nMzNw+\nfeutu/cI93abwkREgPk4HeqvAq8AR6hqR668XwQMFpEMEfEB5+DcxDLYm8CF7r7GAyWqWhAcAs36\nU0SkV9DkGcB3rUbgNoXV1TnPZCl0b52f5E8i0BCgvHafeKyMMcbsM9qtsbjNTe+o6sH8OCm0t26D\niFwFfICTxJ5W1RUiMt3d9JOq+o6InCQia4FKYPvjkEXkRSAT6C4iG4EZqvos8FcRGQU0AtnA9FaD\nyM3F181JLAnxwsayRlQVEdnegT8satiuHJYxxpg2dLQpbLGIjFXVRbu6A1V9DxjabN4TzaavamXd\nc1uZf2GHA8jJwdcLamshIU6IrI2kvKGBBK93ewf+sB6WWIwxJlQ6OipsHPCFiKwTkW9FZFnzixb3\nWhs3EhW147n3CXW+nYYcWwe+McaEVkdrLFPCGkU45ecT5W2gri6CuDiIr/OxNRBgMNaBb4wx4dBm\nYhERP3AFMBhYhtNHUt8ZgYWMz0e3us1sJZ24OIip8+10W5eVhSvb2YAxxphd0V5T2CzgMJykciJw\nf9gjCrX6erpVbNjeFBZdG7nTjSjtEcXGGBNa7TWFHeSOBkNEnga+DH9IodXoj2LAhnl8NfhI4uIg\nqiaSLYEawK6+N8aYcGivxhJoerPPNYG5iodmkLHuf9trLN4a7/YbUVrnvTHGhF57ieUQESlzX+XA\nyKb37nNa9nqrxw6gZ+4S5wLJeIioidg+Kiw1NpWi6iICDYF2tmKMMaaj2nseS4T7PJamZ7J4g953\n6Hkse9qS4T3w1lWRXLiGuDigekeNJcITQUpsCpsqNu3ZII0xpgvp6HUs+6wfYmqoj47nkOw3iIuD\nxirP9hoLWHOYMcaEWpdPLOulBGlsZHTe28TFQX2VZ3uNBawD3xhjQq3LJ5Z11TlEVpfSt+w74uKg\nzk0sqgpAn3irsRhjTCh1+cSydtty8qfGEtlYR2L9NqoqhGiPh9J6Z5DbqF6jWJC7YA9HaYwxXUeX\nTyxebyzfn1NDZW8PyVtWUVEBg6Oj+VtuLo2qnHLAKXyw7gOqA9V7OlRjjOkSunxiGZA8GM8rR1Cb\nUUvjyteoqIC3Dj6YD4uLOf2774j0JTGm9xg+/OHDPR2qMcZ0CV0+sQxMHkjhgFiKig+iZOEsiooU\nKYoia9QoMvx+xi5ezJEHnMNrK1/b06EaY0yX0OUTy4CkAWxMgZH9Ukiv9FJfrxx4IBTkeXhkyBB+\nm5bGe96RvLnqLeob98mbCxhjzF6lyyeWgckDWR9TC7W1JG+Nw++v4rjj8ljkPrLsd336UI+X5D6n\n8smGT/ZssMYY0wV0+cQyIGkAP0SUw+bNeH7IIT6mnsmT72DZMmd5hAj3DBxIadrZvLLi9T0brDHG\ndAFdPrEMTB7I+oZtUFQESUkcGLuN2Nil5OTseA7LlG7dGBSbxIuFJduvbzHGGPPTdPnEkpGUwcay\njTT8/CyIjubAiLWoHo/H8/H2MiLC/w07mIreZ/Bp7qI9GK0xxuz7unxi8Xv99IzpSe7px0JxMUNZ\nSULCEaQyVfDTAAAgAElEQVSnz6M66NKVQ+PjGRJRzY1rvrVaizHG7IYun1jqCusYkDyA9QOSweNh\nYNX3eDxjGD16Ht9/v3MCuWfQEL6oi2Xg5/OZsX49Kysr91DUxhiz7+ryiSXnnhwGJg/kh5L1cNxx\nZJQvIxBIRSSKVatW7VT21IzDeLZ3AyVLr2XZtnUc8803XLBiBYVBd0M2xhjTtrAnFhGZKiIrRWS1\niFzXSpmHRWSNiCwVkdFB858WkQIR+bZZ+WQR+UBEVonI+yKS2Nr+Nz2zib6evqwvXg8XXUSf6jVU\nlDVSUzOJbdvm/aj8BYecz9un3Mfn8y7gTxHf0DMykhGLFvGvggJrIjPGmA4Ia2IREQ/wKDAFGA78\nUkSGNStzIjBIVYcA04HHghY/667b3PXAR6o6FPgYuKG1GHpf2pv4ufH8UPIDnHAC8VpGzZLlJCRk\nEhHx48QCMKHfBD771Wf8feH9TKz/jrcOPpi/btzIlatXW3Ixxph2hLvGcjiwRlU3qGoAmA1Ma1Zm\nGvA8gKouBBJFJNWdng8Ut7DdacAs9/0s4PTWAuh3XT/iP4pnVf4q8HopjknHN+8jBg+eRFpaVquJ\nYmDyQB496VGu++g6RsVG89no0XxWVsZj+fkdP3pjjNkPhTuxpAM5QdO57ry2yuS1UKa5FFUtAFDV\nzUBKawUju0dy7NnHsmLrCkprSinuPZzoVUsY0LcfDQ0e8vPXtrqTKYOm0DehL08veZo4r5fXR4zg\n1uxsPi0paSc8Y4zZf3n3dAAh0mr71MyZM2moaCDpf0k82P9BTsoYQULBejyvvkJewSRWr55HevqQ\nFtcVEe457h5OfelUzh95PoOi45g1bBi/WL6cL8eMoY/fH7YDMsaYzpKVlUVWVlbIthfuxJIH9Aua\n7uPOa16mbztlmisQkVRVLRCRXsCW1grOnDkTgLJTytjg3UBVv6NJXrkG7riDwIW/o6QkC7is1R0d\nmnYoxww4hvsX3M+MzBlM7d6dq9PT+cXy5cwfPRoRaSdUY4zZu2VmZpKZmbl9+tZbb92t7YW7KWwR\nMFhEMkTEB5wDvNmszJvAhQAiMh4oaWrmcon7ar7Oxe77i4A32gtkcs/JfLDhA+oyhpBSkwsxMRy8\nuQ6fb167HfJ3HHMHD3/5MJsrNgNwXb9+bAsE+LS0tL3dGmPMfiesiUVVG4CrgA+A74HZqrpCRKaL\nyK/dMu8A60VkLfAE8Jum9UXkRWABcICIbBSRS9xF9wDHi8gqYDJwd3uxjB4zmsbaRnL7RdCrbDXc\ndBNHvvkUDQ311NSsb3PdAckDuGTUJZz36nkUVRchIlyVns4jee1VrIwxZv8jXXn4rIho0/GVLy7n\ngocuoM/xmdx5ye0kbvyO+ikn8ewZ8Rx30fH07/9nnNHRLatvrOf6j67ntZWv8dovXqN/94Po/8UX\nfHvYYdbXYozpUkQEVf3J7fxd/sr7JrEjYzl02aF8XfQenyacAk89hffWWzj031vYlP8GX355IPn5\nT9DQUN3i+l6Pl/tOuI87jrmDyc9P5q3lczg3JYXHbfixMcbsZL+psQB8ctwnnDjxZI58fgEflhwD\nq1ezvn8mm6++kwP/mMDGjfdSV7eZww77qs3tfrP5G8579Twi4wexIeMa8iccjT8iItyHY4wxncJq\nLLugz9g+DA4MY0n3TXDqqfDwwyycMoMeD95C4ZajOfjgN6iuXktd3dY2t3NIr0P45opv+MPIs6gu\n/Y7D37iOLZWtDkwzxpj9yn6VWBKOSGBsznjKU9+Dm26CRx/lZw8fQ1yyl7sP/TfXXRdBbOx4ysoW\ntLutCE8EFxxyAf8adxYFiUfwu3d/1wlHYIwxe7/9K7GMT+DQBSOp6/ceOnAQTJtG1N//Ru9n/sLj\nKTdTvCXA449P4JlnPuO++2Dx4va3Oa1nKtExaSwo2sLbq98O/0EYY8xebr9KLL4UH4c0DofoIs75\n97k8d0of6h99mPpRI/Fm9OGpCc9xww0TGD36M7KzYcoU+PTTtrcZIcKpPXpw4uE389t3fkt5bXmn\nHIsxxuyt9qvEApB8RDIJLy5gQu/j+dy7mdcPFJ77zZHU3D4DbruNfj1HEhX1DQ89VMOVV8Jbb7W/\nzcykJPI8PTh2wLHc9PFN4T8IY4zZi+13iSXhiATiyvpyXPdLeOLUJ/jZDc8zfmUFZ2+8j8Cho4l4\nYhYxMcOoqPiak06Cd99tf5sTExOZX1rK3cfdy8vLX2Zh7sLwH4gxxuyl9svEcoq/gHPPheJiiJg4\nieHZVUQGGjn/1Doa7r2HxJixlJZ+xtixsGkT5OS0vc2ePh99o6LIafDx0NSHOP+18ymtsdu9GGP2\nT10+sUyePJk33niDhoYGAGIPjuW86vUc3rOCqcc2Uu5JQIYO5aV+f2BbZD2XnxtP/LxtlJZ+RkQE\nnHBCx2otmUlJZJWU8PPhP+eEgSdwyRuX2EPBjDH7pS6fWC699FLuvvtuDjnkELKzs/F4PRz4z2H8\nvvtGUldv5ZgepRSUjaD2gfd58ZAXWT0oiVvnv0lJ8aeoaoebwyYlJTHPvSnlA1MeIK88j/s/vz/M\nR2eMMXuf/ebK+0ceeYS7776bt99+m9GjRwNQX69cdHYDVW++wzU8zG8jXyPeV8j3506jW6A/mdVP\nc90tPRk3DrZuBZ+v9X1tqavjgIUL2XbUUUSIsKFkA+P+MY6Xz36ZiRkTO+NwjTEmJHb3yvv9JrEA\nvPrqq1xxxRW88MILTJkyZfv82oISIgf15fO3tpGzBnLmr+C+tOPxf382pwbuZlFxPHfdBcce2/b+\nDvryS1448EAOjY8H4L2173Hpm5ey6PJFpMWnheUYjTEm1OyWLrvgjDPO4LXXXuO8885j3bp12+dH\npSbhOWAIE6K+4pxf+/jT84fwfvcB1I99jmd+mMvP+hZ1uJ9lXtBji6cOnsqVh13JmS+fSW19bTgO\nyRhj9jr7VWIBmDBhAieffDIfffTRzgsmTYJ587ZPDj7jdq5OrSbqtJtY899Slr3S/oWPTR34wW48\n+kbS4tO46p2rrDPfGLNf2O8SCziP4fzR852bJZbYgZM5YmAkKVEbeHnEfC4bcj/fvvoWpV9vo2JZ\nBXVb6n603YmJiXxaWkpDUALxiIfnpj3HgtwFPPH1E+E6JGOM2Wvst4ll7ty5O9cgJk6EBQsgEABA\nJIIhB/2d62K9aOZNvBndly3bbmDJlgwWf3wSX1x1C9W5ZTttt1dUFL18Pr6tqNhpfnxUPK//4nVu\nmXsLv37r1zy9+GmWFSyjobEh7MdqjDGdbb/qvA/Wv39/3nvvPYYNG7Zj5iGHwJNPwrhx22fpxo2M\nueMQ1i+/l1P6X8bGjYXExX3IxZOfIDljDTVRdzJhwvl06+YF4MrVqxHgwtRU+riJxutx8veabWt4\nb+17fJn/JV/kfkGyP5kPLviAJH9S2D4DY4zZVTYqrA1tJZaLL76Y8ePHc8UVV+yYefXVkJ4O1123\nU9lP/3knJ3z9KNPy1zBqdBwJCVBR1EDgn+9R/7O38SXkkpIyg0svPYxvKyqYkZ1NXm0tebW1bA0E\n6OXz0S8qin5+P79MSeGU7t0BuOb9a1iQs4APL/iQRH9i2D4HY4zZFZZY2tBWYnnuued49913mTNn\nzo6Zr74KjzwCH34IXu9O5SddO5iKTSkMSbmZ2IpjqCjxs3V5Dduy69kUHcHxxz/E8ccP4+KLT99p\nvUBjI/l1dWyoqWF1VRUP5+UR4/Fw58CBHJuUxNXvXs2i/EV8cMEHJEQlhPwzMMaYXWWJpQ1tJZbs\n7GzGjRvH5s2bEXE/v/JyOP102LwZ7rsPpk4Fd1le8UaevPssPqz6jmW9PIzsfQixkbFUfl5JZJ94\nFi7OYNKILxnR+zBuvOAeusV0a3G/jarM2bKFGdnZpPp8XJ2ezv8W3cqXuZ9z3sHncWjaoYzpPcaS\njDFmj7HE0oa2EgvAgAEDeOeddzjwwAN3zFR17pX/pz9B//7w7LOQFnRx44MPUvrgPSx+YiZ1g/pT\n9n0Z6/66jrUnRfCvr7Lpd9iLlEktz078f4wfMYX4+LFERMT8aN/1jY28VljIQ7m5bKytZaynkKKS\nNeSWrCWnZB1/OPh07jziih1JzxhjOokllja0l1guueQSxo4dy29+85sfLwwE4M474V//gv/9D/r1\n27Hstddg+nT4/HMYNIiN92yk+KNiHvu2OwuKYxlxwdl8mraEmwdmMLBvNg0Nt9Kz529JT/eQkrK9\nErTd1+XlvFRQQGlDAzWNjWytKefDrTl0j05mWko6l/TqxZGJ1gdjjOkce31iEZGpwIM4Q5ufVtV7\nWijzMHAiUAlcrKpL21pXRGYAlwNb3E3cqKrvtbDdNhPLrFmz+O9//8vLL7/c+gE8+CA89BB89BEM\nGrRj/kMPwQsvwGefQVQUAI2NcMopUFMcIHDQ5ZTIN1y97hZSf/kX6mlk7v33sq50NINPS+CE0yI4\n5hiIjW15t59t/IxT37iKy497imeKavlw5EhGubeKMcaYcNqrE4uIeIDVwGQgH1gEnKOqK4PKnAhc\npaoni8g44CFVHd/Wum5iKVfVB9rZf5uJZcOGDYwdO5aCgoK2m5yeeALuuAOeecYZipyQ4DSZnXGG\nU5N56KHtRSsr4aWXYPacRj7pcT6JyQUcsfGPDIz0EDP4dY5WwffUaeSRwkc1PSg8Kp3jTo7g5JNh\n6NCddztr6Sxu++Q2bvjZB9yRu4VFhx5Kz7buhGmMMSGwu4nF236R3XI4sEZVNwCIyGxgGrAyqMw0\n4HkAVV0oIokikgoMaGfd3e58yMjIIC4ujuXLlzN8+PDWC06f7iSTW26Bb791hiSPHg0jR8Ljj8Oo\nUXDJJYBTA7nsMrjsMg85+c/xh//cx/xe1zK3oZikvJ/xWvEA/jz7Usb7/szBj06k+JNcFtdlcNy9\nvTlopIcbbnBuAiACF426iBWFK/jDP0cjAy+nf9EyDtr0LD2ik+gR04Me0T04deipHDugnbtjGmNM\nJwp3jeVMYIqq/tqdPh84XFWvDirzFnCXqi5wpz8ErsNJLC2u69ZYLgZKga+A/09Vf/TIxvZqLAC/\n+tWvKCoq4swzz2TMmDEMHToUr7eNfFtfD6tXw+LFsHSpcxuYr7+Gs85yRpQNGQLDh0PMzh32y7cu\n59/fvcLfX/+K6vhvqIvK4aAkP7/sPZWjXr4IWZrKistHcddTUXTvDtdeC6edBhERUFhVSGltOb9a\nl0+SNHBhbDmVtdvYVL6JpxY/xYDkAfzl2L8wNn1s238QY4zpgL29KeynJJaPgGtpO7H0BApVVUXk\nDqC3ql7awv51xowZ26czMzPJzMzcqUx+fj4vvPACS5YsYfHixWzatInDDz+co446iqOOOopJkybh\na6/56ZFH4MYb4bjjIDvbeZ7x9dfDFVeA379T0bIyOPpoOPvccvpPfoinFj/J0q25HJvYkyErjyTz\nssvJ/n44sx7pS+FWD7//PVxwASQnQ2l9PectX86npaUcl5zMGT17coDfx7urXuOxhX9jfOpB/O2E\ne+if1L8Dfx1jjHFkZWXtdP/EW2+9da9OLOOBmao61Z2+HtDgDnwReRyYq6pz3OmVwCScxNLmuu78\nDOAtVR3Zwv7brbE0V1xczOeff878+fOZN28eq1at4swzz+Tcc8/l6KOPxuNp5fZq06dDZCQ8+qjT\nXHbTTU6N5sYb4eSToW/f7UVzc2H8eLjySjjqKIhNW8Pb629j8Yr3yKkvY3Ojn9K6WtL9faEwg6Kl\nR1Pz+Z9IT4mhXz84/qwAsccV8mF1IRtqaqhsbKSyoZ6iumoai77i1G4J/N+4C+gVbZ39xphdt7fX\nWCKAVTgd8JuAL4FfquqKoDInAb91O+/HAw+6nfetrisivVR1s7v+NcBYVT23hf3vcmJpLicnh5de\neolZs2Zx9NFH8/jjj7dcsKjIaQJ7800Y6zZJLVgADzzgNJfFxztVlYEDITmZ7yoH8GDWIazYlMSK\nDbF4I4WJk4ShBV8x6sRn6J65itzqUjZUlvFRXg7flzZw/sBJHB19Pe++OZQ33+zBxInRHHwweDxO\nk5kkBFjVZzX/9S5iW3QqMYFCEmpzSazL57jEOK4+9FcM6T5ktz4PY0zXt1cnFtg+ZPghdgwZvltE\npuPUPp50yzwKTMUZbnyJqi5ubV13/vPAKKARyAamq2pBC/ve7cTSpKSkhCFDhjB//nyGNh++1eSF\nF+Bvf4Mvv9z5ljCqsHIlzJ/vVFeKina8iovR4hJyt0Yxt+xQPow6jferT6BEo2gUoREhPh4yDv+c\n0lF/pij2ay7zTqVfbCnR0RWo9qO6+hhKSk6muDidnBzYuBFWVazkoJ8VccDJ9WyMamBxVR2SM4fD\nyOWaw3/LaUNPs4svjTEt2usTy54UysQCcNddd/HNN98we/bslguoOs8vzsiAc8+FCRNav1ClJTU1\nkJdH4/qNFP5zIxvfiqWxIkBk4DPWxGSwwj+Id0cuYdlhL3HIF59RWdqD4uJaSkqUyko/UVG1REQo\nHo/i9TojyyorvSQkRJE6oIGClBLKupfi9S8gacA2zpo0iQEpqQxOiGJwdDT9/X58rTX1GWP2G5ZY\n2hDqxFJZWcngwYN59913GTVqVMuF8vKcIchZWbBkiXMr/iOOgMMPd66B6dvXabvqAFWl5OMSit4r\npHFLGQ2F5TSsyuXeA15kSb887pt1H36vHzxQ7w3QOGYLST8XosYFqK7dQmHhWgoKsikubmDNmmNY\nvvx4Vmw7gGKU+i2gJX7oWYEkePH0qKMxqY6YaCHFF0maP5Jh3fyMGxxFQgLExTmhDxq0a7nSGLPv\nscTShlAnFoBHHnmE999/n7fffrv9wpWV8MUXsHCh81q0CLZudYZ49ewJgwdDZiYcc4xzTUxHEo4q\njbfO5Oe5fyNq0nE8d9ocBEEblYolFeTcl0PF0grSrkwjbnQcUWlReFLLKeU9tha+TFnZF3Tvfgpp\nadPZWj2Iy595kHUbKzmaG9i0No286gClEbVURgaoigqg5RHEbIuhW52fxpoICgshKcm5dOeoo5xK\n2WGHOV1I1rJmTNdgiaUN4UgstbW1HHDAAbz00ksceeSRu76BQAC2bXMSzIoVMHcufPwx5OfvGJos\n4owxvvPOHw1XblJ19+0cl3Mna1Mj6RORTJonkdToHiSmDyK6MQX5ykNKXgopOSmkrEnBU+jB4/Pg\nSS/Dc/LHNB73BhGxPtL7T2deYTHXzvs/bpl0O1cctuPGl42qvJdfyv+tKmBu41akJoKIjTFUfZVA\nWnkCw5Oiyf3Wx7plEQQCTl9QQgJ07w49eji5MynJuaQnNtZJPmlp0KePc41pr14QHf1T/xLGmHCx\nxNKGcCQWgGeeeYbHHnuMOXPmMHDgwNBstKRk+2ORqayEP/4RVq1yboI58kcjqQFoeOF5Chb+j3yp\nIM9TQUHlFspy1lLmF4r69SQ7rp61keVke8rwerx4PV4iI6KIIBLqgEAdHm8N8f4GEqMDxKqPOPWR\n6o0jPaobfXzd6e7z09MXQ0ykl4DfS3m0l21+Dyu3eMkrjcTfK5LCSB/R3oEcFj2aCTGHECiPpbAQ\nCgudw6qqcg6ptNTJn3l5kJMDBQXObdZSUuCgg2DaNDj1VGfaGLPnWGJpQ7gSS319PXfccQePPPII\nZ555Jn/+85/JyMgI7U5U4fnnnQTz8587t/Dv2dN5JSc7VYGkJOe0P7gJTRXWr3ea3YqLobKShvJS\nqnOzCaxZRf3aVQRqq2n0RdIQGUF1ZAwFcWnkJSSQ20vJjy8jP6qMAl8Fhb5qiiMDFHlr8ajQKxBH\nel0yfWqTSfR4UX8DNdKIJyZAZGIFEXGlRPnL8HsjifTF4o9KoNE/kLK4qXgSj2FgbDKn9+xNzyg/\nHvGg6lwwWlDg3Lzg9dfh/fdh2DDncNPSoHdvJ9GkpDiH3q2bU/OJj3eSkjW/GRN6lljaEK7E0qSo\nqIj777+fxx57jJiYGKKjo4mOjiYxMZHU1FRSUlLo27cvU6ZMYfTo0T9teO/69fDvf8OWLTteJSXO\n6X9RETQ0OH00xx4LRx7pXCfT1l2QVaG62rk1TX091NVBRYXzC19a6oxMc5dVVZRQW1FCXXkJRRVb\nWVuxkbW1m1hfv42K8ioiyxqojIylxOenOsJDjUeo90QRQRQVMeVsTdpGTXQleGsJaAO1jUIjSqMq\nghDtjSTRF01CVAzJ/kRSYlPpHt2H+tIMApUJ1JTHUlUaS2VpFOWlPipKfFRt6U1t9mgqyrz4fM7N\nDk47zbmrtNV0jAkNSyxtCHdiaVJVVcW2bduorq6mqqqK0tJStmzZQkFBAWvXruXtt98mEAhw2mmn\nccQRR3DggQcydOhQYmJ+/ACwXZab6/TT/O9/Ti0lO9vplxkwwGlfGjECDj7YGQLdrZvzCtUdkuvr\nnbsMLF4M2dk0rv+BwLo1eNZvpKY4gYKYKWxlCKX1iXi7NSJjv6E2rpFSPxQlKJXxFdRGlFIvFdT5\nS6iOLaQmsogaqaA84KO8zk9lwEdVTTzVNUlU18VR5imkJjKPA/wTGR47kepN/dm4PI2Vi9KIqOpN\nTJSPmBinXychge39PomJO15DhjjXsA4aZDUeY1piiaUNnZVY2qOqLF++nLfeeoslS5awfPly1q5d\ny7hx43j00UcZMWJEKHfmDAz44Qf4/nv47jtYtszp3Ni2zanldOvmjEabPNm5lXJystOc5vE4vezu\n82V2S3U1bNgAK1fSuHQpmz97n4ivswnUJ7GiTwY/JKVQFBlNcXR3KiN74g0kkbotnt5bY+lZFEF0\nZAW1vYqp7b2NhkEb8A5eTXT/VXiiqli3cAqfr+3Lmqj1lPi3UObfRrl/KxX+bfiquuErTcNbmQr1\nsUhDHNTH429IJaohjWhNo6o0hcL8RALlyaQmJRDj9+L3OwMJkpJ2JKLevZ183L+/0zzXq9fufyzG\n7AsssbRhb0ksLQkEAjz99NPcfPPNXH755dx8881Ed8YQKVXn0vymWs78+U7PekOD86qqchJLjx7O\ncOhrroETTwzNqb0qrFvn7Dcry7lZZ2kplJWhBQVopJfaPr2pTEmiBqW2LoKaOi+lDRFsiPTzQ0wy\n9f2EvuNW02fg92wtGklNbU8aAtE0BKIpr4ghoioKqYX6cj81JfFU1VVR3lDBNtlGoW8r26IKKY8q\npTKqgnJ/FdVRlYgK3kA0kXXRxJb2Ia54INGlA/HWptGoiVRVJ7OpJJme0T6OPCiCzAlRDBocS6/e\nMaT1iyMhORZPhF1YaroOSyxt2JsTS5NNmzZxzTXXkJWVRVpaGpGRkURGRhIREbG9TGJiIqNHj2bM\nmDGMHDmS2NhYvF4vERERxMXF7VR2t6lCebkzpGvhQrj7biepXHstHHig04wWFeW0NSUmOv+GKukU\nFTl9Snl5TpJTdV5VVU7yKS2lKucHalcsg/x11AwupibZQ22MUBMn/NBL2DS0DyQnkkIeivCdZxwb\nvGOp8fSiwZsMEd3B2x1fnRBd1khsFSShxGo9voZqioty2VyUTX5ZNpvKtrG1rJJAZAW+mErqGuuo\n0jpqPLU0RNTS6K2ByGrw1OOpjSeiLg5fXRze+ih8DX589X58DdH4GqPxNfrdl4+oBj8+vPgiPPi8\nQlSkkJ4QR0b3JHp060F8fDwRvgi8Pi+R/kh8ST4iEyPxdfORHJ9Mz9iezoWxxoSJJZY27AuJpcna\ntWspKyujrq6OQCBAY2Pj9mWFhYXbb+u/bNkyampqaGhooL6+nurqarp160Zqairp6ekMHjyYIUOG\ncMABBzBs2DD69evX+h2ZO0IV3nnHeTRAQQHU1jqvykqnwz8QcC5cGTLEaS864ADnMn1wEk5kpDMd\n/Goa1hUT4/QH+Xw/LTnV1+8YeFBURP3771L5/NOwaRMLB0dRkRFFYAREDqhF4+rx+AP4fQEC6mF9\nTTfy6tMo9PSlNqoHdd7uVHm7k09fKuX/b+9MYyw7rvv+O3W3t/cyPd2zaZRRLFOLLToEnMh0DCdR\nEEtOLBmIIVIIDMdOAAdQEiUOHEvKB33Ih1hwFtjIIgh2BNtQvMiEbAZ2HEERZNlh5IWULVkiOSQE\nczhb93Cmp/v1e+9uVScf6t3Xr4czwxGnh6S66zc46HvrVd2l5r37v6fqVFWXnAhLhIqhrAzbGtGf\nWI5tVBy5aGFLKEcRxTBmZz2h2EzIR46iqLBqibKSKC0wcY1kI0hGkORTK9CoABSHw+IYs4Nmm2Td\nq0TpCDEWMRakJsZhsERaU2bbDDtbxC5mebTMseExju0cY3W0ykK5wKAaMKgHtKM2WZqRZRnthTb9\n030GZwYMXj/ApAYxghhhsbNIt9PFpIaoGxEfiTFx8L4OO0FYbsM3k7C8XOq65sqVK6yvr3P+/Hme\neeYZnn32Wc6ePcuTTz7J5uYm9913H2tra6RpSpqmHD9+nIceeogHH3zw7ieiLEvfp3P2rB93c/as\n719p6r0RoZ2dXRsOvY3Hu1Foi4t+zpgb7dQpL1xNT3y36yf4jKJbi9HTT/tABue855Pn3hN65hk4\ne5bcXGPr9IjhmYLxiRqXOlwbqj4Uq9C6KHSfFZIXDK422DqidBHPrpzm8dNv4fHTb+FydpSr5gjX\n4wWKOKNdTuiPhnTyHZIqx9QFphqzMFzndfmEN2iLM+YkK3qabnmCUdnmCkc5Xx9jY9zDOmEy8RHi\nFy/6FkPwVbCy4vV5NIKzZxVF+da3DFl7w0Wi5b/A9Z6jaD1Paa6Ry3UmbFLphNqVWC2obIG1BZXN\nqV0BgE7/jZIRxhkW80WO7Bxh7eoaxyfHOaEn6Nd9OrZDp+6QaUZEREJCmqR0l7v0Vnr0j/ZpDVok\ng4RWv0XSSfxA3NQQL8R03twh6uyjRx14RQjCchsOg7C8FFtbWzz11FNcvXqVsiwpy5KzZ8/yyU9+\nkhDMEVUAABMbSURBVLIsefjhhzl16hRZltFqtV70t9vtziyOY4wxGGNm4dX7Ql37J+rzz+/a+fO7\n25ub3isZDvf2B8Wx93gaa4b9HzniY4+PH/eDYZoBMSdO+PSbNR1WFeQ51k4YTb7KcPQlyvwCttik\nLrew9RCnE6xOsDqm1i1KruMoMJqgMLUIXAtnM+o6Y6NY5cnqNF9Mv41n2mcYxW0qk+DqCOdiIucw\n6ujkBb3JmN5kTLsqyayjZR1xBc6Cq7ylZY25lpKvLzG6vsywXGZULVKUPZIyIqsi0toQEUFkwERU\nGjEpY7ZHEZvDiCRWFgeOpQWl13O0B9vEgw1c/yJF53nG6XNMkudJekPizhDSbZwpsVpTa0XtSsq6\nYFJPyF2OxVJTY7EYNV6MbIde0aN/vc8yy6x0V2gnbRJJSExC3/RZSVdYzVZZ7ayytrTG2pE1uke6\nJGsJ6bGUuHevV04P3IogLLchCMutUVWeeOIJHnnkEa5du0ZRFOR5TlEUs+0mfHo0GjEajbDW4pzD\nWst4PAZgeXmZhYUF4jie2dGjRzl58iSnTp3i1KlTnD59mtOnT3PixAm63e7dNc01OOcFqSi8RzKZ\n+ECAq1e9bWz44IBLl7wLcOmS77u5ds03wTV9RUniBSqOfdqJEz4U7PRp7yo089EsLPi0171uT9Sc\ncyXO+WYtUJyrsHYHa4fU9Saj0VfZ2fkSw+ETlOVFn19LnJ1A1Eejo1hdxGqPynUoXIetwrEx2WJj\nvM0LkxHXioTNPGZYtynpYE0bG3VpuzZLRcRSYchcypWlNS4fWePiyipFmrI4HLK0PaQ3GhPXlth6\n0zLGjjNvRYorE2yVYHKhN8zp7+S0titG15e4MjzJ5fFJJnUXdREVMZaY+5Y3eOCNQ+6/X1hYMqSp\nkLWEuK/IkkWWFTsoyOUFtjYucu3iJfJyQmlLSluyXW9zpbrCC/YFXnAvcI1rbJpNWrbFymiFI5tH\nWB2t0pe+b6ZLfJ9T1IqI2zFxO6Yz6NBd6tI70mNpeYmT/ZMcGxzj+OJxup0urbhFbII4vRyCsNyG\nICz3DlVlMpmwubnJ1tYWdV1jraWqKjY2Nrhw4QLnz5/n/PnznDt3jnPnznHhwgXyPKfVatHpdPZ4\nR4PBgLW1NY4dO8bq6ir9fp9er0ev15t5SiJCu91mdXWVtbU1VlZW6HQ631jwQlV5r6csd60ZLFoU\nXnyee87b5qZvrhuP/fa5c/7zZgK0NPXC1JxfxAvUwoLP0wQ3tNveksTnMQYV8Z6P26ZmSLUUU67F\nFKuGcglc6lAqrB1Rlpcpy0sUxSWcm6BqAQvI1AxIhJoFKhmQa5dSIyoVKoUKwaqhrCGvhVITchcz\nsTGlE8paKa1SOaiimDqKqU3CUPoMdYGhLDNMjjJsH8PlFfHVHVpPZWRPD5DnlpFJilQxUsW4Kqau\nEuo6pSwzyrpNXnWoXEIWlWRJSSsuSYxFxFeHGIgig0kE6Q1hsIHrX6LuXESTESYCEUWoicURYzHU\niOaoHePcmDq+zqS7wbi/zrhzFRsXVLFv9os0QjAIQsu2Wa5WOFIfZdkeYcEs0o8X6CcD0ij11wMk\nacRgoc/icp/FIwO6rTYt06IVt+gt91hYXaCbdemlPdrJwZvwLgjLbQjC8trDOUee54xGoz1e0tbW\nFuvr61y+fJmNjQ12dnZmVtc1qopzjvF4zMbGBhsbG1y5coXJZEIURbRardnMB51Oh36/P5v94OjR\no6RpijGGKIrIsoxOpzOzpky73SZNU+I4JkmSWVNgky9JEsQ5H8QwmXiRqirvPTXfs6ryntPW1u5E\naZPJbv4m0q2JerPWi9r6+m7T3/p0zbpGoFZXYW3N/516Swq+qStJIE3Q2GDdCOt2qO0OKhaNpk/u\nSFADagSNFBdbrKmwUYVrC7YX4/oJthvjegmuY1Cj1PUmVXWVqrqKtUOsHWPdGHUTVFIcGZUm1E5x\n6nA4So2YuIxcU0qX+Ae6GKxLmRQ9xnmfUdGntC0qTamIKV1KqRmVZpQ2pSCj0Ba5a1O5BHWG2iY4\nF2OLBFck2EmCK2PvcRUJLo+xRYyWMa5I0CKGIkILQUtBS4MrDVYKXOsK2t7Ata9AawvSLTTbAlMD\nAgoqNZpM0HiEJhNIJkg8gXiCJBOfFk/QbBvUEOVLROUiogYQBAEijIsR9Wam/yKJiFxG5NrELiPW\nFolkJGS0TMZCK2OpnbHUyUgi/0JlJCJLYha6LRZ7bQa9lDSNSBJDEgtxYohN5M9gDHHigzCSOKGX\n9ehlPQbZgG7W9S9iAqZlMMnNWw+CsNyGICwHH1WlqiryPCfPc8bjMePxmO3tbdbX12cC1ETaWWsp\nimLWzDcej/dsV1U1s+Z4o9GI8XhMXdckSUKWZaRpOvO2Gs+r1WrNBKyxJl+r1SKOY0QEEdkjhq1W\naxZi3lhc17TLkk6ec8Q5FsuShaIgFSGJIi9+In4fSIA0jjEiXrCc2w1emLe63hXEJrBiOpaIrS3f\nj7Wz4z2sLPNeWZp6T2w6iFab13rxQQC0p6NLO200idAI1DhcYqDXRnsdtNdGYy9YGoEm4FLBZYJL\nFWfs1CosJZUW5HVOYSuqymHzAlsVuLrG2Rq1NWCRqSEOIosaC8ZC5CACIodLwbWADDRVTOyQxCGx\n82vTTq9JIlAVnBqsRtQuoXIJpUuoXUJRtSjrjFoj1MZIbXB5zKhssZXHDAuDUwMKomCdwSrUDmob\nUZUZZZVR1jFWlQpLpZZaHaWD2imlFca1YVIbcqfTBlb/DHM4alNiTYUzlQ+/kOmn4kAcIj6aUMSB\nWMRUkI5w2RBNd9A4x1Qd4qJPVKdeCEXoFCtc+dgfzn5XdyssoQEy8E2NiMyi3QaDwT09l6pSliVF\nUVCW5Yv6pJpt5xyqOmsabPqrGs+r+axJn0wms3zWWqy1s3x1Xc88t+FwOBPQ+X6wyWQyE78sy+j3\n+yRJMrvuOI7p9/uz5sX5IIy40yEeDGbCJiJEIrSdo5skdOKYXpKQRhGRMSTGkEQRramYtrOMpXab\npSxjKU3JRDCqGOeI6po4zzE7O8ho5MWsEbey3O0by/O9ojcvgrPxTD4oYs9n856itbuC2cwS3niH\nTX9cXYNTpmoC0zrGOcRaX8QIagwaGTQSXAQuUupMsCnYxOEMgPWHRnF6DacWpd79ruCFwBmHE4dK\njYhDjAPjUAN1ItSJeM/SKBo5xCgumgp3BESKRAqxQqQz0VJRJPZG4rAtqNqGqhVRpgl5kpDHGXmU\n4cRQuWUKe5xx2WGzbHM9yRhVLcoqpaxSSPdXCoLHEggcEJqmwuFwiJ0+KMHP8jAcDvc0LTbeWyN+\ndV2/SPgaz64RvmbsVFVVewI7Njc3uXr1KlevXmU8Hs+O2+S11pIkyZ6gjVarxeLiIktLSwwGg9nL\nwY2eW9O3Nm9NWpIks3Lz/XVZls0GEM8fZ/5YAMYY+v0+g8GAwWBAEsdg7UwYM2NIRUiBtKpIy5Ik\nz4mc8w5bI1rzAga7+41XmOdeSOfzFcVuE2ld76bf7HmlijrFOcVVFldZtPIvH7Pm1Dz34j3yIi7j\nHWS0g4xHXjStF2lxFpzD2BoVQaMYjSKKpRU6l56fnTJ4LIFAAPAPyibg4bWEc46qqph/yZtMJly/\nfp3Nzc09A4PLspwFgjjnZt7fjeaco67rWQh948Xt7Ozs8fzmj9NYQ+MNbm9vs729PRNWYCa4jYc6\nv914lE2fXSNwSZLsiY6ct8YbbKxJn59loxHNWXPotOm0ofn8RpGUSJBeDL0FRBZnTa3Ntc33Ic4H\nuhhV34wqwqDb5Yf38f88CEsgELinGGPIbpjYtNVqsbS0xJkzZ16lq7o75sWt6ZMry3Lmqc1b47U1\notV4i025efGc9yTrur7pORuxnBfa5vP54zfnaLzO4XC4R1jn8+33y0hoCgsEAoHAHu62KeyeTwok\nIu8UkadE5KyI/NQt8vyciDwjIn8qIt/xUmVFZElEPiMiT4vI/xaRhXt9H4FAIBC4M+6psIiIAf4z\n8H3AW4H3icibbsjzLuAvq+obgR8HPnYHZT8IfFZV7wM+B3zoXt7HQeDzn//8q30JrxlCXewS6mKX\nUBf7x732WP4q8IyqPqeqFfCrwHtuyPMe4JcAVPUPgQURWXuJsu8BfnG6/YvAD97b2/jmJ/xodgl1\nsUuoi11CXewf91pYTgLPz+2fn6bdSZ7blV1T1XUAVb0MhNXOA4FA4DXCa3HhhZfTYRR66AOBQOC1\nws1ixPfLgLcDvzu3/0Hgp27I8zHgobn9p4C125UFnsR7LQDHgCdvcX4NFixYsGDfuN3Ns/9ej2P5\nY+BbROT1wCXgYeB9N+R5FHg/8Gsi8nbguqqui8gLtyn7KPAPgY8CPwL81s1OfjfhcoFAIBB4edxT\nYVFVKyL/FPgMvtntF1T1SRH5cf+xflxVf0dEvl9EngVGwI/eruz00B8Ffl1Efgx4DnjvvbyPQCAQ\nCNw5B3qAZCAQCAReeV6Lnfd3zZ0MyjyoiMgpEfmciHxVRL4iIv98mn5oB5WKiBGRJ0Tk0en+oawL\nEVkQkU+JyJPT78dfO8R18S9F5M9F5Msi8kkRSQ9LXYjIL4jIuoh8eS7tlvcuIh+aDmB/UkT+zp2c\n48AJy50Myjzg1MBPqOpbge8C3j+9/8M8qPQDwNfm9g9rXfws8Duq+mbgfnygzKGrCxE5Afwz4AFV\nfRu+S+B9HJ66+AT++TjPTe9dRN6C72p4M/Au4L/K/MyYt+DACQt3NijzwKKql1X1T6fbO/gIulMc\n0kGlInIK+H7g5+eSD11diMgA+B5V/QSAqtaqusUhrIspEdAVkRhoAxc4JHWhqn8AbN6QfKt7fzfw\nq9Pvy18Az+CfsbflIArLnQzKPBSIyF8CvgP4Iod3UOl/An4SH0LZcBjr4gzwgoh8Ytos+HER6XAI\n60JVLwL/ATiHF5QtVf0sh7Au5li9xb3f+Dy9wB08Tw+isAQAEekBvwF8YOq53BilceCjNkTk7wLr\nUw/udu77ga8LfHPPA8B/UdUH8BGYH+Rwfi8W8W/orwdO4D2Xf8AhrIvbcFf3fhCF5QJwem7/1DTt\n0DB1738D+GVVbcb4rE/nYENEjgEbr9b1vYJ8N/BuEfk68CvA3xKRXwYuH8K6OA88r6p/Mt1/BC80\nh/F78beBr6vqNVW1wKeBBzmcddFwq3u/ALxuLt8dPU8PorDMBmWKSIofWPnoq3xNrzT/Hfiaqv7s\nXFozqBRuM6j0IKGqH1bV06r6Bvz34HOq+sPA/+Tw1cU68LyIfOs06R3AVzmE3wt8E9jbRaQ17Yh+\nBz644zDVhbDXi7/VvT8KPDyNmjsDfAvwRy958IM4jkVE3omPgGkGVv70q3xJrxgi8t3AF4CvsDs9\nw4fxX4Zfx799PAe8V1Wvv1rX+UojIt8L/CtVfbeILHMI60JE7scHMSTA1/GDkSMOZ118BP+yUQFf\nAv4x0OcQ1IWI/A/gbwBHgHXgI8BvAp/iJvcuIh8C/hG+rj6gqp95yXMcRGEJBAKBwKvHQWwKCwQC\ngcCrSBCWQCAQCOwrQVgCgUAgsK8EYQkEAoHAvhKEJRAIBAL7ShCWQCAQCOwrQVgCgTlE5D82Sw1M\n939XRD4+t//vReRf3MXxPyIiP3G313mLYw/vxXEDgW+UICyBwF7+L356D6ajslfwyy80PAg8dicH\nEpHo5VzAyy3H4Z7bKvAaIghLILCXx5gKC15Q/hwYThfJSoE3AU8AiMjPTBdT+zMRee807XtF5Asi\n8lv4KVMQkX8zXUDpC8B9NzvpdNbh/yYiXwQ+KiLfKSKPicjjIvIHIvLGab4fEZFHROR/TY/50Zsc\na2Va9l37WjOBwB1yT9e8DwS+2VDVSyJSTddxabyTk/hF07aBr6hqLSJ/H3ibqn67iKwCfywivzc9\nzF8B3qqq50TkAfxCSW8DUrwo/Qk356Sqvh1ms1P/dVV1IvIO4N8BPzTNdz9+OYQKeFpEfk5VL0zL\nreLnd/qwqn5u3yomEPgGCMISCLyYx/AzIz+IX7fj1HR/C99UxnT/VwBUdUNEPg98JzAE/khVz03z\nfQ/waVUtgEKmyyPfgk/NbS8CvzT1VJS9v9X/M10KARH5Gn769wt44fos8H5V/f2Xcd+BwL4QmsIC\ngRfTNId9G74p7It4j+W7uHX/yvxMsaOXed75cv8WPxvztwM/ALTmPivmti27olMDjwPvfJnnDwT2\nhSAsgcCLeQz4e8A19WziPYh5Yfl94CERMSJyFO+Z3Gw68S8APygimYj08SJxJwzYXffiR++wjAI/\nBrxJRP71HZYJBPadICyBwIv5Cn5K8f93Q9p1Vb0GoKqfBr4M/Bm++eknVfVFC0Op6peAX5vm/W1u\nvZbFjRFdPwP8tIg8zu1/p/PlVP105e8D/qaI/JPblAsE7hlh2vxAIBAI7CvBYwkEAoHAvhKEJRAI\nBAL7ShCWQCAQCOwrQVgCgUAgsK8EYQkEAoHAvhKEJRAIBAL7ShCWQCAQCOwrQVgCgUAgsK/8f8OX\ni0+RexIkAAAAAElFTkSuQmCC\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "for i in range(10):\n", + " plt.plot(range(100), topic_model.get_topics(topic_ids=[i], num_words=100)['score'])\n", + "plt.xlabel('Word rank')\n", + "plt.ylabel('Probability')\n", + "plt.title('Probabilities of Top 100 Words in each Topic')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the above plot, each line corresponds to one of our ten topics. Notice how for each topic, the weights drop off sharply as we move down the ranked list of most important words. This shows that the top 10-20 words in each topic are assigned a much greater weight than the remaining words - and remember from the summary of our topic model that our vocabulary has 547462 words in total!\n", + "\n", + "\n", + "Next we plot the total weight assigned by each topic to its top 10 words: " + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEZCAYAAACTsIJzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XucHHWd7vHPk8RwJ4JIZklIgsCC4AoqQhTBXtElgBJ1\nVzcohwV3MbvKkpWjgriaOUePihcWWN2DWSMKclNAzdGo6EKDHDUGSOSWmCASEiCRqwRQCeG7f9Rv\nkqqmZ6Znpqu7Jnner9e8puv+dE1Nf7t+v6puRQRmZmZ9xnQ7gJmZVYsLg5mZFbgwmJlZgQuDmZkV\nuDCYmVmBC4OZmRW4MFSQpG0kPSdpjw5sa7akHw9z2aMlrRxg+kWSPthsXkl3SzpsONsdYkZJulTS\nY5LqZW9vSyVpiaS3D3GZl0l6oKxMw5GOg1d2OcMhku7pZobBuDC0SNJ6SU+kn42Sns6NO2GQZQd8\nAe1HvzeYSPqFpD+kba+T9E1Juw1x/S1tayTLRsQpEfH5ZvNGxD4RsQhA0qclzRtBhoEcBRwGTIyI\nWn6CpN7c3/APkjakx+slLW5nCEkTJF0j6b5U9F/eMH2spC+mF651kj42wLp+IenU3PABaZ35cQdK\nekbS9u18HkMREXdEROlvbsog6TO5Y+OPuWPjCUk3jWTdEXFzRLykXVnL4MLQoojYKSJ2joidgVXA\ncblxlw+yuBj6i68GigO8J2V5KdADfLbpSqSt/W88DbgnIp5pnBARvX1/Q+BfgOvT33OniHh1m3ME\n8F/AO4Cnmkz/IDAd2IeskL1H0jv7WdeNwJG54SOBZQ3jjgBujYinhxJS0tihzL+lioizcsfGmcAP\n+v7/I+J13c5Xtq39RWO4RMMLt6RtJX1J0gPpXeFn07vAXYFrgJfk3oHsIum16Z3fY5LWSDp3iC/i\nAoiIR4HvAC9LOS6XdL6kH0laD0xP27tM0u8k/UbShxrWNVbShZJ+L+kOSUfkntd7JS1LuVdIOqUx\nR3rn/Uha99/kJlwu6eym4aUH0z6YCZwB/F3axi8kndj4rkzS2ZKaFmBJe0r6fsqwXNJJafw/Af8O\n1NK6zxp0rz5/3W9MzSiPSbpJ0sG5aUskfTz9fjTt4x2arScinoiIL/WdJTVxEvDpiHgkIu4FLgBO\n7mfeG8le+PscQfbG4HUN427MZf2ApHvSMXClpBel8RPS2cZ7Jf0GuDmNf5uy5r5HJH26YZ+8TNLP\n0vGyrr+zPUkHpWOwb3hJ+jsuTst+R9KO/TxHJL1T0u1p314nad/ctE9I+m36uy6V9KaGZedI+nWa\nviS/LHC4pLvSei8a7psnScdI+lVazw2SXpabtkzSRyXdlqZfLGm7NO0wSQ/l5v2z9DdZJ+khSV8e\nTp62igj/DPEH+C3whoZxnwVuAHYBXgz8EvhImnY0sKJh/kOAV6XHewErgPem4W2A54A9+tn+z4F3\npce7Az8FLkzDlwMPA4ek4fHAN4Erge2AvYF7gBPS9NnAhvR7LPA/gEeAHdP0NwNT0uM3AE8DL809\nrw3AJ4FxZM02TwFTc1nObrYPgAeB16bHnwbm5aZtDzzet5407i5gRj/74xfA51OGV6X8r8k9v2tb\n+Js+bz5gD2A98Ja0b2YDDwDbpelLgJXp77cj8CPggha2tR54eW54LPAssG9u3BuAVf0s/8I0f99+\nvjcdd7fnxt0HvDk9fjuwGtgvHVtfAxakaRPSsXZ1eg7bAHumv+Ob0j6dCzwDvD0tsxB4X3q8LTC9\nn5wHAU/khpcAt6X17wAsBj7cz7KvT8/hL8jeBJ2Wnp/S9L8FXpSm/T3wGLBzmnYq2f/TAWl4P7Km\nRNJ816X9tTvZ2f87B/l7zenbX7lx04AnyY7rscDpaV3j0/Rl6WcKsHPa5mfTtMOA36XHInut+Hey\n4/4FwOFlv4YNeox2O8Bo/KF5YVgDvD43fDxwV3r8vMLQZJ1nApemx60UhvXAo+mf56vAC9O0y0lF\nIg2PJ/ciksadDixMj2cDv2lY/1Lgr/vZ9g+AU3PP6+m+f4Y07rvA/8xlGXJhSOPmAx9Njw8B1gJj\nmuTZJ2XYJjfuXOA/cs9vuIXhNOCHDePuAo5Pj5eQe2EDXgM81MK2GgvDjunvvXtu3CHAowOsYwlZ\nEZ8G3J7G/d/cuA3AhDT+KuCs3LK7p+3tzObCcFBu+j/3HR9peBzZC2pfYfgu8DnSi+0AGZsVhvfl\nhj8CXNbPspcBH2gY9yDwFwP8Tx6RHv8COLGf+R4DjskNfxn41CDPo1lh+CDwnYZxvwH+Kj1eBvxz\nbtpfAqvT43xheAXZ//G4wY6bTv64Kal9eshepPusAib1N7Okl0paKGmtpN8DHwOG0oH83ojYNSKm\nRMR7IuLx3LTVDbnUMK4x25qGdd9H9m4ZScdLWpSaFB4jO8DzOR+KYvv9qr5lR+hi4N3p8buByyPi\nuSbz7ZEy/KkhQ7/7fgj2SOvKa1x3437dVdILhridP7D5hbrPBLIC0p++foYjyc4YAW4ie6d9JHBH\nRPw+jS88j4j4HfAn+j8G9iD3vCLiWbIX5T7vJzsr/pWkW9V/X0gza3OPnyYris1MBf53aoZ5NB17\nO/VllvSPuWamx4DJbD4u9yQ7K+7PuhYzDKTZsXEf/e/TVWT/i40mA2vSPq4MF4b2eZDsYO4zFbg/\nPW7W8fyfwC3AXhExAfgEA3c4Nxqsc7rPWrIXnSm5cVNy2SA7OGmcruyKlm8C/wvYLSJ2Aa5v2PZu\nksY3LDvUSxSft38i4gZgW2WXtJ4AXNLPsg8AL5a0TWP+IWbob93TGsZNofgPv2fu8VSyd/kbhrKR\niNgILCd7h93nIODOARa7kawIvI7NheGnZEWh0L9A9jw2HZuSdic7k8wfn/m/wYPknpeyDuk/y+Vd\nExEnR0QP8GHgG5JePOgTHZrVZGdju6afXSJix4j4YWrL/yxwUt80sr+Jcsvu3eY8jYZzbKzl+VYD\nkyWNa2u6EXJhaJ8rgLmSdk3/eGez+cVsHbB7Q8fkjsDvI+IPkg4kaxdtu/Ru/tvApyRtL2lvslPj\n/AvtlNT5OFbSiWSF4sdkfRLjgIcgO3sAag2bGA98TNILJL0BeCNZ08VQrCNrp2/0DWAe8HBE3NrP\n87ubrO35k5LGK7tG/ST6LyRD8W3gNZKOS/vmVLK26f/KzfMPkl4iaWfg42THQVMp37ZpcJuGYnYJ\ncJak3STtRdaMddEA2W4ka0Z7C6kwRMR9ZM2Qb6ZYGC4H/knSfqkD9DPA9yLiib5oTZ73kanjfRzw\nUbI+gb7nMUvSxDT4BFlR2djf0x7gOQzky8AZfZ39knaS9NaUZyey5tFHJI2TNIfim5uvAP+a/q+Q\ntH8ub7tcDbxB0l+lY+M0sv+V/H6fLWmKpAnAv9Lk2IiIpWT9VF+QtEM6Rg5vc9Yhc2EYnmZnAB8n\na3++E7iV7J/1cwAR8StgAbAqnfq+kOxKnFMlPUHW8dR40Ax0eetQp80m+wddBfyErD0/f4XPDWxu\n6/wI8LaIWB8Rj5C1pX6PrEP7eOD7Dev+Ldk/6Vqyf8iTI6LvFLvVnFcAO6R9k78a6WKyzseLB1gP\nZJeAHpgyXA58MCJ+Psgyg4qI+8k6bv8P2fM/GTg2ipeAXkL2IrGKrMP8IwOsch1Zp+72ZO3gT6eC\nAlnn+S/IXiQWAV+NiG8NkO0hsrOMP0ZE/l3qTWTNPDfm5r0aOA/4IVlzx45kHbabZmlY92qyvoov\np8zbAnfkZjkCWJqO3a+TvXN/tL+o/W1nIBFxPdnZyFdTU9FdwN9kk+LnZMfEr8jece9K9uagb9mv\nkPW3fCc1015GVkyGlGGQfL8F3kn2d3s4PX5zQ5PmpWT/O/eSnWHM7Wd1byVrgrqX7IzjxHZkHIm+\nHn6zykmXMq4F9m948asESUuAT0TENd3OYtUiaRnZRRgLu51lOHzGYFV2OlCvYlEw25JVqsPDrI+k\nB8maXY7vdpYB+HTb+jOqjw03JZmZWYGbkszMrGCLaEqS5NMeM7MhioimlxNvMWcMI7n9e+7cuV2/\nBb0qOaqQoSo5qpChKjmqkKEqOaqQoR05BrLFFAYzM2sPFwYzMytwYQBqtVq3IwDVyFGFDFCNHFXI\nANXIUYUMUI0cVcgA5ebYIi5XlRRbwvMwM+sUScSW3vlsZmbt4cJgZmYFLgxmZlbgwmBmZgWlFwZJ\nMyQtl7RC0plNpu8n6WeS/ijpjCbTx6SvD1xQdlYzMyu5MEgaA3yR7IvgDwROkLR/w2yPkH35+Of6\nWc0csi/pMDOzDij7jOFQYGVErIrse3CvAGbmZ4iIhyPiFrJvASuQNBk4luybwczMrAPKLgyTyL56\nr8+aNK5V/wZ8iFH+2eZmZqNJZT9dVdJxwLqIWCqpxiBfKt7b27vpca1Wq8zdiWZmVVCv16nX6y3N\nW+qdz5KmA70RMSMNn0X2Zd7nNJl3LrA+Is5Nw58i+1LsZ4HtyL7M+5qIOKnJsr7z2cxsCLp55/Ni\nYB9JUyWNB2YBA11dtClkRJwdEVMi4iVpueuaFQUzM2uvUpuSImKjpNOAa8mK0PyIWCZpdjY55kma\nCNxMdkbwnKQ5wAER8WSZ2czMrDl/iJ6Z2VbIH6JnZmYtc2EwM7MCFwYzMytwYTAzswIXBjMzK3Bh\nMDOzAhcGMzMrcGEwM7MCFwYzMytwYTAzswIXBjMzK3BhMDOzAhcGMzMrcGEwM7MCFwYzMytwYTAz\nswIXBjMzK3BhMDOzAhcGMzMrcGEwM7MCF4YO6emZhqQR/fT0TOv20zCzrUDphUHSDEnLJa2QdGaT\n6ftJ+pmkP0o6Izd+sqTrJN0p6XZJp5edtUzr1q0CYkQ/2TrMzMqliChv5dIYYAVwFPAAsBiYFRHL\nc/PsBkwF3go8FhHnpvE9QE9ELJW0I3ALMDO/bG4dUebzaAdJZC/wI1oLVX+eZjY6SCIi1Gxa2WcM\nhwIrI2JVRGwArgBm5meIiIcj4hbg2YbxayNiaXr8JLAMmFRyXjOzrV7ZhWESsDo3vIZhvLhLmgYc\nDCxqSyozM+vXuG4HGExqRroKmJPOHJrq7e3d9LhWq1Gr1UrPZmY2WtTrder1ekvzlt3HMB3ojYgZ\nafgsICLinCbzzgXW9/UxpHHjgO8BP4iI8wfYjvsYzMyGoJt9DIuBfSRNlTQemAUsGGD+xpBfBe4a\nqCjY6ONLd82qrdQzBsguVwXOJytC8yPiM5Jmk505zJM0EbgZ2Al4DngSOAA4CLgRuJ3N12yeHRE/\nbLINnzGMIt4XZt030BlD6YWhE1wYRhfvC7Pu62ZTkpmZjTIuDGZmVuDCYGZmBS4MZmZW4MJgZmYF\nLgxmZlbgwmBmZgUuDGZmVuDCYGZmBS4MZmZW4MJgZmYFLgxmZlawVRQGf8yzmVnrtopPV63Cp3lW\nIUNVeF+YdZ8/XdXMzFrmwmBmZgUuDGZmVuDCYGZmBS4MZmZW4MJgZmYFLgxmZlZQemGQNEPSckkr\nJJ3ZZPp+kn4m6Y+SzhjKsmZm1n6l3uAmaQywAjgKeABYDMyKiOW5eXYDpgJvBR6LiHNbXTa3Dt/g\nNop4X5h1XzdvcDsUWBkRqyJiA3AFMDM/Q0Q8HBG3AM8OdVkzM2u/sgvDJGB1bnhNGlf2smZmNkzj\nuh2gXXp7ezc9rtVq1Gq1rmUxM6uaer1OvV5vad6y+ximA70RMSMNnwVERJzTZN65wPpcH8NQlnUf\nwyjifWHWfd3sY1gM7CNpqqTxwCxgwQDz50MOdVkzM2uDUpuSImKjpNOAa8mK0PyIWCZpdjY55kma\nCNwM7AQ8J2kOcEBEPNls2TLzmpmZv49hKFtxU1KbeF+YdZ+/j8HMzFrmwmBmZgUuDGZmVuDCYGaV\n0NMzDUkj+unpmdbtp7FFcOdz61tx53ObeF9YMz4uOsudz2Zm1jIXBjMzK3BhMDOzgpYKg6RrJB2X\nviPBzMy2YK2+0P8H8C5gpaTPSNqvxExmZtZFLRWGiPhJRLwbeCVwL/CT9HWcp0h6QZkBzcyss1pu\nGpL0IuBk4B+AJcD5ZIXix6UkMzOzrmjp01UlfRvYD7gEeEtEPJgmXSnp5rLCmZlZ57V0g5ukYyNi\nYcO4bSLiT6UlGwLf4Da6eF9YMz4uOqsdN7h9ssm4nw8/kpmZVdWATUmSeoBJwHaSXsHmb1jbGdi+\n5GxmW7yenmmsW7dqROuYOHEqa9fe255AZgzSlCTp78g6nA8h+5a1PuuBr0XENaWma5GbkkYX74vN\nvC82877orIGaklrtY/jriLi67cnaxIVhdPG+2Mz7YjPvi84aqDAM1pR0YkR8A5gm6YzG6RFxbpsy\nmplZRQx2ueoO6feOZQcxM7NqKP37GCTNAM4juwJqfkSc02SeC4BjgKeAkyNiaRr/AeDvgeeA24FT\nIuKZJsu7KalFVejsrMq+qALvi828Lzpr2H0M6QW7XxFx+iAbHgOsAI4CHgAWA7MiYnlunmOA0yLi\nOEmHAedHxHRJewA3AftHxDOSrgS+HxEXN9mOC8MoylGFDFXhfbGZ90VnDbuPAbhlhNs+FFgZEatS\nkCuAmcDy3DwzgYsBImKRpAmSJqZpY4EdJD1HdnnsAyPMY2ZmgxiwMETE10e4/knA6tzwGrJiMdA8\n9wOTIuJWSV8A7gOeBq6NiJ+MMI+ZmQ1isKuSzouIf5H0/2hyjhcRx5cVTNILyc4mpgK/B66S9K6I\nuKysbZqZ2eBNSZek358f5vrvB6bkhiencY3z7NlknjcC90TEo5B9WRDwWqBpYejt7d30uFarUavV\nhhnZzLZmVbhAowz1ep16vd7SvC1flSRpPLA/2ZnDr5tdHdRkmbHAr8k6nx8EfgmcEBHLcvMcC7w/\ndT5PB85Lnc+HAvOBVwN/Ai4CFkfEl5psx53PoyhHFTJUhffFZlXZF1XJUbaRdD73reA44ELgN2Sf\nl7SXpNkR8YOBlouIjZJOA65l8+WqyyTNzibHvIhYKOlYSXeTXa56Slr2l5KuIvvuhw3p97xW8pqZ\n2fC1+pEYy4E3R8TdaXhvsktH9y85X0t8xjC6clQhQ1V4X2xWlX1RlRxla8fHbq/vKwrJPWQfpGdm\nZluYwa5Kent6eLOkhcA3yUrpO8huVjMzsy3MYGcMb0k/2wLrgNcDNeAhYLtSk5mVrKdnGpJG9NPT\nM63bT8O2QN0+Nkv/rKROcB/D6MpRhQxVyVGFDFVRlX1RhRydyNCOq5K2JfswuwPJzh4AiIj3DC2o\nmZlVXaudz5cAPcDRwA1kN6G589nMbAvUamHYJyI+BjyVPj/pOOCw8mKZmVm3tFoYNqTfj0t6GTAB\n2L2cSGZm1k0t9TEA8yTtAnwMWED2jW4fKy2VmZl1ja9Kan0rlb/KYLTkqEKGquSoQoaqqMq+qEKO\nbl+V1FJTkqQXSfp3SbdKukXSeZJeNMy0ZmZWYa32MVwB/A74a+BvgIeBK8sKZWZm3dPqh+jdEREv\naxh3e0T8RWnJhsBNSaMrRxUyVCVHFTJURVX2RRVyjIqmJOBaSbMkjUk/7wR+NIykZmZWcQOeMUha\nT1a2BOwAPJcmjQGejIidS0/YAp8xjK4cVchQlRxVyFAVVdkXVcjR7TOGAS9XjYidRpjMzMxGmVbv\nY0DS8cCRabAeEd8rJ5KZmXVTq5erfgaYA9yVfuZI+nSZwczMrDtavSrpNuDgiHguDY8FlkTEy0vO\n1xL3MYyuHFXIUJUcVchQFVXZF1XI0e0+hlavSgJ4Ye7xhCEsZ2Zmo0irfQyfBpZIup7sCqUjgbNK\nS2VmZl0z6BmDsnOam4DpwDXA1cBrIqKlO58lzZC0XNIKSWf2M88FklZKWirp4Nz4CZK+JWmZpDsl\n+aO+zcxKNugZQ0SEpIXpLucFQ1m5pDHAF4GjgAeAxZK+GxHLc/McA+wdEfumF/4LyYoQwPnAwoh4\nh6RxwPZD2b6ZmQ1dq30Mt0p69TDWfyiwMiJWRcQGss9cmtkwz0zgYoCIWARMkDRR0s7AERFxUZr2\nbEQ8MYwMZmY2BK32MRwGnCjpXuApsn6GaOGqpEnA6tzwGrJiMdA896dxG4GHJV0EHATcDMyJiD+0\nmNnMzIah1cJwdKkpmhsHvBJ4f0TcLOk8sg7vuV3IYma21RiwMEjaFvhHYB/gdmB+RDw7hPXfD0zJ\nDU9O4xrn2bOfeVZHxM3p8VVA085rgN7e3k2Pa7UatVptCDHNzLZs9Xqder3e0ryDfYjelWTf9/xT\n4BhgVUTMaTVIuhHu12Sdzw8CvwROiIhluXmOJTsrOE7SdOC8iJiept0AnBoRKyTNBbaPiOcVB9/g\nNrpyVCFDVXJUIUNVVGVfVCFHt29wG6wp6YC+71yQNJ/shb1lEbFR0mnAtWQd3fMjYpmk2dnkmBcR\nCyUdK+lusv6LU3KrOB24VNILgHsappmZWQkGO2O4NSJe2d9wVfiMYXTlqEKGquSoQoaqqMq+qEKO\nqp8xHCSp7xJRAdul4b6rkirxfQxmZtY+g30fw9hOBTEzs2oYyofomZnZVsCFwczMClwYzMyswIXB\nzMwKXBjMzKzAhcHM6OmZhqQR/fT0TOv207A2aek7n6vON7iNrhxVyFCVHFXIUJUcVchQlRzdvsHN\nZwxmZlbgwmBmZgUuDGZmVuDCYGZmBS4MZmZW4MJgZmYFLgxmZlbgwmBmZgUuDGZmVuDCYGZmBS4M\nZmZW4MJgZmYFpRcGSTMkLZe0QtKZ/cxzgaSVkpZKOrhh2hhJt0paUHZWMzMruTBIGgN8ETgaOBA4\nQdL+DfMcA+wdEfsCs4ELG1YzB7irzJxmZrZZ2WcMhwIrI2JVRGwArgBmNswzE7gYICIWARMkTQSQ\nNBk4FvhKyTnNzCwpuzBMAlbnhtekcQPNc39unn8DPsTIP5jczMxaNK7bAfoj6ThgXUQslVQDmn6h\nRJ/e3t5Nj2u1GrVarcx4ZmajSr1ep16vtzRvqd/gJmk60BsRM9LwWUBExDm5eS4Ero+IK9PwcuD1\nZH0LJwLPAtsBOwHXRMRJTbbjb3AbRTmqkKEqOaqQoSo5qpChKjm29G9wWwzsI2mqpPHALKDx6qIF\nwEmwqZA8HhHrIuLsiJgSES9Jy13XrCiYmVl7ldqUFBEbJZ0GXEtWhOZHxDJJs7PJMS8iFko6VtLd\nwFPAKWVmMjOzgZXalNQpbkoaXTmqkKEqOaqQoSo5qpChKjm29KYkMzMbZVwYzMyswIXBzMwKXBjM\nzKzAhcHMzApcGMzMrMCFwczMClwYzMyswIXBzMwKXBjMzKzAhcHMzApcGMzMrMCFwczMClwYzMys\nwIXBzMwKXBjMzKzAhcHMzApcGMzMrMCFwczMClwYzMyswIXBzMwKSi8MkmZIWi5phaQz+5nnAkkr\nJS2VdHAaN1nSdZLulHS7pNPLzmpmZiUXBkljgC8CRwMHAidI2r9hnmOAvSNiX2A2cGGa9CxwRkQc\nCLwGeH/jsmZm1n5lnzEcCqyMiFURsQG4ApjZMM9M4GKAiFgETJA0MSLWRsTSNP5JYBkwqeS8ZmZb\nvbILwyRgdW54Dc9/cW+c5/7GeSRNAw4GFrU9oZmZFYzrdoDBSNoRuAqYk84cmurt7d30uFarUavV\nSs9mZjZa1Ot16vV6S/MqIkoLImk60BsRM9LwWUBExDm5eS4Ero+IK9PwcuD1EbFO0jjge8APIuL8\nAbYTAz0PScBIn6cYyb6qQoaq5KhChqrkqEKGquSoQoaq5OhEBklEhJpNK7spaTGwj6SpksYDs4AF\nDfMsAE6CTYXk8YhYl6Z9FbhroKJgZmbtVWpTUkRslHQacC1ZEZofEcskzc4mx7yIWCjpWEl3A08B\nJwNIOhx4N3C7pCVk5fPsiPhhmZnNzLZ2pTYldYqbkkZXjipkqEqOKmSoSo4qZKhKji29KcnMzEYZ\nFwYzMytwYTAzswIXBjMzK3BhMDOzAhcGMzMrcGEwM7MCFwYzMytwYTAzswIXBjMzK3BhMDOzAhcG\nMzMrcGEwM7MCFwYzMytwYTAzswIXBjMzK3BhMDOzAhcGMzMrcGEwM7MCFwYzMysovTBImiFpuaQV\nks7sZ54LJK2UtFTSwUNZ1szM2qvUwiBpDPBF4GjgQOAESfs3zHMMsHdE7AvMBi5sddn2qZez2iGr\ndzsA1cgA1chR73aApN7tAFQjA1QjR73bAZJ6aWsu+4zhUGBlRKyKiA3AFcDMhnlmAhcDRMQiYIKk\niS0u2yb1clY7ZPVuB6AaGaAaOerdDpDUux2AamSAauSodztAUi9tzWUXhknA6tzwmjSulXlaWdbM\nzNqsip3P6nYAM7OtmSKivJVL04HeiJiRhs8CIiLOyc1zIXB9RFyZhpcDrwf2GmzZ3DrKexJmZluo\niGj6RnxcydtdDOwjaSrwIDALOKFhngXA+4ErUyF5PCLWSXq4hWWB/p+cmZkNXamFISI2SjoNuJas\n2Wp+RCyTNDubHPMiYqGkYyXdDTwFnDLQsmXmNTOzkpuSzMxs9Kli53NHVeEmOknzJa2TdFs3tp8y\nTJZ0naQ7Jd0u6fQuZNhG0iJJS1KGuZ3OkMsyRtKtkhZ0McO9kn6V9scvu5hjgqRvSVqWjo/DOrz9\nP0/74Nb0+/fdOD5Tlg9IukPSbZIulTS+CxnmpP+P0v5Pt+ozhnQT3QrgKOABsj6RWRGxvMM5Xgc8\nCVwcES/v5LZzGXqAnohYKmlH4BZgZhf2xfYR8bSkscD/B06PiI6/KEr6APAqYOeIOL7T208Z7gFe\nFRGPdWP7uRxfA26IiIskjQO2j4gnupRlDNml64dFxOrB5m/ztvcAbgL2j4hnJF0JfD8iLu5ghgOB\ny4FXA88CPwD+MSLuaed2tvYzhg7eRNe/iLgJ6Oo/f0SsjYil6fGTwDK6cN9IRDydHm5D1gfW8Xcu\nkiYDxwJf6fS2G6PQ5f9RSTsDR0TERQAR8Wy3ikLyRuA3nS4KOWOBHfoKJNkbyk56KbAoIv4UERuB\nG4G3t3sjW3th8E10TUiaBhwMLOrCtsdIWgKsBX4cEYs7nQH4N+BDdKEoNQjgx5IWSzq1Sxn2Ah6W\ndFFqypmuTZuAAAADi0lEQVQnabsuZQH4W7J3zB0XEQ8AXwDuA+4nu4LyJx2OcQdwhKRdJG1P9gZm\nz3ZvZGsvDNYgNSNdBcxJZw4dFRHPRcQrgMnAYZIO6OT2JR0HrEtnT6K7N1weHhGvJPvnf39qcuy0\nccArgS+lLE8DZ3UhB5JeABwPfKtL238hWYvCVGAPYEdJ7+pkhtS0ew7wY2AhsATY2O7tbO2F4X5g\nSm54chq3VUqnx1cBl0TEd7uZJTVXXA/M6PCmDweOT+37lwN/Kaljbch5EfFg+v0Q8G2yps9OWwOs\njoib0/BVZIWiG44Bbkn7oxveCNwTEY+mZpxrgNd2OkREXBQRh0REDXicrJ+0rbb2wrDpBrx0dcEs\nshvuuqHb704BvgrcFRHnd2PjknaTNCE93g54E9DRzu+IODsipkTES8iOh+si4qROZoCsEz6dvSFp\nB+CvyJoROioi1gGrJf15GnUUcFencyQn0KVmpOQ+YLqkbSWJbF90/N4qSS9Ov6cAbwMua/c2yr7z\nudKqchOdpMuAGvAiSfcBc/s6+zqY4XDg3cDtqY0/gLMj4ocdjPFnwNfTlSdjgCsjYmEHt18lE4Fv\np497GQdcGhHXdinL6cClqSnnHtJNqJ2U2tPfCLy309vuExG/lHQVWfPNhvR7XheiXC1p15ThfWVc\nDLBVX65qZmbPt7U3JZmZWQMXBjMzK3BhMDOzAhcGMzMrcGEwM7MCFwYzMytwYTBrkaRdcx///KCk\nNbnhId0TlD5qfd+yspqNhO9jMBsGSR8HnoyIc7udxazdfMZgNjyFjy+R9OH0xSm3pbvpkbR3+lKX\nyyXdJekKSdukaT+V9PL0+DhJt6Szj07eaW7WlAuD2QhJOpTsc3xeRfahau9LX6gCcABwbkQcAPwJ\nmN2w7ETgP8i+FOkVZJ/PZNZVLgxmI/c64OqIeCZ9VPl3gCPStHty3ynxjTRv3mvIPqhvDUBEPN6J\nwGYDcWEw66xmnXrd/lRdswIXBrOR+ynwNknbpI/KnpnGAewl6VXp8bty4/v8DKilj1BG0i6dCGw2\nkK36Y7fN2iEiFku6HLiZ7IzgSxFxp6S9yT6v/wxJrwBuA/6zb7G07O8k/RPw3ewj/nkAOK7Tz8Es\nz5ermpUkFYarUqey2ajhpiSzcvmdl406PmMwM7MCnzGYmVmBC4OZmRW4MJiZWYELg5mZFbgwmJlZ\ngQuDmZkV/DfkpZeqO72QKwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "top_probs = [sum(topic_model.get_topics(topic_ids=[i], num_words=10)['score']) for i in range(10)]\n", + "\n", + "ind = np.arange(10)\n", + "width = 0.5\n", + "\n", + "fig, ax = plt.subplots()\n", + "\n", + "ax.bar(ind-(width/2),top_probs,width)\n", + "ax.set_xticks(ind)\n", + "\n", + "plt.xlabel('Topic')\n", + "plt.ylabel('Probability')\n", + "plt.title('Total Probability of Top 10 Words in each Topic')\n", + "plt.xlim(-0.5,9.5)\n", + "plt.ylim(0,0.15)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we see that, for our topic model, the top 10 words only account for a small fraction (in this case, between 5% and 13%) of their topic's total probability mass. So while we can use the top words to identify broad themes for each topic, we should keep in mind that in reality these topics are more complex than a simple 10-word summary.\n", + "\n", + "Finally, we observe that some 'junk' words appear highly rated in some topics despite our efforts to remove unhelpful words before fitting the model; for example, the word 'born' appears as a top 10 word in three different topics, but it doesn't help us describe these topics at all." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Topic distributions for some example documents\n", + "\n", + "As we noted in the introduction to this assignment, LDA allows for mixed membership, which means that each document can partially belong to several different topics. For each document, topic membership is expressed as a vector of weights that sum to one; the magnitude of each weight indicates the degree to which the document represents that particular topic.\n", + "\n", + "We'll explore this in our fitted model by looking at the topic distributions for a few example Wikipedia articles from our data set. We should find that these articles have the highest weights on the topics whose themes are most relevant to the subject of the article - for example, we'd expect an article on a politician to place relatively high weight on topics related to government, while an article about an athlete should place higher weight on topics related to sports or competition." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Topic distributions for documents can be obtained using GraphLab Create's predict() function. GraphLab Create uses a collapsed Gibbs sampler similar to the one described in the video lectures, where only the word assignments variables are sampled. To get a document-specific topic proportion vector post-facto, predict() draws this vector from the conditional distribution given the sampled word assignments in the document. Notice that, since these are draws from a _distribution_ over topics that the model has learned, we will get slightly different predictions each time we call this function on a document - we can see this below, where we predict the topic distribution for the article on Barack Obama:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+--------------------------+---------------------------+\n", + "| predictions (first draw) | predictions (second draw) |\n", + "+--------------------------+---------------------------+\n", + "| 0.0403225806452 | 0.0456989247312 |\n", + "| 0.0591397849462 | 0.0376344086022 |\n", + "| 0.0215053763441 | 0.0161290322581 |\n", + "| 0.10752688172 | 0.177419354839 |\n", + "| 0.604838709677 | 0.594086021505 |\n", + "| 0.0134408602151 | 0.0161290322581 |\n", + "| 0.0483870967742 | 0.0376344086022 |\n", + "| 0.0510752688172 | 0.0268817204301 |\n", + "| 0.0188172043011 | 0.0215053763441 |\n", + "| 0.0349462365591 | 0.0268817204301 |\n", + "+--------------------------+---------------------------+\n", + "+-------------------------------+\n", + "| topics |\n", + "+-------------------------------+\n", + "| science and research |\n", + "| team sports |\n", + "| music, TV, and film |\n", + "| American college and politics |\n", + "| general politics |\n", + "| art and publishing |\n", + "| Business |\n", + "| international athletics |\n", + "| Great Britain and Australia |\n", + "| international music |\n", + "+-------------------------------+\n", + "[10 rows x 3 columns]\n", + "\n" + ] + } + ], + "source": [ + "obama = gl.SArray([wiki_docs[int(np.where(wiki['name']=='Barack Obama')[0])]])\n", + "pred1 = topic_model.predict(obama, output_type='probability')\n", + "pred2 = topic_model.predict(obama, output_type='probability')\n", + "print(gl.SFrame({'topics':themes, 'predictions (first draw)':pred1[0], 'predictions (second draw)':pred2[0]}))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get a more robust estimate of the topics for each document, we can average a large number of predictions for the same document:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "def average_predictions(model, test_document, num_trials=100):\n", + " avg_preds = np.zeros((model.num_topics))\n", + " for i in range(num_trials):\n", + " avg_preds += model.predict(test_document, output_type='probability')[0]\n", + " avg_preds = avg_preds/num_trials\n", + " result = gl.SFrame({'topics':themes, 'average predictions':avg_preds})\n", + " result = result.sort('average predictions', ascending=False)\n", + " return result" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+---------------------+-------------------------------+\n", + "| average predictions | topics |\n", + "+---------------------+-------------------------------+\n", + "| 0.591397849462 | general politics |\n", + "| 0.137231182796 | American college and politics |\n", + "| 0.0524193548387 | Business |\n", + "| 0.0503494623656 | team sports |\n", + "| 0.0413709677419 | science and research |\n", + "| 0.0343817204301 | international athletics |\n", + "| 0.0254301075269 | Great Britain and Australia |\n", + "| 0.0232258064516 | art and publishing |\n", + "| 0.0231720430108 | international music |\n", + "| 0.0210215053763 | music, TV, and film |\n", + "+---------------------+-------------------------------+\n", + "[10 rows x 2 columns]\n", + "\n" + ] + } + ], + "source": [ + "print average_predictions(topic_model, obama, 100)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ What is the topic most closely associated with the article about former US President George W. Bush? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+---------------------+-------------------------------+\n", + "| average predictions | topics |\n", + "+---------------------+-------------------------------+\n", + "| 0.42769005848 | general politics |\n", + "| 0.190029239766 | American college and politics |\n", + "| 0.102339181287 | Business |\n", + "| 0.0605847953216 | science and research |\n", + "| 0.0466666666667 | art and publishing |\n", + "| 0.045350877193 | team sports |\n", + "| 0.0388888888889 | Great Britain and Australia |\n", + "| 0.0366081871345 | international athletics |\n", + "| 0.0299707602339 | music, TV, and film |\n", + "| 0.0218713450292 | international music |\n", + "+---------------------+-------------------------------+\n", + "[10 rows x 2 columns]\n", + "\n" + ] + } + ], + "source": [ + "george_bush = gl.SArray([wiki_docs[int(np.where(wiki['name']=='George W. Bush')[0])]])\n", + "print average_predictions(topic_model, george_bush, 100)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ What are the top 3 topics corresponding to the article about English football (soccer) player Steven Gerrard? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+---------------------+-------------------------------+\n", + "| average predictions | topics |\n", + "+---------------------+-------------------------------+\n", + "| 0.49124 | team sports |\n", + "| 0.17276 | Great Britain and Australia |\n", + "| 0.12868 | international athletics |\n", + "| 0.0372 | international music |\n", + "| 0.0316 | general politics |\n", + "| 0.03112 | music, TV, and film |\n", + "| 0.02932 | Business |\n", + "| 0.02752 | art and publishing |\n", + "| 0.02644 | American college and politics |\n", + "| 0.02412 | science and research |\n", + "+---------------------+-------------------------------+\n", + "[10 rows x 2 columns]\n", + "\n" + ] + } + ], + "source": [ + "steven_gerrard = gl.SArray([wiki_docs[int(np.where(wiki['name']=='Steven Gerrard')[0])]])\n", + "print average_predictions(topic_model, steven_gerrard, 100)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Comparing LDA to nearest neighbors for document retrieval\n", + "\n", + "So far we have found that our topic model has learned some coherent topics, we have explored these topics as probability distributions over a vocabulary, and we have seen how individual documents in our Wikipedia data set are assigned to these topics in a way that corresponds with our expectations. \n", + "\n", + "In this section, we will use the predicted topic distribution as a representation of each document, similar to how we have previously represented documents by word count or TF-IDF. This gives us a way of computing distances between documents, so that we can run a nearest neighbors search for a given document based on its membership in the topics that we learned from LDA. We can contrast the results with those obtained by running nearest neighbors under the usual TF-IDF representation, an approach that we explored in a previous assignment. \n", + "\n", + "We'll start by creating the LDA topic distribution representation for each document:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "wiki['lda'] = topic_model.predict(wiki_docs, output_type='probability')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we add the TF-IDF document representations:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "wiki['word_count'] = gl.text_analytics.count_words(wiki['text'])\n", + "wiki['tf_idf'] = gl.text_analytics.tf_idf(wiki['word_count'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For each of our two different document representations, we can use GraphLab Create to compute a brute-force nearest neighbors model:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Starting brute force nearest neighbors model training.
" + ], + "text/plain": [ + "Starting brute force nearest neighbors model training." + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Starting brute force nearest neighbors model training.
" + ], + "text/plain": [ + "Starting brute force nearest neighbors model training." + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "model_tf_idf = gl.nearest_neighbors.create(wiki, label='name', features=['tf_idf'],\n", + " method='brute_force', distance='cosine')\n", + "model_lda_rep = gl.nearest_neighbors.create(wiki, label='name', features=['lda'],\n", + " method='brute_force', distance='cosine')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's compare these nearest neighbor models by finding the nearest neighbors under each representation on an example document. For this example we'll use Paul Krugman, an American economist:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Starting pairwise querying.
" + ], + "text/plain": [ + "Starting pairwise querying." + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Query points | # Pairs | % Complete. | Elapsed Time |
" + ], + "text/plain": [ + "| Query points | # Pairs | % Complete. | Elapsed Time |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 0            | 1       | 0.00169288  | 16.669ms     |
" + ], + "text/plain": [ + "| 0 | 1 | 0.00169288 | 16.669ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Done         |         | 100         | 324.125ms    |
" + ], + "text/plain": [ + "| Done | | 100 | 324.125ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
query_labelreference_labeldistancerank
Paul KrugmanPaul Krugman0.01
Paul KrugmanElise Brezis0.7444980172622
Paul KrugmanMaitreesh Ghatak0.815649848313
Paul KrugmanKai A. Konrad0.8237005644064
Paul KrugmanDavid Colander0.8346259277595
Paul KrugmanRichard Blundell0.8379342678746
Paul KrugmanGordon Rausser0.839415347067
Paul KrugmanEdward J. Nell0.8421785000158
Paul KrugmanRobin Boadway0.8423742605969
Paul KrugmanTim Besley0.84308810925310
\n", + "[10 rows x 4 columns]
\n", + "
" + ], + "text/plain": [ + "Columns:\n", + "\tquery_label\tstr\n", + "\treference_label\tstr\n", + "\tdistance\tfloat\n", + "\trank\tint\n", + "\n", + "Rows: 10\n", + "\n", + "Data:\n", + "+--------------+------------------+----------------+------+\n", + "| query_label | reference_label | distance | rank |\n", + "+--------------+------------------+----------------+------+\n", + "| Paul Krugman | Paul Krugman | 0.0 | 1 |\n", + "| Paul Krugman | Elise Brezis | 0.744498017262 | 2 |\n", + "| Paul Krugman | Maitreesh Ghatak | 0.81564984831 | 3 |\n", + "| Paul Krugman | Kai A. Konrad | 0.823700564406 | 4 |\n", + "| Paul Krugman | David Colander | 0.834625927759 | 5 |\n", + "| Paul Krugman | Richard Blundell | 0.837934267874 | 6 |\n", + "| Paul Krugman | Gordon Rausser | 0.83941534706 | 7 |\n", + "| Paul Krugman | Edward J. Nell | 0.842178500015 | 8 |\n", + "| Paul Krugman | Robin Boadway | 0.842374260596 | 9 |\n", + "| Paul Krugman | Tim Besley | 0.843088109253 | 10 |\n", + "+--------------+------------------+----------------+------+\n", + "[10 rows x 4 columns]" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model_tf_idf.query(wiki[wiki['name'] == 'Paul Krugman'], label='name', k=10)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Starting pairwise querying.
" + ], + "text/plain": [ + "Starting pairwise querying." + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Query points | # Pairs | % Complete. | Elapsed Time |
" + ], + "text/plain": [ + "| Query points | # Pairs | % Complete. | Elapsed Time |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 0            | 1       | 0.00169288  | 2.839ms      |
" + ], + "text/plain": [ + "| 0 | 1 | 0.00169288 | 2.839ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Done         |         | 100         | 24.178ms     |
" + ], + "text/plain": [ + "| Done | | 100 | 24.178ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
query_labelreference_labeldistancerank
Paul KrugmanPaul Krugman0.01
Paul KrugmanAllen Feldman0.002640543220932
Paul KrugmanCleo Paskal0.005032438384563
Paul KrugmanLene Auestad0.007253012279254
Paul KrugmanBen Klemens0.008782019935645
Paul KrugmanSteve Lawrence (computer
scientist) ...
0.01096904775946
Paul KrugmanRandall Hansen0.01103365010057
Paul KrugmanMetta Spencer0.01104501228228
Paul KrugmanSteven Skiena0.01146910093069
Paul KrugmanMohamed Osman Elkhosht0.011552559081210
\n", + "[10 rows x 4 columns]
\n", + "
" + ], + "text/plain": [ + "Columns:\n", + "\tquery_label\tstr\n", + "\treference_label\tstr\n", + "\tdistance\tfloat\n", + "\trank\tint\n", + "\n", + "Rows: 10\n", + "\n", + "Data:\n", + "+--------------+-------------------------------+------------------+------+\n", + "| query_label | reference_label | distance | rank |\n", + "+--------------+-------------------------------+------------------+------+\n", + "| Paul Krugman | Paul Krugman | 0.0 | 1 |\n", + "| Paul Krugman | Allen Feldman | 0.00264054322093 | 2 |\n", + "| Paul Krugman | Cleo Paskal | 0.00503243838456 | 3 |\n", + "| Paul Krugman | Lene Auestad | 0.00725301227925 | 4 |\n", + "| Paul Krugman | Ben Klemens | 0.00878201993564 | 5 |\n", + "| Paul Krugman | Steve Lawrence (computer s... | 0.0109690477594 | 6 |\n", + "| Paul Krugman | Randall Hansen | 0.0110336501005 | 7 |\n", + "| Paul Krugman | Metta Spencer | 0.0110450122822 | 8 |\n", + "| Paul Krugman | Steven Skiena | 0.0114691009306 | 9 |\n", + "| Paul Krugman | Mohamed Osman Elkhosht | 0.0115525590812 | 10 |\n", + "+--------------+-------------------------------+------------------+------+\n", + "[10 rows x 4 columns]" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model_lda_rep.query(wiki[wiki['name'] == 'Paul Krugman'], label='name', k=10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that that there is no overlap between the two sets of top 10 nearest neighbors. This doesn't necessarily mean that one representation is better or worse than the other, but rather that they are picking out different features of the documents. \n", + "\n", + "With TF-IDF, documents are distinguished by the frequency of uncommon words. Since similarity is defined based on the specific words used in the document, documents that are \"close\" under TF-IDF tend to be similar in terms of specific details. This is what we see in the example: the top 10 nearest neighbors are all economists from the US, UK, or Canada. \n", + "\n", + "Our LDA representation, on the other hand, defines similarity between documents in terms of their topic distributions. This means that documents can be \"close\" if they share similar themes, even though they may not share many of the same keywords. For the article on Paul Krugman, we expect the most important topics to be 'American college and politics' and 'science and research'. As a result, we see that the top 10 nearest neighbors are academics from a wide variety of fields, including literature, anthropology, and religious studies.\n", + "\n", + "\n", + "__Quiz Question:__ Using the TF-IDF representation, compute the 5000 nearest neighbors for American baseball player Alex Rodriguez. For what value of k is Mariano Rivera the k-th nearest neighbor to Alex Rodriguez? (Hint: Once you have a list of the nearest neighbors, you can use `mylist.index(value)` to find the index of the first instance of `value` in `mylist`.)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Starting pairwise querying.
" + ], + "text/plain": [ + "Starting pairwise querying." + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Query points | # Pairs | % Complete. | Elapsed Time |
" + ], + "text/plain": [ + "| Query points | # Pairs | % Complete. | Elapsed Time |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 0            | 1       | 0.00169288  | 20.877ms     |
" + ], + "text/plain": [ + "| 0 | 1 | 0.00169288 | 20.877ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Done         |         | 100         | 302.418ms    |
" + ], + "text/plain": [ + "| Done | | 100 | 302.418ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Get a list of 'reference_label' based on knn\n", + "alex_rodriguez_tfidf = list(model_tf_idf.query(wiki[wiki['name'] == 'Alex Rodriguez'], label='name', k=5000)['reference_label'])" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "value of k: 52\n" + ] + } + ], + "source": [ + "print 'value of k:', alex_rodriguez_tfidf.index('Mariano Rivera')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ Using the LDA representation, compute the 5000 nearest neighbors for American baseball player Alex Rodriguez. For what value of k is Mariano Rivera the k-th nearest neighbor to Alex Rodriguez? (Hint: Once you have a list of the nearest neighbors, you can use `mylist.index(value)` to find the index of the first instance of `value` in `mylist`.)" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
Starting pairwise querying.
" + ], + "text/plain": [ + "Starting pairwise querying." + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Query points | # Pairs | % Complete. | Elapsed Time |
" + ], + "text/plain": [ + "| Query points | # Pairs | % Complete. | Elapsed Time |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| 0            | 1       | 0.00169288  | 3.993ms      |
" + ], + "text/plain": [ + "| 0 | 1 | 0.00169288 | 3.993ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| Done         |         | 100         | 23.926ms     |
" + ], + "text/plain": [ + "| Done | | 100 | 23.926ms |" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+--------------+---------+-------------+--------------+
" + ], + "text/plain": [ + "+--------------+---------+-------------+--------------+" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "value of k: 729\n" + ] + } + ], + "source": [ + "# Get a list of 'reference_label' based on knn\n", + "alex_rodriguez_lda = list(model_lda_rep.query(wiki[wiki['name'] == 'Alex Rodriguez'], label='name', k=5000)['reference_label'])\n", + "print 'value of k:', alex_rodriguez_lda.index('Mariano Rivera')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Understanding the role of LDA model hyperparameters\n", + "\n", + "Finally, we'll take a look at the effect of the LDA model hyperparameters alpha and gamma on the characteristics of our fitted model. Recall that alpha is a parameter of the prior distribution over topic weights in each document, while gamma is a parameter of the prior distribution over word weights in each topic. \n", + "\n", + "In the video lectures, we saw that alpha and gamma can be thought of as smoothing parameters when we compute how much each document \"likes\" a topic (in the case of alpha) or how much each topic \"likes\" a word (in the case of gamma). In both cases, these parameters serve to reduce the differences across topics or words in terms of these calculated preferences; alpha makes the document preferences \"smoother\" over topics, and gamma makes the topic preferences \"smoother\" over words.\n", + "\n", + "Our goal in this section will be to understand how changing these parameter values affects the characteristics of the resulting topic model.\n", + "\n", + "__Quiz Question:__ What was the value of alpha used to fit our original topic model? " + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "5.0" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topic_model['alpha']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ What was the value of gamma used to fit our original topic model? Remember that GraphLab Create uses \"beta\" instead of \"gamma\" to refer to the hyperparameter that influences topic distributions over words." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.1" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topic_model['beta']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll start by loading some topic models that have been trained using different settings of alpha and gamma. Specifically, we will start by comparing the following two models to our original topic model:\n", + " - tpm_low_alpha, a model trained with alpha = 1 and default gamma\n", + " - tpm_high_alpha, a model trained with alpha = 50 and default gamma" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "tpm_low_alpha = gl.load_model('topic_models/lda_low_alpha')\n", + "tpm_high_alpha = gl.load_model('topic_models/lda_high_alpha')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Changing the hyperparameter alpha\n", + "\n", + "Since alpha is responsible for smoothing document preferences over topics, the impact of changing its value should be visible when we plot the distribution of topic weights for the same document under models fit with different alpha values. In the code below, we plot the (sorted) topic weights for the Wikipedia article on Barack Obama under models fit with high, original, and low settings of alpha." + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAakAAAEbCAYAAABgLnslAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcFNXV//HPYVgU2UFRGBhQwR1/IiK4seUhoEE0Piig\nxqBRNEET8+SJQUIgRqMxxhiiPK7BhaBx30CDCY47YgRxCZsQBhhwQUABlfX8/qjbQ88w01Mj1HQz\n832/Xv2arqrbVaebYU7XrVv3mLsjIiKSi+pkOwAREZGKKEmJiEjOUpISEZGcpSQlIiI5S0lKRERy\nlpKUiIjkrESTlJndY2Yfm9m7GdpMMLNFZvaOmf2/JOMREZE9S9JnUpOAb1e00cwGAge5eydgJHB7\nwvGIiMgeJNEk5e6vAmszNBkM3B/avgk0NbPWScYkIiJ7jmxfk2oLLE9bLg7rREREsp6kREREKlQ3\ny8cvBtqlLeeHdTsxM00yKCJSg7m7lV1XHWdSFh7leRr4HoCZ9QDWufvHFe3I3WM9xo0bF7ttdT4U\n154dk+La82NSXLkbU0USPZMysylAb6ClmS0DxgH1o3zjd7r7NDM71cw+BDYCI5KMR0RE9iyJJil3\nHx6jzagkYxARkT1XjRw40bt372yHUC7FFV8uxgSKqypyMSZQXFWRCzFZpr7AXGJmvqfEKiIiVWNm\neDkDJ7I9uk9EaokOHTpQVFSU7TAkywoKCli6dGns9jqTEpFqEb4pZzsMybKKfg8qOpOqkdekRESk\nZlCSEhGRnKUkJSIiOUtJSkRqvY4dOzJjxoysHb9OnTosWbJkt7etCZSkRCRr8gs6YGaJPfILOmT7\nLcZiVtHMcbvWtibQEHQRyZriZUVcP3tzYvsf3bV+Yvvenaoy6rG2jZDUmZSISJrNmzfzk5/8hLZt\n25Kfn8+VV17Jli1bgGgGhieeeAKA1157jTp16vDcc88BMGPGDI455phy9/nWW29xwgkn0Lx5c9q2\nbcvll1/O1q1by207YsQILrvsMvr370+TJk3o06cPy5YtK9XmhRdeoHPnzrRo0YJRo3bMLLdkyRL6\n9etHq1at2G+//TjvvPP44osvdvkzySYlKRGRNNdeey2zZs3i3XffZe7cucyaNYtrr70WgF69elFY\nWAjAyy+/zEEHHcTLL78MwEsvvVThNEJ5eXnccsstrFmzhjfeeIMZM2YwceLECmOYMmUK48aN47PP\nPuPoo4/m3HPPLbV96tSpvP3228ydO5eHH36Y6dOnA9FZ1tVXX81HH33EvHnzWLFiBePHj9+1DyTL\nlKRERNKkEkTLli1p2bIl48aN44EHHgCiJPXSSy8BUZIaPXp0yfJLL71Er169yt1n165d6d69O2ZG\n+/btueSSS0peV57TTjuNE088kXr16nHdddfxxhtvUFy8o9Te6NGjady4Me3ataNPnz688847ABx0\n0EH069ePunXr0rJlS6688sqMx9kTKEmJiKRZuXIl7du3L1kuKChg5cqVAPTs2ZOFCxfyySefMHfu\nXL73ve+xfPlyPvvsM2bNmsUpp5xS7j4XLVrEoEGDOOCAA2jWrBljxoxh9erVFcbQrt2OWrD77LMP\nLVq0KIkBoHXr1iXPGzZsyIYNGwD45JNPGDZsGPn5+TRr1ozzzjsv43H2BEpSIiJp2rRpU2qOwaKi\nItq0aQPA3nvvzbHHHsuf/vQnjjzySOrWrUvPnj25+eabOfjgg2nRokW5+7zssss47LDDWLx4MevW\nreO6667LOABi+fLlJc83bNjAmjVraNu2baWxX3311dSpU4cPPviAdevWMXny5D1+oIWSlIhImmHD\nhnHttdeyevVqVq9ezW9+8xvOP//8ku2nnHIKt956a0nXXu/evUstl2f9+vU0adKEhg0bMn/+fP7v\n//4vYwzTpk3j9ddfZ/PmzYwdO5aePXuWJMpM1q9fT6NGjWjcuDHFxcX8/ve/j/muc5eSlIjUeun3\nHv3yl7+kW7dudOnShaOPPppu3boxZsyYku29evViw4YNJV17qeVMSeqmm27ir3/9K02aNGHkyJEM\nHTq0wuMDDB8+nPHjx9OyZUvmzJnD5MmTK2ybbty4cbz99ts0a9aMQYMGcdZZZ8X7AHKYZkEXkWpR\n3uzX+QUdKF6WXPmOtu0LWFG0NLH9J2HEiBG0a9eOa665JtuhJKKqs6DrZl4RyZo9LYFI9VN3n4hI\nDqlt0x5VRt19IlItVPRQQEUPRUSkBlGSEhGRnKUkJSIiOUtJSkREcpaSlIiI5CwlKRGRKrrsssu4\n7rrrdnvbTIqKiqhTpw7bt2/f5X1V1YgRI/jVr34Vq23Hjh2ZMWPGbju2buYVkazpkJ9PUVoJit2t\noG1blq5Ysdv3W9nce9+0bWVq4z1UsZKUmQ0FDnL368ysHbCfu7+dbGgiUtMVFRezLsGifM0S2Pf2\n7dupU0edUNWl0k/azG4F+gDnhVUbgduTDEpEpDrNnz+fPn360Lx5c4466iieeeaZkm0jRozghz/8\nIaeddhqNGzemsLBwp+6vG2+8kTZt2pCfn88999xDnTp1WLJkScnrU21feukl2rVrx80330zr1q1p\n27Yt9957b8l+pk2bRteuXWnatCkFBQX8+te/jv0eOnbsyE033cTRRx9N48aNufjii/nkk0849dRT\nadKkCf379+fzzz8vaf/0009z5JFH0qJFC/r27cv8+fNLts2ZM4djjz2Wpk2bMnToUL7++utSx3r2\n2Wc55phjaN68OSeddBLvvfde7DirKs7XgRPcfSTwNYC7rwHqJxaRiEg12rp1K4MGDWLAgAF8+umn\nTJgwgXPPPZdFixaVtHnwwQcZO3Ys69ev58QTTyz1+ueff55bbrmFGTNm8OGHH1JYWJixW+6jjz5i\n/fr1rFy5krvvvpsf/ehHJcmjUaNGPPDAA3z++edMnTqV22+/naeffjr2e3n88cf55z//ycKFC3n6\n6ac59dRTueGGG1i9ejXbtm1jwoQJACxcuJDhw4czYcIEPv30UwYOHMigQYPYunUrW7Zs4cwzz+SC\nCy5gzZo1DBkyhMcee6zkGHPmzOGiiy7irrvuYs2aNYwcOZLTTz+dLVu2xI6zKuIkqS1mVgdwADNr\nCVT/lTsRkQTMnDmTjRs3ctVVV1G3bl369OnDd77zHR588MGSNoMHD6ZHjx4ANGjQoNTrH3nkEUaM\nGMGhhx7KXnvtxfhKuhjr16/P2LFjycvLY+DAgTRq1IgFCxYAUa2qI444AoAjjzySoUOHVqn8++WX\nX06rVq044IADOPnkkzn++OPp0qUL9evX58wzz2TOnDkAPPzww3znO9+hb9++5OXl8bOf/Yyvv/6a\n119/nZkzZ7J161auuOIK8vLyOOusszjuuONKjnHXXXdx6aWX0q1bN8yM888/nwYNGjBz5szYcVZF\nnCR1G/AYsK+Z/Rp4FfhdItGIiFSzlStXlirXDlHJ+OK0AR1lt2d6fbt27TLOUdiyZctS17TSy7+/\n+eab9O3bl/32249mzZpxxx13VKn8e3pZ+b333nun5dRxVq5cSUFBQck2MyM/P5/i4mJWrly5UxXg\n9LZFRUX84Q9/oEWLFrRo0YLmzZuzYsWKUuXtd6dKk5S73w/8ErgJWAsMcfeHEolGRKSatWnTplS5\ndoBly5aV+kOdqfvugAMOYEXaCMJly5Z941F45557LmeccQbFxcWsW7eOkSNHJjIpb5s2bSgqKl3H\na/ny5bRt23an9wPRe0pp164dY8aMYc2aNaxZs4a1a9eyYcMGzjnnnN0eJ2RIUmbWJPUAlgOTgL8A\ny8I6EZE93vHHH0/Dhg258cYb2bp1K4WFhTz77LMMGzYs1uvPPvtsJk2axPz58/nyyy+59tprv3Es\nGzZsoHnz5tSrV49Zs2YxZcqUUtt3V8I6++yzmTp1Ki+++CJbt27lpptuYq+99uKEE06gZ8+e1KtX\njz//+c9s3bqVxx9/nFmzZpW89uKLL+b2228vWbdx40amTZvGxo0bd0tsZWU6k/oAeD/8TD1/P+25\niMger169ejzzzDNMmzaNVq1aMWrUKB544AE6deoElH8Wlb5uwIABXHHFFfTp04fOnTvTs2dPYOdr\nVxVJ39fEiRMZO3YsTZs25dprr93p7CTTGVrZbZnadu7cmcmTJzNq1Cj23Xdfpk6dyjPPPEPdunWp\nV68ejz/+OJMmTaJly5Y88sgjpcrQH3vssdx1112MGjWKFi1a0LlzZ+67775Yx/0mVE9KRKpFeXWE\n9tSbeTOZP38+Rx11FJs2bdL9VOXY7fWkzOx0M2uattzMzL6zy5GKSK23dMUK3D2xR3UlqCeffJLN\nmzezdu1arrrqKk4//XQlqN0kzqd4jbuX3AHm7uuA38Q9gJkNMLP5ZrbQzK4qZ3sTM3vazN4xs/fM\n7Ptx9y0ikgvuuOMO9ttvPzp16kS9evWYOHFitkOqMSrt7jOzue5+dJl177n7UZXuPLq/aiHQD1gJ\nvAUMdff5aW1GA03cfbSZtQIWAK3dfWuZfam7T2QPpvLxAsmUj59jZjeaWUF4/B6YEzOe7sAidy9y\n9y3AQ8DgMm0caByeNwY+K5ugRESkdoqTpEaFdk+FB8APY+6/LdHw9ZQVYV26W4HDzWwlMBf4ccx9\ni4hIDVfpLOjuvgH4WYIxfBuY4+59zewg4AUz6xKOKyIitViFScrM/uDu/2NmTxDm7Uvn7t+Nsf9i\noH3acn5Yl24EcH3Y52Iz+w9wKPCvsjtLnxOrd+/e9O7dO0YIIiKSawoLCyksLKy0XYUDJ8ysu7vP\nMrN+5W13939WunOzPKKBEP2AVcAsYJi7z0trcxvwibv/2sxaEyWno8Ns6+n70sAJkT2YBk4I7MaB\nE+6emgfjMHf/Z/oDOCxOMO6+jeia1nSimSoecvd5ZjbSzC4Jza4FTjCzd4EXgJ+XTVAiIknKVPL8\n1Vdf5bDDYv3JK6kXtTtUZV+787i5Jk5l3guJBjeku6icdeVy9+eBQ8qsuyPt+Sqi61IiUst0KNif\nomUfJ7b/gvatWVr00S7t46STTmLevHmVNwx257RAVdlXTS0tn+ma1DnAUOBAM3s8bVNjYF3SgYlI\nzVe07GM8/t//KrPDkkuAUj0yDUGfRVRLalH4mXqMAfonH5qISPWZM2cORx99NM2bN2fYsGFs3rwZ\n2Lkrbfbs2SUl3s8++2yGDh1aqpS8u1dYHr6se++9l8MPP5wmTZpw8MEHc+edd1bYtmPHjtxwww0c\nccQRtGzZkosuuqgkxsqOuytl6bMt0zWp/wAvAuvKXJOaFW7MFRGpMR555BGmT5/Of/7zH+bOnVvq\nj3yqK23Lli1897vf5cILL2TNmjUMGzaMJ554otR+MpWHL6t169ZMmzaNL774gkmTJnHllVfyzjvv\nVBjjlClTeOGFF1i8eDELFiwoVRYkybL02ZTxZt4w8CFP9aNEpKb78Y9/TOvWrWnWrBmDBg0qN1m8\n8cYbbNu2jVGjRpGXl8eZZ55J9+7dS7XJVB6+rIEDB9KhQwcATj75ZPr3788rr7xSYYyXX345bdq0\noVmzZowZM6ZUifsky9JnU5yBE58Dc81sOlBS1crdf5pYVCIi1Sy91HrDhg1ZtWrVTm1WrVq1U2n1\nsqPqMpWHL+u5557jmmuuYeHChWzfvp2vvvqKLl26VBhjfn5+yfOCgoJSJdsrK0s/evRo3n//fTZv\n3szmzZsZMmRIhcfJJXGmRXqWaJj4LEoXQBQRqVUOOOAAisvUvypbej6uzZs389///d/8/Oc/59NP\nP2Xt2rUMHDgw471k6ccqKiqiTZs2sY5VXWXpk1BpknL3e9IfwHNA08peJyJS0/Ts2ZO8vDxuu+02\ntm3bxlNPPVWqtHpVpM5oWrVqRZ06dXjuueeYPn16xtfcdtttFBcXs2bNGn77298ydOjQWMeqrCx9\nLotVlcvMWpjZJWb2IvA6UJBsWCIi1SfuPUap0up33303zZs3Z8qUKQwaNChjqfiK9t2oUSMmTJjA\nkCFDaNGiBQ899BCDB5ctElHa8OHD6d+/PwcffDCdOnVizJgxsY5bWVn6XJZpWqR9gDOA4cARRDOg\n/7e7l53FvFpoWiSRPVu55eP3gJt5K9OjRw8uu+wyLrjggkSP07FjR+655x769u2b6HGSVtVpkTIN\nnPiEaB698cBL7r7dzE7fXYF+E3G/7bRtX8CKoqXJBiMiuyzpBJKEl19+mUMOOYRWrVoxefJk3nvv\nPQYMGJDtsGqsTElqHNGMEzcDD5rZ3yhnNvTqdP3szZU3AkZ3rZ9wJCJSWy1YsICzzz6bL7/8kgMP\nPJDHHnus1MjApNTUaY8qE6d8fCdgGGGKJKIZJ55w9yXJh1cqDq9KklLXoEhu0SzoAgmUj3f3Re5+\njbsfDvQA9gMqLdMhIiKyq2KN7ktx93fc/Sp375hUQCIiIilVSlIiIiLVSUlKRERyVpy5+0REdllB\nQUGtHaEmOxQUVG0uiEqTlJkdBFwHHA7slVrv7p2rGpyI1F5Lly7NdgiyB4rT3XcvMAkwYCDwMPC3\nBGMSEREB4iWphu7+dwB3X+zuvyRKViIiIomKc01qk5nVARab2aVAMdA42bBERETiJakrgX2AK4iu\nTTUFLkwyKBEREYiRpNz9zfB0PXB+suGIiIjsEGd0X1dgNFENqZL27t41wbhERERidfc9SJSk3gO2\nJxuOiIjIDnGS1Gp3fzzxSERERMqIk6R+bWa3E818vim10t2fTiwqERER4iWpc4EuRMPOU919DihJ\niYhIouIkqR7ufkjikYiIiJQRZ8aJN81MSUpERKpdnDOpY4B3zexDomtSBriGoIuISNLiJKkzEo9C\nRESkHHFmnFgMYGYtSCvVISIikrRKr0mZ2WlmthBYAbwJLAdmJB2YiIhInIET1wEnAgvcvR0wAHgl\n0ahERESIl6S2uvunQB0zM3d/AeiecFwiIiKxBk58bmaNgNeA+83sE+CrZMMSERGJdyZ1BvA18GOg\nkKjo4aAEY9plDfLyMLPYjw75+dkOWUREyhFndN96M9sXOA5YCTwduv9iMbMBwC1ECfEed/9dOW16\nA38E6gGfunufuPsvz6Zt21g3fnzs9s2q0FZERKpPnNF9I4DZwHDgPOBfZnZBnJ2HsvO3At8GjgCG\nmdmhZdo0BW4DvuPuRwJDqvQORESkxopzTeoXQNfU2VM4q3oVuC/Ga7sDi9y9KLz2IWAwMD+tzXDg\nMXcvBnD31fHDFxGRmizONak1wLq05XVhXRxtie6rSlkR1qXrDLQwsxfN7C0zU4l6EREBMpxJmdkV\n4ekC4A0ze5KoRMcZwPu7OYauQF9gn3CsN9z9w914DBER2QNl6u7bN/xcHh4NwvLzVdh/MdA+bTk/\nrEu3gqj679fA12b2MnA0sFOS+sft15Q8P7BbLw7s1qsKoYiISK4oLCyksLCw0nbm7pU3MtsLICSS\n2Mwsj+hMrB+wCpgFDHP3eWltDgX+TDSTRQOiqZfOcfd/l9mXXz97c6zjju5av8qj++J8DiIikgwz\nw92t7PqM16TM7GIzWwJ8BHxkZovN7JK4B3X3bcAoYDrwAfCQu88zs5Gp/bj7fODvwLvATODOsglK\nRERqp0zXpEYDvYEB7r4wrOsM/MnMWrr79XEO4O7PA4eUWXdHmeWbgJuqFrqIiNR0mc6kvg+ckUpQ\nAOH5WcCIhOMSERHJmKTc3Xeao8/dvwS2JxeSiIhIJFOSWhWmKyrFzHoRXaMSERFJVKYh6FcAT5rZ\ni8DbYV03outUKikvIiKJq/BMyt3fA44kGjZ+aHjMAo4K20RERBKVce6+cE3qzmqKRUREpJQ4c/eJ\niIhkhZKUiIjkrMpmnMgzs/urKxgREZF0GZNUmNboQDOrV03xiIiIlIhT9HAx8IqZPQVsTK109wmJ\nRSUiIkK8JLUsPBqGh4iISLWoNEm5+1gAM9s7LO80VZKIiEgSKh3dZ2aHm9lbwCJgkZm9aWaHJR+a\niIjUdnGGoN8JXO3u+e6eD4wB7ko2LBERkXhJqrG7v5BacPd/AI2TC0lERCQSJ0ktNbPRZpYfHr8A\nliYcl4iISKwkdSHQDpgGTAXywzoREZFEZSoff6+7fx8Y5u4/rL6QREREIpnOpLqb2X7AxWbW2Mya\npD+qK0AREam9Mt0ndTfwGtAe+ACwtG0e1ouIiCQmU9HDm929E3C/u7d393ZpDyUoERFJXKUDJ9z9\n4uoIREREpCzVkxIRkZylJCUiIjkrztx9l5lZ0+oIJlsa1Aczi/XoULB/tsMVEak14pTqKABmm9mb\nwF/CtEg1yqbN4PPitbXDPk42GBERKRFn4MQvgE7AX4FLzWyRmV1jZh0Sjk1ERGq5WNek3H070Xx9\nS4HtwAHAU2Z2fWKRiYhIrVdpd5+Z/Qi4APgCuAcY4+6bzKwO8CEwOtkQRUSktopzTaoN0fx9i9NX\nuvt2Mzs9mbBERETidfe1LZugzOxeAHd/P4mgREREIF6S6pK+ELr5jksmHBERkR0qTFJmdpWZrQW6\nmNma8FgLrCaqLSUiIpKoTGdSNwL7An8MP/cFWrl7C3f/3+oITkREardMAycOdvdFZvYAcERqpVlU\nscPd3004NhERqeUyJanRRGXibytnmwOnJBKRiIhIUGGScvcLw8+Tqy8cERGRHSpMUpXdA+XuT8c5\ngJkNAG4huv51j7v/roJ2xwGvA+e4++Nx9i0iIjVbpu6+IRm2OVBpkgrD1W8F+gErgbfM7Cl3n19O\nuxuAv1casYiI1BqZuvvO3w377w4scvciADN7CBgMzC/T7nLgUXT/lYiIpMnU3TfM3R80syvK2+7u\nE2Lsvy2wPG15BVHiSj9OG+AMd+9jZqW2iYhI7Zapu695+LlvwjHcAlyVtmwJH09ERPYQmbr7Joaf\nY3dh/8VA+7Tl/LAuXTfgIYtuwGoFDDSzLeUNzPjH7deUPD+wWy8O7NZrF0ITEZFsKSwspLCwsNJ2\n5u6ZG0TFDf8I9AyrXgP+x92XVrpzszxgAdHAiVXALKIZ1cutg2tmk4BnyhvdZ2Z+/ezNlR0SgNFd\n67Nu/PhYbQGajR9fhcq8UNlnJiIiVWNmuPtOPWlxJph9kGgkX/vweCasq5S7bwNGAdOBD4CH3H2e\nmY00s0vKe0mc/YqISO0Qp57UPu4+KW35XjO7Mu4B3P154JAy6+6ooO2FcfcrIiI1X6bRfU3C02lm\n9jPgIaIznXOAqdUQm4iI1HKZzqQ+IEpKqT7CH6dtc+DqpIISERGBzKP72lVnICIiImXFuSaFmR0K\nHA7slVrn7lOSCkpERARiJCkz+yXQHziUaG69bwOvAkpSIiKSqDhD0M8B+gCrwnx+RwP7JBqViIgI\n8ZLUV+F+p61m1hj4CChINiwREZF416TmmFkz4C/Av4AviGaOEBERSVSlScrdR4ant5nZ34Em7j47\n2bBERETij+47HTiJ6P6oVwElKRERSVyl16TM7M9EN/IuAj4ErjCzOLWkREREdkmcM6lvAYd7mPrb\nzP4CvJ9oVCIiIsQb3fcfojpQKQcAi5MJR0REZIdME8w+QXQNai9gnpnNDJt6AG9WQ2wiIlLLZeru\nu7XaohARESlHpglm/5l6bmatiMq8A/zL3VcnHZiIiEic0X1nEQ05Px/4HvAvMzsz6cBERETijO77\nFXCcu38MYGaticrBP5FkYCIiInFG99VJJajgk5ivExER2SVxzqReMLOpwINheShRyQ4REZFExUlS\n/wMMIZoWCeA+4NHEIhIREQkyJikzywOed/f/Ah6unpBEREQiGa8thTpSeWbWpJriERERKRGnu+9z\nYK6ZTQc2pla6+08Ti0pERIR4SerZ8BAREalWlV2TOgr4DPjA3RdVT0giIiKRCq9JmdnVwJPAuUTD\n0C+stqhERETIfCZ1LtDF3Tea2b7ANOAv1ROWiIhI5tF9m9x9I4C7f1pJWxERkd0u05nUgWb2eHhu\nwEFpy7j7dxONTEREar1MSeqsMsuqLyUiItUqVj0pERGRbNB1JhERyVlKUiIikrOUpEREJGfFKR//\nvJk1S1tuHupLiYiIJCrOmVRrd1+XWnD3tUCb5EISERGJxElS280sP7VgZu0TjEdERKREnCT1K+A1\nM5tkZvcCLwNXxz2AmQ0ws/lmttDMripn+3Azmxser4ZJbUVERCov1eHuU82sO9AzrPq5u38SZ+dm\nVofoJuB+wErgLTN7yt3npzVbApzi7p+b2QDgLqBHVd6EiIjUTJlmQe8UfnYBWhMlkyXA/mFdHN2B\nRe5e5O5bgIeAwekN3H2mu38eFmcCbav2FkREpKbKdCb1C+Ai4LZytjlwSoz9twWWpy2vIEpcFfkB\n8FyM/YqISC2QaVqki8LPk6sjEDPrA4wATqqO44mISO6r9JqUmTUARhIlDwdeAe5y900x9l8MpI8G\nzA/ryh6jC3AnMCAMcS/XP26/puT5gd16cWC3XjFCEBGRXFNYWEhhYWGl7czdMzcwewjYBEwOq4YD\ne7v70Ep3bpYHLCAaOLEKmAUMc/d5aW3aA/8Eznf3mRn25dfP3lzZIQEY3bU+68aPj9UWoNn48eyI\nKDM7DCr7zEREpGrMDHe3susrPZMiqs57eNryC2b27zgHdfdtZjYKmE40SOMed59nZiOjzX4nMBZo\nAUw0MwO2uHum61YiIlJLxElSc83sOHd/C8DMjgXmxD2Auz8PHFJm3R1pzy8GLo67PxERqT3iJKmj\ngDfNbElY7gjMM7M5RGdDXROLTkREarU4SWpw5U1ERER2vzgzTiw2syOA1FD0V9z9g2TDEhERiVeq\nYxTwCNFQ8vbAw2b2w6QDExERidPddwnQ3d03AJjZb4HXgYlJBiYiIhJnFnQD0m9Q2hLWiYiIJKrC\nMykzq+vuW4EHiEb3PRY2nQncVx3BiYhI7Zapu28W0NXdbzSzQnbMqXdp6p4pERGRJGVKUiVdeu4+\niyhpiYiIVJtMSWpfM/tpRRvd/eYE4hERESmRKUnlAY3QIAkREcmSTElqlbtfk2G7VFGH/HyKineq\nVFKugrYBcrr/AAAPlElEQVRtWbpiRcIRiYjktljXpGT3KCoujl1CpPVvxxNNCh9PQfvWLC366BtG\nJiKSmzIlqX7VFoXsZNNmYte4ArDDPk4uGBGRLKnwZl53X1OdgYiIiJQVZ8YJySC/oANmFushIiJV\nE2fuPsmgeFkRVSlrLyIi8elMSkREcpaSlIiI5CwlKRERyVlKUiIikrOUpEREJGcpSYmISM5SkhIR\nkZylJCUiIjlLSUpERHKWkpSIiOQsJakaqCrzCZoZ+QUdYu+7Q8H+sffboWD/5N7kHhBXVWKq7s9L\nZE+huftqoKrMJwhVm1OwaNnHsUuIVGf5kFyMqyoxgcqtiJRHZ1IiIpKzlKREaplc7BoVqYi6+4QG\neXmqd1WL5GLXqEhFlKSETdu2sW78+Fhtm8VsJyKyO6i7T3JSh/z8Ko2ME5GaSWdSkpOKiotjn92B\nzvBqgg4F+1O0LF73YkH71iwt+ijhiHIzJsjduJKgJCVSBR3y8ykqLs52GDvJ1biqIhevleViTJCb\ncVUlcUL85KkkJdUmv6ADxcuKsh3GLqnKGV51nt3lalxSeyR1X6CSlFSbqtxkXJUbjCU31YSzO8m+\nxJOUmQ0AbiEapHGPu/+unDYTgIHARuD77v5O0nGJSLJy9bpiribPXI0r2xJNUmZWB7gV6AesBN4y\ns6fcfX5am4HAQe7eycyOB24HeiQZl0i6mtANKfHlatdoLsa1/7778vHq1dVyrIokfSbVHVjk7kUA\nZvYQMBiYn9ZmMHA/gLu/aWZNzay1u+suQqkW6oYUKd/Hq1dnPXEmfZ9UW2B52vKKsC5Tm+Jy2ojU\nKlWdyT4X4xLZHTRwQiQHJTmT/a7I1bPOXOyyzcWYIHfjqoi5e3I7N+sBjHf3AWH5F4CnD54ws9uB\nF939b2F5PtCrbHefmSUXqIiIZJ2773QKnvSZ1FvAwWZWAKwChgLDyrR5GvgR8LeQ1NaVdz2qvOBF\nRKRmSzRJufs2MxsFTGfHEPR5ZjYy2ux3uvs0MzvVzD4kGoI+IsmYRERkz5Fod5+IiMiuqHGzoJvZ\nADObb2YLzeyqbMcDYGb3mNnHZvZutmNJMbN8M5thZh+Y2XtmdkW2YwIwswZm9qaZzQlxjct2TClm\nVsfMZpvZ09mOJcXMlprZ3PB5zcp2PCnhVpJHzGxe+B07Pgdi6hw+p9nh5+e58HtvZlea2ftm9q6Z\n/dXMcuI+BzP7cfg/mNW/DzXqTCrcPLyQtJuHgaHpNw9nKa6TgA3A/e7eJZuxpJjZ/sD+7v6OmTUC\n3gYGZ/uzAjCzhu7+pZnlAa8BV7h71v8Am9mVwLFAE3c/PdvxAJjZEuBYd1+b7VjSmdm9wEvuPsnM\n6gIN3f2LLIdVIvytWAEc7+7LK2ufYBxtgFeBQ919s5n9DZjq7vdnK6YQ1xHAg8BxwFbgOeBSd19S\n3bHUtDOpkpuH3X0LkLp5OKvc/VUgp/6IuPtHqemn3H0DMI8cuT/N3b8MTxsQXTfN+jcpM8sHTgXu\nznYsZRg59v/YzJoAJ7v7JAB335pLCSr4FrA4mwkqTR6wTyqZE33BzrbDgDfdfZO7bwNeBr6bjUBy\n6pd7N4hz87CUYWYdgP8HvJndSCKhW20O8BHwgru/le2YgD8C/0sOJMwyHHjBzN4ys4uzHUzQEVht\nZpNC19qdZrZ3toMq4xyiM4WscveVwB+AZUQTGaxz939kNyoA3gdONrPmZtaQ6Atau2wEUtOSlFRR\n6Op7FPhxOKPKOnff7u7HAPnA8WZ2eDbjMbPTgI/DmaeFR6440d27Ev0R+VHoWs62ukBX4LYQ25fA\nL7Ib0g5mVg84HXgkB2JpRtTbUwC0ARqZ2fDsRgWh2/93wAvANGAOsC0bsdS0JFUMtE9bzg/rpByh\ne+FR4AF3fyrb8ZQVuoheBAZkOZQTgdPD9Z8HgT5mltVrBinuvir8/BR4gqjLO9tWAMvd/V9h+VGi\npJUrBgJvh88s274FLHH3NaFb7XHghCzHBIC7T3L3bu7eG1hHdL2/2tW0JFVy83AYITOU6GbhXJBr\n38AB/gL8293/lO1AUsyslZk1Dc/3Bv6L0hMSVzt3v9rd27v7gUS/UzPc/XvZjAmiASbhTBgz2wfo\nT9RNk1XhZvzlZtY5rOoH/DuLIZU1jBzo6guWAT3MbC+LJjzsR3R9OOvMbN/wsz1wJjAlG3HUqLn7\nKrp5OMthYWZTgN5ASzNbBoxLXVTOYkwnAucC74XrPw5c7e7PZzMu4ADgvjD6qg7wN3efluWYclVr\n4IkwZVhd4K/uPj3LMaVcAfw1dK0tIUdu0g/XV74FXJLtWADcfZaZPUrUnbYl/Lwzu1GVeMzMWhDF\n9cNsDX6pUUPQRUSkZqlp3X0iIlKDKEmJiEjOUpISEZGcpSQlIiI5S0lKRERylpKUiIjkLCWpWsTM\nWqSVKlhlZivSlqt0z1woP9JpF+NpaGYv7so+0vZ1ZVVLHJhZPzN7opz1F5nZH3dHXFWMp9LP1Mwe\nMLOdZmA3s45mds43OObNoRTDb8us72Nmu2X2Cos8b2ZrzezxMtsOtKg0y0Izmxxmvk9tm2hmi8zs\nHTOrtuoBZnZQuHewou0NzOylcPOtJExJqhYJU68cE+ZT+z/g5tSyu2+t4r4ucvdFuxjSD4CHd3Ef\nhD9sPwX2+gYvr+hGwWq/gXAXP9ODiGbDiC38kR3h7ke5+9VlNvcFen7DWErx6GbM3wEXlLP598AN\n7t4Z+Ar4fohtENDW3TsBPwIm7o5YypOeGNNU+O/v7puAQmBIUjHJDkpStVepb4Fm9vPwjfrdMGtH\n6hvl+2b2oJn928weMrMGYdsrqW+3Znaamb0dzsqeD+v6hm/As83sXxXMgn0u8FRo3ybsc3aIoUdY\nf15YftfMrgvr8sK38j+a2TtEs5PvB7xiZtNDm4Fm9no49oOp44dY55vZv8hcxqWDmRWa2QIzuzq8\n9joz+1HaZ3aDmV1W5nP8hZldGp7/2cz+Hp7/l0U1llKFOcuLLf0zHRmO/YaZ3WVmN6cdpq+ZvWZm\nH5pZ6j1cD/QOn9+oMjGZmf0h/PvONbNUyYVniSY0nZ22DjM7kOgLxM/Cth5m1sGiIpnvmNnfLaqD\nlDqzmxjey3wzK3eeRXd/EdhYJq46wCnAk2HVfcAZ4flg4P7w2teA1mbWsszrh5rZ78Lz/zGzBeF5\nJzMrDM/7h9/LuWZ2h4UeAzNbbmbXm9nbwBlm1i20mQ1cmnaMo8xsVvgc3rGoYgBEv7fnlvdeZTdz\ndz1q4QMYB/w0PO9ONB1LfaAR0TxrRxB9O98OHBfa3UdUgBDgFaAL0dQ8RUB+WN8s/JyW9rqGhNlN\n0o7fAFiRtvxz4H/DcwuvaQv8B2hOVHOnkGi277wQ1+C01y8DGofn+4a2e4Xlq4lm4d6bqJRLh7D+\nUeDxcj6bi0K7JiGOD8J7PQiYFdrUARYDTcu89kSi6YkgKmY3M7yfa4imBio3tjKfaT7RVEJNiKY7\neo3orBfggbT9HwXMC8/7lfdewraziQrpEf69lgGtwue4poLX/Cb1b5327zk0PL8YeCQtnqfD885h\n3/Uq2GepGEMs/05b7gDMDs+fA7qnbSsEupTZX1vgtfD8CaJSM/sCFwK/LuffezLR9D6E9T9J29f7\nQI/w/Oa0OCYCQ8LzekD98DyPaGb8rP9frukPnUkJwEnAY+6+2aNyHU8CJ4dtS3xHPafJoW26nkQT\nrq4AcPd1Yf1rwITwrb6ph//ZafYD1qQtvwX8wMzGAkd5VPjweOCf7r7WoxmipxB98wbY5KVnbk+f\nwPcE4HDgdYuuLQwn+gN4OLDA3ZeGdn/N8Jn83d2/CHE8CZzk7ouBLyyqWjqQqCjc52Ve9xZwnEWT\n5G4Iy8cSfZ6vVBBbQZl9pN73Fx51wz5aZvuTAO7+HlF5h8qcRJhQ1aPJX18BuoVtca+rHA/8LTy/\nn9K/Bw+HfS8kSlK7dK0yLncvBlpYNB/f/iGOXuz4rA+j9L/3/ez4/YHwfsIZ2l7uPjOsfyCtzevA\nWDP7X6C9u28Ox94GbE/1LEhylKSkqsrrq9/pD527X0f0jbsRMNPMDirT5CvSriF51B3UG1hFNMHs\nsIr2nfb6ihjwnEfX2o5x9yPd/dK0bXGUfZ+p5XuIzohGEM0iX7pR9EdsJfA9ojOpV4jOINq7+4cV\nxHZZ2f1UEuemmO0qkv6auNfeMrVL32ZV2OenQCuzkgEI6aV1iildZK+isjszic58PyD6rE8mSqiv\np8VTkY0ZtgHg7pOJuiA3Ac9b6Xpd9T26PiUJUpISiP5zn2nRqKVGRNcDXgnbOprZseH58LT1Ka8T\nXQtpD2BmzcPPA939fXe/AZgNHJL+IndfDeyddo2gPVH3yd3AvcAxRN03vS2qDlqXaGBAYdhF2T8+\nXxB1j6Vi6mVmHcO+G5rZwUTdmKlSLkZUsqEi/c2sSfiWPpjozBCiej+DgKO94gqqrwA/Iyq5/SrR\nhf9UbaWKYks3K7zvJhbNIp6pbHfqc1gPNM4Qz9Bwbao10dlcKp6K/oivZ8fnCVEyODs8Pz+8t5Qh\n4b10JkomFQ3+KFWuxt23h9hS7+8CwjVKohI73wv7PQn4yN0/K2efrxJ91i8R/Z59G1gfzoDnEf17\ndwhtz2PH70+JsN+vzOz4sKrkWpOZdXT3Je4+gegaXuqa4X6oVl21UJISQnfeg0R/uF4nqqj6Qdg8\nD/ipmf2bqI//rtTLwms/AS4DngrdV5PD9p+FC/XvEP3BK6+ExD/YUeCtH5C6cH0m8OfQnTOWHX+A\nXvcdpUTKflu/C/iHmU0PMf0A+Fs4/mtAJ3f/KsT6PFEiWJnhY3mL6A/lHKJrQO+G97uJ6A90pnpE\nrxB1Z870qDz45vCa1Od1UdnY0t+Tuy8nGvX2VnjdYuDz9DZpUstzgLphkMCoMm0eJarJ9S7Rv8OV\n4UtCeftLeQo426IBMT2IEu3IEPMQ4Mq0tsUWDUR5CrjYyxkpamavE3Wv9jezZWbWJ2z6OXCVmS0E\n9iH6ggLwDLDSzD4EbgvHL88rRInx5XDcFez4rL8i+qyfMLO5wNfA3RW87wuBO8PvX3oF2uEWDR6a\nQ/TvlPr97gNMrSAm2Y1UqkMqFLroHvWolHsS++8GXObuFyWx/ySEEWlziAZtLE3wOPu4+8ZwBvkU\nMNHdc+6Popk9QDSIIleKi1YLM3uSKNn/J9ux1HQ6k5LKJPYtxqPy4q8mtf/dzcyOBD4EpiWZoILf\nhG/1c4H5uZigglr3Ldeim8YfUYKqHjqTEhGRnKUzKRERyVlKUiIikrOUpEREJGcpSYmISM5SkhIR\nkZylJCUiIjnr/wNRBmUj3gTArwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "a = np.sort(tpm_low_alpha.predict(obama,output_type='probability')[0])[::-1]\n", + "b = np.sort(topic_model.predict(obama,output_type='probability')[0])[::-1]\n", + "c = np.sort(tpm_high_alpha.predict(obama,output_type='probability')[0])[::-1]\n", + "ind = np.arange(len(a))\n", + "width = 0.3\n", + "\n", + "def param_bar_plot(a,b,c,ind,width,ylim,param,xlab,ylab):\n", + " fig = plt.figure()\n", + " ax = fig.add_subplot(111)\n", + "\n", + " b1 = ax.bar(ind, a, width, color='lightskyblue')\n", + " b2 = ax.bar(ind+width, b, width, color='lightcoral')\n", + " b3 = ax.bar(ind+(2*width), c, width, color='gold')\n", + "\n", + " ax.set_xticks(ind+width)\n", + " ax.set_xticklabels(range(10))\n", + " ax.set_ylabel(ylab)\n", + " ax.set_xlabel(xlab)\n", + " ax.set_ylim(0,ylim)\n", + " ax.legend(handles = [b1,b2,b3],labels=['low '+param,'original model','high '+param])\n", + "\n", + " plt.tight_layout()\n", + " \n", + "param_bar_plot(a,b,c,ind,width,ylim=1.0,param='alpha',\n", + " xlab='Topics (sorted by weight of top 100 words)',ylab='Topic Probability for Obama Article')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we can clearly see the smoothing enforced by the alpha parameter - notice that when alpha is low most of the weight in the topic distribution for this article goes to a single topic, but when alpha is high the weight is much more evenly distributed across the topics.\n", + "\n", + "__Quiz Question:__ How many topics are assigned a weight greater than 0.3 or less than 0.05 for the article on Paul Krugman in the **low alpha** model? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+---------------------+-------------------------------+\n", + "| average predictions | topics |\n", + "+---------------------+-------------------------------+\n", + "| 0.629444444444 | art and publishing |\n", + "| 0.0176543209877 | Great Britain and Australia |\n", + "| 0.0161111111111 | international athletics |\n", + "| 0.0146913580247 | American college and politics |\n", + "| 0.0136419753086 | science and research |\n", + "| 0.0131481481481 | team sports |\n", + "| 0.0129012345679 | general politics |\n", + "| 0.010987654321 | Business |\n", + "+---------------------+-------------------------------+\n", + "[? rows x 2 columns]\n", + "Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.\n", + "You can use sf.materialize() to force materialization.\n", + "8\n" + ] + } + ], + "source": [ + "paul_krugman = gl.SArray([wiki_docs[int(np.where(wiki['name']=='Paul Krugman')[0])]])\n", + "paul_krugman_low_alpha = average_predictions(tpm_low_alpha, paul_krugman, 100)\n", + "print paul_krugman_low_alpha[(paul_krugman_low_alpha['average predictions'] > 0.3) |\n", + " (paul_krugman_low_alpha['average predictions'] < 0.05)\n", + " ]\n", + "print len(paul_krugman_low_alpha[(paul_krugman_low_alpha['average predictions'] > 0.3) |\n", + " (paul_krugman_low_alpha['average predictions'] < 0.05)\n", + " ])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ How many topics are assigned a weight greater than 0.3 or less than 0.05 for the article on Paul Krugman in the **high alpha** model? Use the average results from 100 topic predictions." + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+---------------------+--------------------+\n", + "| average predictions | topics |\n", + "+---------------------+--------------------+\n", + "| 0.629444444444 | art and publishing |\n", + "| 0.010987654321 | Business |\n", + "+---------------------+--------------------+\n", + "[? rows x 2 columns]\n", + "Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.\n", + "You can use sf.materialize() to force materialization.\n", + "2\n" + ] + } + ], + "source": [ + "paul_krugman_high_alpha = average_predictions(tpm_high_alpha, paul_krugman, 100)\n", + "print paul_krugman_low_alpha[(paul_krugman_high_alpha['average predictions'] > 0.3) |\n", + " (paul_krugman_high_alpha['average predictions'] < 0.05)\n", + " ]\n", + "print len(paul_krugman_low_alpha[(paul_krugman_high_alpha['average predictions'] > 0.3) |\n", + " (paul_krugman_high_alpha['average predictions'] < 0.05)\n", + " ])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Changing the hyperparameter gamma\n", + "\n", + "Just as we were able to see the effect of alpha by plotting topic weights for a document, we expect to be able to visualize the impact of changing gamma by plotting word weights for each topic. In this case, however, there are far too many words in our vocabulary to do this effectively. Instead, we'll plot the total weight of the top 100 words and bottom 1000 words for each topic. Below, we plot the (sorted) total weights of the top 100 words and bottom 1000 from each topic in the high, original, and low gamma models." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we will consider the following two models:\n", + " - tpm_low_gamma, a model trained with gamma = 0.02 and default alpha\n", + " - tpm_high_gamma, a model trained with gamma = 0.5 and default alpha" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "del tpm_low_alpha\n", + "del tpm_high_alpha\n", + "tpm_low_gamma = gl.load_model('topic_models/lda_low_gamma')\n", + "tpm_high_gamma = gl.load_model('topic_models/lda_high_gamma')" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAakAAAEbCAYAAABgLnslAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmclWX9//HXm1XJDaRUZmRAUVNxQ1JQFNQyxFzQHwiS\nW2VIkoUtZqZImaaZlanh9nXBfV/SzFzGDRQMxDQQl9hdQVQwZfv8/rjvGc6MM4d7YM6cw/B+Ph7n\nMfdynev+nDMz53Pu677u61JEYGZmVopaFDsAMzOz+jhJmZlZyXKSMjOzkuUkZWZmJctJyszMSpaT\nlJmZlayCJylJ/SVNlzRD0hn1lOknaYqkVyQ9WeiYzMxs3aBC3iclqQUwAzgImA9MAoZExPScMpsC\n44GDI2KepI4R8UHBgjIzs3VGoc+k9gJej4hZEbEMuA04olaZY4G7I2IegBOUmZlVKXSSKgPm5KzP\nTbfl2h7oIOlJSZMkHVfgmMzMbB3RqtgBkMTQAzgQ+BIwQdKEiHijuGGZmVmxFTpJzQM656yXp9ty\nzQU+iIjPgM8kPQ3sBtRIUpI8yKCZWTMWEaq9rdDNfZOAbpIqJLUBhgAP1CpzP9BHUktJ7YC9gWl1\nVRYRmR6jR4/OXLYpH45r3Y7Jca37MTmu0o2pPgU9k4qIFZJGAo+SJMRrI2KapOHJ7rgqIqZL+gfw\nMrACuCoi/lPIuMzMbN1Q8GtSEfEIsEOtbVfWWr8YuLjQsZiZ2bqlWY440a9fv2KHUCfHlV0pxgSO\nqyFKMSZwXA1RCjEV9GbexiQp1pVYzcysYSQRdXScKIUu6Ga2HujSpQuzZs0qdhhWZBUVFcycOTNz\neZ9JmVmTSL8pFzsMK7L6/g7qO5NqltekzMyseXCSMjOzkuUkZWZmJctJyszWe127duWJJ54odhhW\nBycpMyua8oouSCrYo7yiS7Ffoq0ld0E3s6KZN3sWF0xeWrD6z+zRpmB1W9PwmZSZWY6lS5fy4x//\nmLKyMsrLyxk1ahTLli0DkhEY7r33XgCee+45WrRowd///ncAnnjiCfbYY4866/zss8844YQT6NCh\nAzvvvDO///3v2Xrrrav3X3jhhXTr1o1NNtmE7t27c99991Xvu+GGG+jTpw+nn3467du3p1u3bkyY\nMIEbbriBzp07s+WWW3LjjTdWlz/ppJM49dRTGTBgABtvvDH77bcf7777LqNGjaJDhw7stNNOTJ06\nNdOxS4GTlJlZjvPOO4+JEyfy8ssvM3XqVCZOnMh5550HQN++famsrATg6aefZtttt+Xpp58G4Kmn\nnqp3GKFzzz2X2bNnM3PmTP75z39y0003Ia26Jahbt24899xzfPzxx4wePZpvf/vbvPvuu9X7J06c\nyO67787ChQsZOnQoQ4YM4cUXX+TNN99k3LhxjBw5kk8//bS6/J133sn555/PggULaNOmDb1796Zn\nz54sWLCAo48+mlGjRmU+drE5SZmZ5bjlllsYPXo0m2++OZtvvjmjR49m3LhxQJKknnrqKSBJUmee\neWb1+lNPPUXfvn3rrPPOO+/krLPOYpNNNqFTp06cdtppNfYfffTRbLHFFgAMGjSI7bbbjokTJ1bv\n79q1K8cffzySOOaYY5g7dy6jR4+mdevWfOMb36BNmza88caqKfgGDhzI7rvvTps2bRg4cCAbbrgh\nw4YNq37+Sy+9lPnYxeYkZWaWY/78+XTuvGqu1oqKCubPnw9A7969mTFjBu+99x5Tp07l+OOPZ86c\nOSxYsICJEyey//7711tneXl59XpuUx/AjTfeyB577EH79u1p3749r776Kh988EH1/qokArDhhhsC\n0LFjxxrbFi9eXG/52uu5ZVd37GJzkjIzy9GpU6caYwzOmjWLTp06AckH/J577smf//xnunfvTqtW\nrejduzeXXHIJ3bp1o0OHDvXWOXfu3Or12bNn11j+/ve/zxVXXMGHH37Ihx9+yM4779wkQ0gV89hZ\nOUmZmeUYOnQo5513Hh988AEffPABv/nNbzjuuOOq9++///5cdtll1U17/fr1q7Fel0GDBnHBBRew\naNEi5s2bx+WXX169b8mSJbRo0YKOHTuycuVKrrvuOl555ZW8Ma5tEql6/pocu6k5SZnZei+3E8Ov\nfvUrevbsya677spuu+1Gz549Oeuss6r39+3bl8WLF1c37VWt50tS55xzDmVlZXTt2pWDDz6YQYMG\n0bZtWwB23HFHfvKTn9CrVy+23HJLXn31Vfr06ZM53rrWs77eNTl2U/Mo6GbWJOoa/bq8ogvzZhdu\n+o6yzhXMnTWzYPWvqbFjx3L77bfz5JNPFjuUJudR0M1snTF31kwiomCPUklQ77zzDuPHjycieO21\n1/jDH/7AUUcdVeyw1gkeccLMrMCWLl3K8OHDmTlzJpttthlDhw5lxIgRxQ5rneDmPjNrEp700MDN\nfWZm1ow4SZmZWclykjIzs5LlJGVmZiXLScrMzEqWk5SZWQONGDGC3/72t41eNp9Zs2bRokULVq5c\nudZ1NdRJJ53EOeeck6ls165deeKJJxrt2L5PysyKpkt5ObPmzStY/RVlZczMGdi1sfz1r38tSNnV\naejwR82Bk5SZFc2sefNYdO65Bat/swLUvXLlSlq0cCNUU1ntOy3pcEkbp8s/lXSLpF0KH5qZWdOY\nPn06BxxwAO3bt2eXXXbhwQcfrN530kkn8YMf/IBDDz2UjTfemMrKyi80f1100UV06tSJ8vJyrr32\nWlq0aMFbb71V/fyqsk899RRbb701l1xyCVtssQVlZWVcf/311fU8/PDD9OjRg0033ZSKigrGjBmT\n+TV07dqViy++mN12242NN96Yk08+mffee48BAwawySabcPDBB/PRRx9Vl3/ggQfo3r07HTp04MAD\nD2T69OnV+6ZMmcKee+7JpptuypAhQ/jss89qHOtvf/tb9RxUffr04d///nfmOBsqy9eB8yLiE0m9\ngCOBu4Grsh5AUn9J0yXNkHRGHfv7SlokaXL6+FX28M3M1s7y5cs57LDD6N+/P++//z6XXnopw4YN\n4/XXX68uc+utt3L22WfzySefsO+++9Z4/iOPPMKf/vQnnnjiCd544w0qKyvzNsu98847fPLJJ8yf\nP59rrrmGU089tTp5bLTRRowbN46PPvqIhx56iLFjx/LAAw9kfi333HMPjz/+ODNmzOCBBx5gwIAB\n/O53v+ODDz5gxYoVXHrppQDMmDGDY489lksvvZT333+fQw45hMMOO4zly5ezbNkyBg4cyAknnMDC\nhQsZNGgQd999d/UxpkyZwne/+12uvvpqFi5cyPDhwzn88MNZtmxZ5jgbIkuSWpH+PAy4MiLuBtpm\nqVxSC+Ay4JvAzsBQSV+to+jTEdEjfZyXpW4zs8bw/PPPs2TJEs444wxatWrFAQccwLe+9S1uvfXW\n6jJHHHEEvXr1AqieYqPKnXfeyUknncRXv/pVNthgA85dTRNjmzZtOPvss2nZsiWHHHIIG220Ea+9\n9hqQzFW18847A9C9e3eGDBlSPT19Fj/84Q/p2LEjW221Ffvttx977703u+66a/U08lOmTAHgjjvu\n4Fvf+hYHHnggLVu25Kc//SmfffYZ48eP5/nnn2f58uWcdtpptGzZkqOPPpqvfe1r1ce4+uqrOeWU\nU+jZsyeSOO6442jbti3PP/985jgbIkuSelfSn4EhwEOS2gAtM9a/F/B6RMyKiGXAbcARdZRb/64G\nmllJmD9//hemc6+oqGBeToeO2vvzPX/rrbfOO0bh5ptvXuOaVrt27aqnc3/hhRc48MAD+cpXvsJm\nm23GlVde2aCp3LNOGz9//nwqKiqq90mivLycefPmMX/+fMrKymrUm1t21qxZ/OEPf6BDhw506NCB\n9u3bM3fuXObPn585zobIkqQGAS8Ah0fEQuDLwFn5n1KtDJiTsz433VZbb0kvSXpI0k4Z6zYzW2ud\nOnVizpw5NbbNnj27xgd1vua7rbba6gtTw69pL7xhw4Zx5JFHMm/ePBYtWsTw4cMLMihvp06dmDWr\n5jxec+bMoays7AuvB2pOd7/11ltz1llnsXDhQhYuXMiHH37I4sWLOeaYYxo9TsiTpCS1k9SOpLnv\nPuC/6fqHQON1god/AZ0jYneSpsH7GrFuM7O89t57b9q1a8dFF13E8uXLqays5G9/+xtDhw7N9PzB\ngwdz3XXXMX36dD799FPOO2/Nr1gsXryY9u3b07p1ayZOnMgtt9xSY39jJazBgwfz0EMP8eSTT7J8\n+XIuvvhiNthgA/bZZx969+5N69at+ctf/sLy5cu55557mDhxYvVzTz75ZMaOHVu9bcmSJTz88MMs\nWbKkUWKrLV8X9DeBqndkC+BTkma5DYF3gU4Z6p8HdM5ZL0+3VYuIxTnLf5d0haQO6VlbDbltvf36\n9aNfv34ZQjAzq1/r1q158MEHGTFiBOeffz7l5eWMGzeO7bbbDqj7LCp3W//+/TnttNM44IADaNmy\nJWeffTbjxo37wrWr+uTWdcUVV3D66aczcuRI+vbtyzHHHMOiRYvqLJuvntWV3X777bnpppsYOXIk\n8+fPZ/fdd+fBBx+kVaskJdxzzz1873vf41e/+hUDBgzg6KOPrn7unnvuydVXX83IkSN544032HDD\nDenTpw99+/Zd7XFzVVZWUllZudpyq51PStIVwGMRcU+6PhA4KCJGrrZyqSXwGnAQ8DYwERgaEdNy\nymwREe+my3sBd0RElzrq8nxSZuuwuuYRWldv5s1n+vTp7LLLLnz++ee+n6oODZ1PKkuS+ndE7FJr\n28sRsWvGgPoDfyZpWrw2In4naTgQEXGVpFOBEcAy4H/AqIh4oY56nKTM1mHNedLD++67jwEDBrBk\nyRJOPPFEWrVqVaPbtq1SiCT1GPAIcFO6aRgwICIOWvtws3OSMlu3NeckdcghhzBhwgRatWpFv379\nuPzyy2v0rLNVCpGkvgycB+xPco3qaeCciHivUSLOyEnKbN3WnJOUZdfQJJV37L70mtKPI2J444Vo\nZmaWTd6rehGxAvh6E8ViZmZWQ5ZR0F+UdCdwB1DdET4iHi5YVGZmZmRLUh2A5cBROdsCcJIyM7OC\nWm3HiVLhjhNm6zZ3nDBoeMeJLPNJbSnpVklz08fNkrZspHjNzIou35Tnzz77LDvuuGOmeqrmi7LG\nk+V26OtJup1vnz6eSbeZma2VLhVbIqlgjy4Va/99uk+fPkybNm31BVPr4xTvhZTlmtSWEfHXnPWx\nkk4pVEBmtv6YNftdIvvnf4Npx3cLV7k1iSxnUosk/b+qFUlHA4vylDczW+dMmTKF3Xbbjfbt2zN0\n6FCWLl0KfLEJb/LkydVTvA8ePJghQ4bUmEo+IuqdHr62mTNn0rdvXzbddFMOPvhgRo4cyXHHHVe9\nf/DgwWy11Va0b9+efv368Z///Kd630knncSpp57KgAED2Hjjjdlvv/149913GTVqFB06dGCnnXZi\n6tSp1eUbOr18vmM3pSxJ6jvA9yUtlLQAODndZmbWbNx55508+uij/Pe//2Xq1Kk1kktVE96yZcs4\n6qij+M53vsPChQsZOnQo9957b4168k0PX9uxxx5Lr169WLBgAaNHj2bcuHE1mgsHDBjAm2++yXvv\nvUePHj0YNmzYF2I+//zzWbBgAW3atKF379707NmTBQsWcPTRRzNq1Kga5bNOL5/l2E0l33xSXwKI\niLci4uCI6BARm0dE/4h4q+lCNDMrvB/96EdsscUWbLbZZhx22GG89NJLXygzYcIEVqxYwciRI2nZ\nsiUDBw5kr732qlEm3/TwuebMmcOLL77ImDFjaNWqFfvuuy+HH354jTInnngi7dq1o3Xr1pxzzjlM\nnTqVTz75pHr/wIED2X333aunh99www0ZNmwYkjjmmGO+8BqyTi+f5dhNJd+Z1BxJkyX9RdJQSe6y\nYmbNVu6AsLlTuud6++23vzC1eu3efPmmh881f/58OnTowAYbbFBnXStXruQXv/gF3bp1Y7PNNqNr\n165IqjGdfNbp4htaPsuxm0q+JLU5cCLwKnAI8KSkOZJul3RaUwRnZlZKttpqK+bVmv+q9tTzDalr\n4cKFfPbZZ3XWdfPNN/Pggw/yxBNPsGjRImbOnElENMm9ZsU8dm31JqlIvBwRYyPieJJR0C8E9gQu\nbqoAzcxKRe/evWnZsiWXX345K1as4P77768xtXpDdO7cmZ49e3LuueeybNkyJkyYwIMPPli9f/Hi\nxbRt25b27duzZMkSzjzzzAZ3b1/TpNIYx24s+a5J9ZA0UtItkiYCVwAbAd8D2jdVgGZmhZb1A7h1\n69bcc889XHPNNbRv355bbrmFww47LO9U8fnqvvnmmxk/fjwdO3bknHPOYciQIdV1HX/88XTu3Jmy\nsjK6d+/OPvvs07AXVevYDZlevjGO3VjqHRZJ0kpgMvBHkindlzVlYHXE42GRzNZhdU4fX7Els2YX\n7l6mis5bMHPWOwWrH6BXr16MGDGCE044Ya3rGjJkCDvuuCOjR49uhMhKU6NNeiipC7BP+tiTZHr3\nicAEYEJEzG+0qDNwkjJbtzWXsfuefvppdthhBzp27MhNN93ED37wA9566601mon3xRdfpEOHDnTt\n2pV//OMfHHXUUUyYMIHddtutAJGXhkab9DAiZgIzgVvSCjYi6UhxIdAVaNkYAZuZrUtee+01Bg8e\nzKeffso222zD3XffvcZTxb/zzjscddRRLFy4kPLycsaOHdusE9SayHcmtSGwN6vOpvYC3gbGA89F\nxE1NFWQaj8+kzNZhzeVMytZOYzb3fQhMImneew54PiI+btxws3OSMlu3OUkZNGJzH7B5RKxszODW\nVtYeOGWdK5g7a2ZhgzEzs4LLd02qpBIUwAWTl2Yqd2aPNgWOxMzMmkKWAWbNzMyKIst8UmZma62i\nosITAhoVFRUNKp83SUnqCxwJVI2oOA+4PyIq1yQ4M1t/zZw5s9gh2Doo37BIFwHnAFOBq9LHVOBs\nSb9vmvBKX3lFl8xTWZdXdCl2uGZm65R8Z1JHRsT2tTdKugGYAfysYFGtQ+bNnlVyHTrKK7owb/as\nzOXdG9LMSlW+JLVU0m4RMbXW9l2BzwsYk62lhiROcG9IMytd+ZLU94AbJAVQNclJZyDSfWZmZgWV\n7z6p54Hd04FmqztOpGP6mZmZFVyW+6Q2rfVoEEn9JU2XNEPSGXnKfU3SMklHNfQYZmbWPNV7JiXp\nAGAsMJ+k6zlAuaStgFMi4snVVS6pBXAZcFBazyRJ90fE9DrK/Q74xxq9CjMza5byXZO6DDg0It7I\n3ShpO+B+YKcM9e8FvB4Rs9Ln3gYcAUyvVe6HwF3A1zLGbeughvQ6dI9DM4P8SaoN8FYd2/+b7sui\njFWdLgDmkiSuapI6kXR3P0BSjX3WvJRid30zK235ktRNwARJt7Aq0WwNHJvuayx/AnKvVdU7bspj\nY39dvbxNz75s07NvI4Zh6yuf4Zk1vcrKSiorK1dbLl/vvjGSHiBpnuuebp4HjIiIyRnjmEfSbb1K\nOauub1XpCdymZFCvjsAhkpZFxAO1K/v6KedkPKxZdj7DM2t6/fr1o1+/ftXrY8aMqbNc3rH7ImIK\nMGUt4pgEdJNUQTKr7xBgaK1jbFO1LOk64MG6EpSZma1/1miqDkn3ZSkXESuAkcCjwKvAbRExTdJw\nSd+v6ylrEo+ZmTVP+bqg19d7T9Tq/JBPRDwC7FBr25X1lP1O1nrNzKz5y9fc92/gBeruyNC+MOGY\nmZmtki9JvQYcFxFv1t4haU4d5c2skXgke7NEviT1G+q/H8rTdJgVkEeyN0vk64J+a559txUmHDMz\ns1XWqHefmZlZU3CSMjOzkuUkZWZmJSvviBMAkloDJwN9SG62fRa4JiKWFTg2MzNbz602SQHXkySn\ncen6UGA/koFmzczMCiZLkuoRETvmrP9d0rRCBWRmZlYlyzWplyXtUbUiaXfgpcKFZGZmlshyJvVV\nkmnfq2bo3Q54RdIkICKi5CYqbNuyJcnMH9lUlJUxc+7cAkZkZmZrIkuSGlzwKBrZ5ytWsOjcczOX\n36wBZddGQ5KnE6eZWYYkFRGvSfoqSe8+gGcjYnphw2qeGpI8mypxmpmVstVek5I0AriPpJlve+Be\nScMLHZiZmVmW5r4RwNci4hMASb8BngPqnBPKzMyssWTp3Sfg85z1z6l7jikzM7NGleVM6hZgvKS7\n0vWjgZsKF5I1NXfoMLNSlW/6eEXiAklPsarjxGkRMaFpwrOm4A4dZlaq8p1J/QvoARAR44HxTRKR\nGb7XzcwS+ZKUrztZ0ZTqvW5m1rTyJakvSzqtvp0RcWkB4jEzM6uWr3dfS6Aj8OV6Hma2nimv6IKk\nTI/yii7FDteagXxnUm9HxDlNFomZlbx5s2dxweSlmcqe2aNNgaOx9UG+MylfkzIzs6LKl6QObrIo\nzMzM6lBvkoqI95syELN1QVXX+CyPLuXlxQ7XbJ2X72beVhGxvCmDMSt1pXrjs0cNseYqX8eJiUAP\nSddHxIlNFI+ZrYFSTZ5maytfkmojaTCwn6TDa++MiAcKF5aZWXblFV2YN3tWprJlnSuYO2tmYQOy\nRpMvSZ0KfBvYDBhUa18AmZKUpP7An0iuf10bERfW2n848BtgJbAMGBURz2WKvpG0bUP2ppLOWzBz\n1jsFjsjMGsJd45uvepNURDwFPCXpxYhYo7mjJLUALgMOAuYDkyTdX2tm38eqzsok7QLcAey4Jsdb\nU58vhZiWrax2fLewwZg1Ex5/0RpDlqk6/k/SD4D90/WngKszdqrYC3g9ImYBSLoNOAKoTlIR8WlO\n+Y1IzqjMbB3n8RetMWRJUpcBXwL+L13/NrAH8P0Mzy0D5uSszyVJXDVIOhK4gGS4pUMz1GtmZuuB\nLEmqV0TslrP+qKSpjRlERNwH3CepD3Ae8I26yj029tfVy9v07Ms2Pfs2Zhhmth5wd/3SUFlZSWVl\n5WrLZUlSKyV1iYiZAJK6kL1Jbh7QOWe9PN1Wp4h4VtI2kjpExMLa+79+iocSNLO14+76paFfv370\n69even3MmDF1lss3LFKVM4BnJD0m6XGSa1I/yxjHJKCbpApJbYAh1OoVKGnbnOUeQJu6EpSZ2bqk\nISPGe9T4+q32TCoiHpW0Pat63E2LiP9lqTwiVkgaCTzKqi7o0yQNT3bHVcDRko4HlgL/AwavyQsx\nMyslDekWD+4aX58szX2kSWnymhwgIh4Bdqi17cqc5YuAi9ak7uasIfduge/fMrPmKVOSsqbXkHu3\nwPdvmVnzlOWalJmZWVGsNklJul3SN9WQtidrtqqaITNNVVGx5Xofl5mtnSzNfdcB3wEuk3Q7cH1E\nvFHYsKxUleoQUqUal5mtndWeSUXEIxFxDMlIEe8AT0p6WtJxknxNy8zMCibTNSlJ7YFjgeOAl4Er\ngX2ARwoXmpmZre9WeyYk6U5gF+Bm4OiIqBoj5GZJUwoZnJmZrd+yNNddRTKdRlRtqJpaPiL2KFxo\nZma2vsvS3HdhboJKTSxEMGZmZrnqPZOS9BVgK2DDdDLCqi7omwDtmiA2MzNbz+Vr7juUpOt5OXBF\nzvZPgLMLGZSZmRnknz7+OuA6SYMj4o4mjMmsWfD4i2ZrL19z39CIuBXYStJptfdHxKUFjcxsHefx\nF60hPBlj3fI197VPf3ZsikDMzNZnTTUZY3lFF+bNnpWpbFnnCubOmrnGx2oM+Zr7rkh/+vqTWTPS\nkGZIN0E2Pw2Z56oU5rjK19x3Sb4nRsTpjR+OmRWaxzm0dUm+5r5XmywKM1vv+QzP6pKvue/apgzE\nzNZvPsOzuuRr7vtDRPxE0r1A7REniIijChqZmZmt9/I1992e/rysKQIxMzOrLV9z38T05+OSWgPb\nkZxRvR4Ry5soPjOzovF1suLLMlVHf5KR0GeTjN9XLunkiHi00MGZmRWTr5MVX5apOv4EfD0iZgBI\n2h64H9ixkIGZmZllmapjcVWCAkiXlxQuJDMzy6eqGTLLo0vFlsUOd63k6913eLo4UdIDwB0k16QG\nAS80QWxmZlaH9akZMl9z36Cc5Y+Ab6bLnwAbFywiMzOzVL7efcc1ZSBmZma1Zend1xY4EdgZ2KBq\ne0R8v3BhmZmZZes4cSPQBfgWybWobYHPChiTmZkZkC1JbR8RZ5L08rsW6A/sVdiwzMzMsiWpZenP\nRZJ2JOk08ZWsB5DUX9J0STMknVHH/mMlTU0fz0raJWvdZmbWvGW5mfdaSe2B0cA/gHbAOVkql9SC\nZOy/g4D5wCRJ90fE9JxibwH7R8RH6egWVwO9GvAazMysmVptkoqIK9PFJ4HODax/L5Kx/mYBSLoN\nOAKoTlIR8XxO+eeBsgYew8zMmqnVNvdJai/pj5ImSnpB0sXpmVUWZcCcnPW55E9C3wP+nrFuMzNr\n5rJck7oN+BgYBnyb5Gbe2/M+Yw1IOgA4CfjCdSszM1s/ZbkmVRYRo3PWx0h6JWP986jZRFiebqtB\n0q4kI633j4gP66vssbG/rl7epmdftunZN2MYZmbWUG1btsw+VUlZGTPnzs1cd2VlJZWVlastlyVJ\nPS7p/0XEXQCSjgL+mTGOSUA3SRXA28AQYGhuAUmdgbuB4yLizXyVff2UTP01zMysEXy+YgWLzj03\nU9ktzj83c0KDL86/NWbMmDrL5Rtg9kOSAWUF/FDS8pznLAJGrS6IiFghaSTwKEnT4rURMU3S8GR3\nXAWcDXQArlDyCpdFhO/DMjNbhzRk0FvIPvBtvjOpjtkPV7+IeATYoda2K3OWTwZOboxjmZlZ85Jv\ngNkVVcuSBgD7p6uVaeIxMzMrqCxd0H8L/Jzkptu3gJ9LOq/QgZmZmWXpOHEYsEfVmZWk/wMmA78q\nZGBmZmZZ7pMC2CRn2RMemplZk8hyJnURMFnS4yQ9/fqR9MgzMzMrqLxJKu0S/jjJuH17p5vPiYgv\n3JBrZmbW2PImqYgISf+MiO7APU0Uk5mZGZDtmtRLkvYoeCRmZma1ZLkmtQfJPFBvAktIrktFRPQo\naGRmZraCEh0KAAARKElEQVTey5KkDi94FGZmZnXIN3ZfW5LhiroB/wauzx2FwszMrNDyXZO6HugD\nvA4cCVzcFAGZmZlVydfc1z0idgGQdBXwQtOEZGZmlsh3JrWsaiEiluUpZ2ZmVhD5zqR2k7QwXRaw\ncbpe1buvQ8GjMzOz9Vq+JNWmyaIwMzOrQ6b5pMzMzIoh6yjoZmZmTc5JyszMSpaTlJmZlax8I058\nCERdu3DvPjMzawL5evd1bLIozMzM6pC5d5+kDsAGOZvmFyooMzMzyHBNStKhkmYAc0mGRpoLPFHo\nwMzMzLJ0nPgtsC/wWkRsDXwTeKagUZmZmZEtSS2PiPeBFpIUEf8E9ipwXGZmZpkmPfxI0kbAs8CN\nkt4D/lfYsMzMzLKdSR1JkpR+DFQC84BvFTAmMzMzIFuSOjMiVkTEsoi4NiIuAU4vdGBmZmZZklT/\nOrYd2tiBmJmZ1VZvkpI0XNIUYAdJk3MerwPTsh5AUn9J0yXNkHRGHft3kDRe0meSfIZmZmbV8nWc\nuAN4HLgA+EXO9k8i4r0slUtqAVwGHERy8+8kSfdHxPScYguAH5Jc+zIzM6tW75lURHwYEW9ExCCS\nkSa+kT6+3ID69wJej4hZ6RT0twFH1DrOBxHxL2B5g6M3M7NmLcuIE6cCdwKd08cdkn6Qsf4yYE7O\n+tx0m5mZ2WpluU9qOLBXRCwGkHQ+MB64opCBmZmZZUlSApbmrC9Lt2Uxj+Tsq0p5um2NPDb219XL\n2/TsyzY9+65pVWZmVmTnnnvuasvkm0+qVUQsB8YBL0i6O901ELghYwyTgG6SKoC3gSHA0Dzl8ya/\nr59yTsbDmplZqctNUmPGjKmzTL4zqYlAj4i4SFIl0CfdfkpETMoSQESskDQSeJTk+te1ETFN0vBk\nd1wlaQvgRWBjYKWkHwE7VTUvmpnZ+itfkqo+q4mIiSRJq8Ei4hFgh1rbrsxZfhfYek3qNjOz5i1f\nkvpyvptr0+GRzMzMCiZfkmoJbET2ThJmZmaNKl+Sejsifp1nv5mZWUHlu5nXZ1BmZlZU+ZLUQU0W\nhZmZWR3yjd23sCkDMTMzqy3LfFJmZmZF4SRlZmYly0nKzMxKlpOUmZmVLCcpMzMrWU5SZmZWspyk\nzMysZDlJmZlZyXKSMjOzkuUkZWZmJctJyszMSpaTlJmZlSwnKTMzK1lOUmZmVrKcpMzMrGQ5SZmZ\nWclykjIzs5LlJGVmZiXLScrMzEqWk5SZmZUsJykzMytZTlJmZlaynKTMzKxkOUmZmVnJKniSktRf\n0nRJMySdUU+ZSyW9LuklSbsXOiYzM1s3FDRJSWoBXAZ8E9gZGCrpq7XKHAJsGxHbAcOBsYWMyczM\n1h2FPpPaC3g9ImZFxDLgNuCIWmWOAG4EiIgXgE0lbVHguMzMbB1Q6CRVBszJWZ+bbstXZl4dZczM\nbD3kjhNmZlayFBGFq1zqBZwbEf3T9V8AEREX5pQZCzwZEben69OBvhHxbq26CheomZkVXUSo9rZW\nBT7mJKCbpArgbWAIMLRWmQeAU4Hb06S2qHaCgrqDNzOz5q2gSSoiVkgaCTxK0rR4bURMkzQ82R1X\nRcTDkgZIegNYApxUyJjMzGzdUdDmPjMzs7XR7DpOZLl5uKlJulbSu5JeLnYsVSSVS3pC0quS/i3p\ntGLHBCCpraQXJE1J4xpd7JiqSGohabKkB4odSxVJMyVNTd+vicWOp4qkTSXdKWla+je2dwnEtH36\nPk1Of35UCn/3kkZJekXSy5JultSm2DEBSPpR+j9Y1M+HZnUmld48PAM4CJhPck1sSERML3JcfYDF\nwI0RsWsxY6kiaUtgy4h4SdJGwL+AI4r9XgFIahcRn0pqCTwHnBYRRf8AljQK2BPYJCIOL3Y8AJLe\nAvaMiA+LHUsuSdcDT0XEdZJaAe0i4uMih1Ut/ayYC+wdEXNWV76AcXQCngW+GhFLJd0OPBQRNxYr\npjSunYFbga8By4G/A6dExFtNHUtzO5PKcvNwk4uIZ4GS+hCJiHci4qV0eTEwjRK5Py0iPk0X25Jc\nNy36NylJ5cAA4Jpix1KLKLH/Y0mbAPtFxHUAEbG8lBJU6uvAm8VMUDlaAl+qSuYkX7CLbUfghYj4\nPCJWAE8DRxUjkJL6424EWW4etlokdQF2B14obiSJtFltCvAO8M+ImFTsmIA/Aj+jBBJmLQH8U9Ik\nSScXO5hUV+ADSdelTWtXSdqw2EHVcgzJmUJRRcR84A/AbJKBDBZFxGPFjQqAV4D9JLWX1I7kC9rW\nxQikuSUpa6C0qe8u4EfpGVXRRcTKiNgDKAf2lrRTMeORdCjwbnrmqfRRKvaNiB4kHyKnpk3LxdYK\n6AFcnsb2KfCL4oa0iqTWwOHAnSUQy2YkrT0VQCdgI0nHFjcqSJv9LwT+CTwMTAFWFCOW5pak5gGd\nc9bL021Wh7R54S5gXETcX+x4akubiJ4E+hc5lH2Bw9PrP7cCB0gq6jWDKhHxdvrzfeBekibvYpsL\nzImIF9P1u0iSVqk4BPhX+p4V29eBtyJiYdqsdg+wT5FjAiAirouInhHRD1hEcr2/yTW3JFV983Da\nQ2YIyc3CpaDUvoED/B/wn4j4c7EDqSKpo6RN0+UNgW8ARe3MERG/jIjOEbENyd/UExFxfDFjgqSD\nSXomjKQvAQeTNNMUVXoz/hxJ26ebDgL+U8SQahtKCTT1pWYDvSRtIEkk79W0IscEgKQvpz87AwOB\nW4oRR6FHnGhS9d08XOSwkHQL0A/YXNJsYHTVReUixrQvMAz4d3r9J4BfRsQjxYwL2Aq4Ie191QK4\nPSIeLnJMpWoL4N50yLBWwM0R8WiRY6pyGnBz2rT2FiVyk356feXrwPeLHQtAREyUdBdJc9qy9OdV\nxY2q2t2SOpDE9YNidX5pVl3QzcyseWluzX1mZtaMOEmZmVnJcpIyM7OS5SRlZmYly0nKzMxKlpOU\nmZmVLCep9YikDjlTFbwtaW7OeoPumUunH9luLeNpJ+nJtakjp65RDZ3iQNJBku6tY/t3Jf2xMeJq\nYDyrfU8ljZP0hRHYJXWVdMwaHPOSdCqG82ttP0BSo4xeocQjkj6UdE+tfdsomZplhqSb0pHvq/Zd\nIel1SS9JarLZAyRtm947WN/+tpKeSm++tQJzklqPpEOv7JGOp/ZX4JKq9YhY3sC6vhsRr69lSN8D\n7ljLOkg/2E4HNliDp9d3o2CT30C4lu/ptiSjYWSWfsieFBG7RMQva+0+EOi9hrHUEMnNmBcCJ9Sx\n+/fA7yJie+B/wIlpbIcBZRGxHXAqcEVjxFKX3MSYo97ff0R8DlQCgwoVk63iJLX+qvEtUNLP02/U\nL6ejdlR9o3xF0q2S/iPpNklt033PVH27lXSopH+lZ2WPpNsOTL8BT5b0Yj2jYA8D7k/Ld0rrnJzG\n0Cvd/u10/WVJv023tUy/lf9R0ksko5N/BXhG0qNpmUMkjU+PfWvV8dNYp0t6kfzTuHSRVCnpNUm/\nTJ/7W0mn5rxnv5M0otb7+AtJp6TLf5H0j3T5G0rmWKqamLOu2HLf0+HpsSdIulrSJTmHOVDSc5Le\nkFT1Gi4A+qXv38haMUnSH9Lf71RJVVMu/I1kQNPJOduQtA3JF4ifpvt6SeqiZJLMlyT9Q8k8SFVn\ndlekr2W6pDrHWYyIJ4ElteJqAewP3JduugE4Ml0+Argxfe5zwBaSNq/1/CGSLkyXfyLptXR5O0mV\n6fLB6d/lVElXKm0xkDRH0gWS/gUcKalnWmYycErOMXaRNDF9H15SMmMAJH+3w+p6rdbIIsKP9fAB\njAZOT5f3IhmOpQ2wEck4azuTfDtfCXwtLXcDyQSEAM8Au5IMzTMLKE+3b5b+fDjnee1IRzfJOX5b\nYG7O+s+Bn6XLSp9TBvwXaE8y504lyWjfLdO4jsh5/mxg43T5y2nZDdL1X5KMwr0hyVQuXdLtdwH3\n1PHefDctt0kax6vpa90WmJiWaQG8CWxa67n7kgxPBMlkds+nr+fXJEMD1Rlbrfe0nGQooU1Ihjt6\njuSsF2BcTv27ANPS5YPqei3pvsEkE+mR/r5mAx3T93FhPc/5TdXvOuf3OSRdPhm4MyeeB9Ll7dO6\nW9dTZ40Y01j+k7PeBZicLv8d2CtnXyWwa636yoDn0uV7Saaa+TLwHWBMHb/vm0iG9yHd/uOcul4B\neqXLl+TEcQUwKF1uDbRJl1uSjIxf9P/l5v7wmZQB9AHujoilkUzXcR+wX7rvrVg1n9NNadlcvUkG\nXJ0LEBGL0u3PAZem3+o3jfQ/O8dXgIU565OA70k6G9glkokP9wYej4gPIxkh+haSb94An0fNkdtz\nB/DdB9gJGK/k2sKxJB+AOwGvRcTMtNzNed6Tf0TEx2kc9wF9IuJN4GMls5YeQjIp3Ee1njcJ+JqS\nQXIXp+t7kryfz9QTW0WtOqpe98eRNMPeVWv/fQAR8W+S6R1Wpw/pgKqRDP76DNAz3Zf1usrewO3p\n8o3U/Du4I617BkmSWqtrlVlFxDygg5Lx+LZM4+jLqvd6R2r+vm9k1d8PpK8nPUPbICKeT7ePyykz\nHjhb0s+AzhGxND32CmBlVcuCFY6TlDVUXW31X/igi4jfknzj3gh4XtK2tYr8j5xrSJE0B/UD3iYZ\nYHZofXXnPL8+Av4eybW2PSKie0SckrMvi9qvs2r9WpIzopNIRpGvWSj5EJsPHE9yJvUMyRlE54h4\no57YRtSuZzVxfp6xXH1yn5P12lu+crn71IA63wc6StUdEHKn1plHzUn26pt253mSM99XSd7r/UgS\n6viceOqzJM8+ACLiJpImyM+BR1Rzvq42kVyfsgJykjJI/rkHKum1tBHJ9YBn0n1dJe2ZLh+bs73K\neJJrIZ0BJLVPf24TEa9ExO+AycAOuU+KiA+ADXOuEXQmaT65Brge2IOk+aafktlBW5F0DKhMq6j9\n4fMxSfNYVUx9JXVN624nqRtJM2bVVC4imbKhPgdL2iT9ln4EyZkhJPP9HAbsFvXPoPoM8FOSKbef\nJbnwXzW3Un2x5ZqYvu5NlIwinm/a7qr34RNg4zzxDEmvTW1BcjZXFU99H+KfsOr9hCQZDE6Xj0tf\nW5VB6WvZniSZ1Nf5o8Z0NRGxMo2t6vWdQHqNkmSKnePTevsA70TEgjrqfJbkvX6K5O/sm8An6Rnw\nNJLfd5e07LdZ9fdTLa33f5L2TjdVX2uS1DUi3oqIS0mu4VVdM/wKnquuSThJGWlz3q0kH1zjSWZU\nfTXdPQ04XdJ/SNr4r656Wvrc94ARwP1p89VN6f6fphfqXyL5wKtrConHWDXB20FA1YXrgcBf0uac\ns1n1ATQ+Vk0lUvvb+tXAY5IeTWP6HnB7evzngO0i4n9prI+QJIL5ed6WSSQflFNIrgG9nL7ez0k+\noPPNR/QMSXPm85FMD740fU7V+/Xd2rHlvqaImEPS621S+rw3gY9yy+SoWp8CtEo7CYysVeYukjm5\nXib5PYxKvyTUVV+V+4HBSjrE9CJJtMPTmAcBo3LKzlPSEeV+4OSoo6eopPEkzasHS5ot6YB018+B\nMyTNAL5E8gUF4EFgvqQ3gMvT49flGZLE+HR63Lmseq//R/Je3ytpKvAZcE09r/s7wFXp31/uDLTH\nKuk8NIXk91T1930A8FA9MVkj8lQdVq+0ie6uSKZyL0T9PYEREfHdQtRfCGmPtCkknTZmFvA4X4qI\nJekZ5P3AFRFRch+KksaRdKIolclFm4Sk+0iS/X+LHUtz5zMpW52CfYuJZHrxZwtVf2OT1B14A3i4\nkAkq9Zv0W/1UYHopJqjUevctV8lN43c6QTUNn0mZmVnJ8pmUmZmVLCcpMzMrWU5SZmZWspykzMys\nZDlJmZlZyXKSMjOzkvX/AbawIRe6AqQgAAAAAElFTkSuQmCC\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAakAAAEbCAYAAABgLnslAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcFNW5//HPw66RZRAFYWQgIorghqigGEdNVHBBNCKI\nG/ozuHBNSHKvGhfwRmM0anKNWzReRVzirqBEURE1bqMBcUVwYdgEBEQFr7I9vz/q9NDTzPTUDFPd\nDfN9v179mq6qc6qe7pnpp+vUqXPM3RERESlEjfIdgIiISHWUpEREpGApSYmISMFSkhIRkYKlJCUi\nIgVLSUpERApW4knKzI40s5lmNsvMLqymzI1mNtvM3jGzvWqqa2ZFZjbZzD42s2fNrHVY/1Mze9vM\nZpjZW2Z2SFqd3mb2btjXX5J8zSIiUj8STVJm1gi4CTgC6AkMM7NdM8oMAHZy952BkcBtMepeBDzv\n7rsAU4CLw/ovgaPdfU/gDGB82qFuBc5y9+5AdzM7op5froiI1LOkz6T2A2a7e7m7rwH+AQzKKDMI\nuAfA3d8EWptZ+xrqDgLGhefjgONC/Rnuvig8/wBoYWZNzawD0NLd3wp17knVERGRwpV0kuoEzEtb\nnh/WxSmTrW57d18MEJLS9pkHNrOfA9NCgusU6meLQ0RECkyTfAdQBatDnUpjO5lZT+Bq4Gf1EpGI\niORF0klqAdA5bbk4rMsss2MVZZplqbvIzNq7++LQlLckVcjMioHHgFPdfU4Nx9iImWkwQxGRHHP3\nKk9Qkm7uewvoZmYlZtYMGApMyCgzATgNwMz6AitCU162uhOIOkYAnA48Geq3AZ4CLnT3N1IHCE2C\nX5vZfmZm4XhPVhe0u9f4GDNmTKxyuXgoFsWyOcejWBRLNomeSbn7OjMbBUwmSoh3uvtHZjYy2uy3\nu/skMxtoZp8Aq4AR2eqGXV8DPGRmZwLlwJCw/nxgJ+ByMxtD1Ax4uLsvDdvuBloAk9z9mSRfu4iI\nbLrEr0mFZLBLxrq/ZSyPils3rF8O/LSK9VcBV1Wzr38Du8cOXERE8k4jTtRRaWlpvkOooFiqpliq\nV0jxKJaqKZaI1dQe2NCYmes9ERHJHTPDq+k4UYhd0EVkC9SlSxfKy8vzHYbkUUlJCXPmzKlVHZ1J\nZdCZlEgywrflfIcheVTd30C2MyldkxIRkYKlJCUiIgVLSUpERAqWkpSINHhdu3ZlypQp+Q5DqqAk\nJSJ5U1zSBTNL7FFc0iXfL1E2kbqgi0jeLJhbztXTVie2/4t7N0ts35IbOpMSEUmzevVqfvWrX9Gp\nUyeKi4sZPXo0a9asAaKRFx5//HEAXn31VRo1asQ///lPAKZMmcLee+9d5T6///57Tj/9dNq2bUvP\nnj3505/+xI47bpiY4ZprrqFbt260atWKXr168cQTT1RsGzduHP379+fXv/41RUVFdOvWjddff51x\n48bRuXNnOnTowD333FNRfsSIEZx//vkMHDiQli1bctBBB7F48WJGjx5N27Zt2W233ZgxY0asYxcC\nJSkRkTRXXnklZWVlvPvuu8yYMYOysjKuvPJKAA4++GCmTp0KwMsvv8xOO+3Eyy+/DMBLL71U7fBB\nY8eOZe7cucyZM4fnnnuOe++9l2hChki3bt149dVX+eabbxgzZgynnHIKixcvrtheVlbGXnvtxfLl\nyxk2bBhDhw7l7bff5tNPP2X8+PGMGjWK7777rqL8ww8/zB/+8AeWLVtGs2bN6NevH3369GHZsmWc\ncMIJjB49Ovax801JSkQkzf3338+YMWPYdttt2XbbbRkzZgzjx48HoiT10ksvAVGSuvjiiyuWX3rp\nJQ4++OAq9/nwww9zySWX0KpVKzp27MgFF1xQafsJJ5xA+/btATjxxBPZeeedKSsrq9jetWtXTjvt\nNMyMk046ifnz5zNmzBiaNm3Kz372M5o1a8Ynn3xSUX7w4MHstddeNGvWjMGDB7PVVlsxfPjwivrv\nvPNO7GPnm5KUiEiahQsX0rnzhvlWS0pKWLhwIQD9+vVj1qxZLFmyhBkzZnDaaacxb948li1bRllZ\nGT/5yU+q3WdxcXHFcnpTH8A999zD3nvvTVFREUVFRXzwwQcsXbq0YnsqiQBstdVWALRr167SupUr\nV1ZbPnM5vWxNx843Jakc6FLSIVZPpC4lHfIdqkiD17Fjx0pjDJaXl9OxY0cg+oDfZ599+J//+R96\n9epFkyZN6NevHzfccAPdunWjbdu21e5z/vz5Fctz586t9PwXv/gFt9xyC1999RVfffUVPXv2zMkQ\nUvk8dlxKUjlQPncx/hE1PsrnFk47sEhDNWzYMK688kqWLl3K0qVL+f3vf8+pp55asf0nP/kJN910\nU0XTXmlpaaXlqpx44olcffXVrFixggULFnDzzTdXbFu1ahWNGjWiXbt2rF+/nrvuuov3338/a4yb\nmkRS9ety7FyrVZKyyI+SCkZEJB/SOzFceuml9OnThz322IM999yTPn36cMkll1RsP/jgg1m5cmVF\n015qOVuSuvzyy+nUqRNdu3bl8MMP58QTT6R58+YA9OjRg9/85jf07duXDh068MEHH9C/f//Y8Va1\nHPf11uXYuVbjKOhmdg8wClgLlAHbAn9y9xuSDy/3khgF3cyomPg+W7kem/4NSaRQVTUCdnFJFxbM\nTW76jk6dS5hfPiex/dfVbbfdxoMPPsiLL76Y71ByKqlR0Pdw92+A44DngBLgjE2IU0QEgPnlc3D3\nxB6FkqAWLVrEa6+9hrvz8ccfc/3113P88cfnO6zNQpwRJ5qaWRNgEHCru682s/UJxyUissVYvXo1\nI0eOZM6cObRp04Zhw4Zx7rnn5juszUKcJPV3YC7wPvCSmXUGVmavIiIiKZ07d+a9997LdxibpVrP\nzGvRFbem7p7cgFt5pGtSIsnQzLxSl2tS1Z5JmdkF1W0LbqxdeCIiIrWTrblvu/BzZ2A/YGJYPhp4\nEyUpERFJWJwu6C8DR4cefphZK2Ciu1d/U8BmTM19IslQc58k1QW9PfB92vIPgMbvERGRxMVJUvcB\nb5rZpWZ2KfAaMD7ZsERECte5557LVVddVe9lsykvL6dRo0asX5/7O4BGjBjB5ZdfHqts165dmTJl\nSr0du8Yu6O7+32b2TyA1vO857v5WvUUgIg1Wl+JiyhcsSGz/JZ06MSdtYNf6cuuttyZStia1Hf5o\nS5A1SZlZY+Bdd+8JKDGJSL0qX7CAFWPHJrb/Ngnse/369TRqpLG5cyXrO+3u64DPzKxTjuIREcm5\nmTNncsghh1BUVMTuu+/OxIkTK7aNGDGC8847j6OOOoqWLVsyderUjZq/rr32Wjp27EhxcTF33nkn\njRo14rPPPquonyr70ksvseOOO3LDDTfQvn17OnXqxN13312xn0mTJtG7d29at25NSUkJV1xxRezX\n0LVrV6677jr23HNPWrZsydlnn82SJUsYOHAgrVq14vDDD+frr7+uKD9hwgR69epF27ZtOfTQQ5k5\nc2bFtunTp7PPPvvQunVrhg4dyvfff1/pWE899VTFHFT9+/dP9EblOF8HtgE+MrNnzeyx1COxiERE\ncmjt2rUcc8wxHHnkkXz55ZfceOONDB8+nNmzZ1eUeeCBB7jsssv49ttvOfDAAyvVf+aZZ/jLX/7C\nlClT+OSTT5g6dWrWZrlFixbx7bffsnDhQv7+979z/vnnVySPbbbZhvHjx/P111/z9NNPc9tttzFh\nwoTYr+Wxxx7jhRdeYNasWUyYMIGBAwfyxz/+kaVLl7Ju3TpuvDG6c2jWrFmcfPLJ3HjjjXz55ZcM\nGDCAY445hrVr17JmzRoGDx7M6aefzvLlyznxxBN59NFHK44xffp0zjrrLO644w6WL1/OyJEjOfbY\nY1mzZk3sOGsjTpK6EhgMXAvcnPYQEdnsvfHGG6xatYoLL7yQJk2acMghh3D00UfzwAMPVJQZNGgQ\nffv2BaiYYiPl4YcfZsSIEey66660aNGCsTU0MTZr1ozLLruMxo0bM2DAALbZZhs+/vhjIJqrqmfP\nngD06tWLoUOHVkxPH8d//Md/0K5dO3bYYQcOOugg9t9/f/bYY4+KaeSnT58OwEMPPcTRRx/NoYce\nSuPGjfntb3/L999/z2uvvcYbb7zB2rVrueCCC2jcuDEnnHAC++67b8Ux7rjjDs455xz69OmDmXHq\nqafSvHlz3njjjdhx1kaNScrdXwBmAE3DY0ZYJyKy2Vu4cOFG07mXlJSwIK1DR+b2bPV33HHHrPeD\nbbvttpWuaW299dYV07m/+eabHHrooWy//fa0adOGv/3tb7Wayj3utPELFy6kpKSkYpuZUVxczIIF\nC1i4cCGdOlW+wpNetry8nOuvv562bdvStm1bioqKmD9/PgsXLowdZ23UmKTM7ARgGnAqcBrwtpkN\nTiQaEZEc69ixI/Pmzau0bu7cuZU+qLM13+2www4bTQ1f1154w4cP57jjjmPBggWsWLGCkSNHJnID\ndMeOHSkvrzyP17x58+jUqdNGrwcqT3e/4447cskll7B8+XKWL1/OV199xcqVKznppJPqPU6I19x3\nObCvuw9395OB/YGxiUQjIpJj+++/P1tvvTXXXnsta9euZerUqTz11FMMGzYsVv0hQ4Zw1113MXPm\nTL777juuvPLKOseycuVKioqKaNq0KWVlZdx///2VttdXwhoyZAhPP/00L774ImvXruW6666jRYsW\nHHDAAfTr14+mTZvy17/+lbVr1/LYY49RVlZWUffss8/mtttuq1i3atUqJk2axKpVq+oltkxxklQj\nd1+ctrwkZj0RkYLXtGlTJk6cyKRJk2jXrh2jRo1i/Pjx7LzzzkDVZ1Hp64488kguuOACDjnkELp3\n706/fv2Aja9dVSd9X7fccguXXXYZrVu35sorr9zo7CTbGVptppTv3r079957L6NGjWK77bbj6aef\nZuLEiTRp0oSmTZvy2GOPcdddd7Htttvy8MMPc8IJJ1TU3WeffbjjjjsYNWoUbdu2pXv37owbNy7W\ncesizth91wO7AqmriEOBme7+23qNpEBo7D6RZFQ1btvmejNvNjNnzmT33Xfnhx9+0P1UGeoydl+c\nJGXAECDV7/IV4JF6/yQvEEpSIsnYkgeYfeKJJxg4cCCrVq3ijDPOoEmTJpW6bUukXpOUmY0iGqfv\nHXdvMNPFK0mJJGNLTlIDBgzg9ddfp0mTJpSWlnLzzTdX6lknkfpOUn8BDgC6AdOBV4mS1uvu/nWV\nlbYASlIiydiSk5TEk1RzXwuiSQ8PAPoR9e5b4u57bHLEBUhJSiQZSlJSr9PHZ5RpBjQPj4XAB5sQ\np4iISCzVJikzuwXYA/iOaAT0N4Bb3P3LHMUmIiINXLb+kd2BFsBc4FPgEyUoERHJpazXpMysEdHZ\n1AHh0YPoZt7X3P33OYkwx3RNSiQZuiYldbkmVdN8Uuvd/R3gUeBxYCrRGdZ/bXK0IiIFItuU5//6\n17/o0aNHrP2k5ouS+lNtkjKz88zsXjP7HCgDfg7MIbqxt01uwhORLVmXkg6YWWKPLiUdNjnG/v37\n89FHMZpCgoY4xXuSsvXu2xWYCFzs7vOylBMRqZPyuYtjNYXXlfVYXHMhKWjVnkm5+wXu/qASlIg0\nBNOnT2fPPfekqKiIYcOGsXr1amDjJrxp06ZVTPE+ZMgQhg4dWmkqeXevdnr4THPmzOHggw+mdevW\nHH744YwaNYpTTz21YvuQIUPYYYcdKCoqorS0lA8//LBi24gRIzj//PMZOHAgLVu25KCDDmLx4sWM\nHj2atm3bsttuuzFjxoyK8rWdXj7bsXNJox+KiBDNsDt58mQ+//xzZsyYUSm5pJrw1qxZw/HHH8+Z\nZ57J8uXLGTZsGI8//nil/WSbHj7TySefTN++fVm2bBljxoxh/PjxlZoLBw4cyKeffsqSJUvo3bs3\nw4cP3yjmP/zhDyxbtoxmzZrRr18/+vTpw7JlyzjhhBMYPXp0pfJxp5ePc+xcSTxJmdmRZjbTzGaZ\n2YXVlLnRzGab2TtmtldNdc2syMwmm9nHZvasmbUO69ua2RQz+9bMbsw4xothX9PNbJqZtUvqNYvI\n5ueXv/wl7du3p02bNhxzzDG88847G5V5/fXXWbduHaNGjaJx48YMHjyY/fbbr1KZbNPDp5s3bx5v\nv/02V1xxBU2aNOHAAw/k2GOPrVTmjDPOYOutt6Zp06ZcfvnlzJgxg2+//bZi++DBg9lrr70qpoff\naqutGD58OGbGSSedtNFriDu9fJxj50qsJGVmrcysVW13Hrqw3wQcAfQEhpnZrhllBgA7ufvOwEjg\nthh1LwKed/ddgCnAxWH998ClwG+qCWmYu+/t7r3dPf6czCKyxUsfEDZ9Svd0X3zxxUZTq2f25ss2\nPXy6hQsX0rZtW1q0aFHlvtavX89FF11Et27daNOmDV27dsXMKk0nH3e6+NqWj3PsXMnWu6849O77\nEpgBvGtmS8K6zjH3vx8w293L3X0N8A9gUEaZQcA9AO7+JtDazNrXUHcQkJplaxxwXKj/nbu/BvxQ\n29crIlKTHXbYgQUZ819lTj1fm30tX76c77//vsp93XfffUycOJEpU6awYsUK5syZg7vn5F6zfB47\nU7YP7QeBfwId3b2ru3cBOgHPECWMODoB6b/B+WFdnDLZ6rZPzRbs7ouA7WPGc3do6rs0ZnkRkQr9\n+vWjcePG3Hzzzaxbt44nn3yy0tTqtdG5c2f69OnD2LFjWbNmDa+//joTJ06s2L5y5UqaN29OUVER\nq1at4uKLL6519/a6JpX6OHZ9yZaktnf3+8JZDADuvsbd7wW2SzCmurwTcX4TJ7v77sBBwEFmdkod\njiMiW6C4H8CpqdX//ve/U1RUxP33388xxxyTdar4bPu+7777eO2112jXrh2XX345Q4cOrdjXaaed\nRufOnenUqRO9evXigAMOqN2Lyjh2baaXr49j15ds80k9DHxB1JyWOqPZETgD2MHdf17jzs36AmPd\n/ciwfBHg7n5NWpnbgBfd/cGwPBM4GOhaXV0z+wgodffFZtYh1O+Rts/TgX3c/YJq4qp2u5n5mDFj\nKpZLS0spLS2t6aXW9D5oWCRp8KoaEqdLSQfK5yZ3L1NJ5/bMKV+U2P4B+vbty7nnnsvpp5++yfsa\nOnQoPXr0IP0zaEuS+huYOnUqU6dOrVh/xRVX1GnSw+bAL4iu/6Sa2RYAE4Db3f37KitW3kdj4GPg\nMKKEV0bUeeGjtDIDgfPd/aiQ1P7i7n2z1TWza4DlIWFdCBS5+0Vp+zwd6OPu/5EWRxt3X2ZmTYH7\ngefc/fYqYtbYfSIJ2FLG7nv55ZfZZZddaNeuHffeey/nnXcen332WZ1m4n377bdp27YtXbt25dln\nn+X444/n9ddfZ88990wg8vyr1/mk3P0H4K/hUSfuvs6iaegnEzUt3hmSzMhos9/u7pPMbKCZfQKs\nAkZkqxt2fQ3wkJmdCZQTDdWUerGfAy2BZmY2CDicaCT3Z82sCdAYeB64o66vS0Qaro8//pghQ4bw\n3Xff8eMf/5hHH320zlPFL1q0iOOPP57ly5dTXFzMbbfdtsUmqLqqaRT0w4h6zqWfST3p7s/nILa8\n0JmUSDK2lDMpqbt6PZMys+uBXsB4op51AMXAf5rZQHf/9aaHLCIiUr1s16RmuXv3KtYbMCvcfLvF\n0ZmUSDJ0JiV1OZPK1gX9BzPrXcX63lR/s6yIiEi9yTZVx5nA7aGXX6oLemeioYfOTDowERGRbL37\n3gL6mFkxaR0n3H1+dXVERKpTUlKiCQEbuJKSklrXyXYmldI+PADWsqEThYhIbHPmzMl3CLIZyta7\n7zDgVqL7kFIjKhaHwWXPdfcXchCfiIg0YNnOpP4KHOnun6WvNLOdgKeAHlXWEhERqSfZevc1JTqL\nyjQ3bBMREUlUtjOpccCbZvYAlQeYHQbcnXBcIiIiNQ6LtDtVDDDr7u/mILa80M28IiK5VadhkQDc\n/T3gvUSiEhERqUGdplM3s4k1lxIREdk02bqg71HdJqBPMuGIiIhskK25bzrwKlVP594mmXBEREQ2\nyJakZgJnuvsnmRvMbF4V5UVEROpVtmtSV1B9EhudQCwiIiKVZO2C3hCpC7qISG7VdT4pERGRvFKS\nEhGRgqUkJSIiBavG+aTMrBFwJNAlvby735hcWCIiIvEmPXwScKLhkdYnG46IiMgGcZJUF3ffPfFI\nREREMsS5JvWsmR2aeCQiIiIZ4iSpV4CJZrbSzJab2VdmtjzpwCQZXUo6YGY1PrqUdMh3qCIiNd/M\na2afAyeQcU3K3dclG1p+bOk38xZSLCIisAnzSQXzgen1/sktIiJSgzhJ6hNgiplNAn5IrVQXdBER\nSVrcM6n5QKuEYxEREamkxiTl7pcBmNlWYfn/kg5KREQEYvTuM7PdzOwtYDYw28zeNLMeyYcmIiIN\nXZwu6LcDv3P3YncvBi4B7kg2LBERkXhJqqW7P5dacPfngZbJhSQiIhKJk6TmmNnFZlYcHhcBcxKO\nS0REJFaSOhPYEZgEPA0UAyOSDEoaBo1+ISI1idMF/WB3Py99hZkdDzyWTEjSUJTPXRxz9IvFyQcj\nIgUpzpnUpVWsu6S+AxHJJ53ViRSmas+kzOwIoskOO5nZDWmbWqF5pWQLo7M6kcKUrblvCfA+8D3w\nQdr6b4GLkgxKREQEsiQpd58OTDezH7n7nenbzGwUcFPSwYmISMMW55rUGVWsO6ue4xAREdlItmtS\nJwFDga5mlt6TrxWwIunAREREsl2TKgOWEd0XdXPa+m+B6UkGJSIiAtmvSX0OfA48b2btgD5h02fu\nviYXwYmISMMWZxT044FpwKnAacDbZjY46cBERETijDgxBtjX3RcDmFl7YDLweJKBiYiIxOnd1yiV\noIIlMeuJiIhskjhnUpPN7GnggbA8FHg2uZBEREQicZLUb4ETgf5heRzwSGIRbSa6FBdTvmBBvsMQ\nEdmi1Zik3N2Bh4CHzKyNu9fqHikzOxL4C1ET4Z3ufk0VZW4EBgCrgDPc/Z1sdc2sCHgQKCGa22qI\nu39tZm2JEui+wF3ufkHaMXoDdwMtgEnu/qvavI5M5QsWsGLs2Fhl28QsJyIilVV7bcnM9jOz583s\nITPb08zeBT4xs8VmdnicnZtZI6Lhk44AegLDzGzXjDIDgJ3cfWdgJHBbjLoXAc+7+y7AFODisP57\nolHbf1NFOLcCZ7l7d6B7GEB3i9CluDjWCN5mlu9QRURqJduZ1M1EPftaAy8Cx7j7q2bWExhP1MOv\nJvsBs929HMDM/gEMAmamlRkE3APg7m+aWevQg7BrlrqDgIND/XHAVOAid/8OeM3Mdk4Pwsw6AC3d\n/a2w6h7gOLaQa2uFdFanZlARqU/ZklQTd58EYGaXu/urAO7+gcX/St4JmJe2PJ8ocdVUplMNddun\nehy6+yIz2z5GHPOrOIbUs0JKmCKy+cuWpDzt+f9l2Vbf6tImVa/xjE378CwtLaW0tLQ+dy8i0qBN\nnTqVqVOnxiqbLUntaWbLiZJGy/CcsLxNzFgWAJ3TlovDuswyO1ZRplmWuovMrL27Lw5NeUtixFHV\nMao0Vt/wRUQSk/nl/4orrqi2bLabcpsB2wHtgObheWq5RcxY3gK6mVmJmTUjusdqQkaZCUTDLWFm\nfYEVoSkvW90JbJhC5HTgySqOXXFG5u6LgK9DZxALx6uqjmxB1KFEZPOXbYDZdZu6c3dfFyZInMyG\nbuQfmdnIaLPf7u6TzGygmX1C1AV9RLa6YdfXEHWJPxMoB4akjmlmnwMtgWZmNgg43N1nAudTuQv6\nM5v6+qSw6fqYyOYvzs28myQkg10y1v0tY3lU3Lph/XLgp9XU6VrN+n8Du8eLWkRECkG2+6QST2Ai\nIiLZZLsmVQZgZnfnJhQREZHKsp0tNTOzIcBBZnZs5kZ3z+wAISIiUq+yJanzgVOANkQDzKZzNu6l\nJyIiUq+y9e57CXjJzN7O7OggIiKSC3EmL/xfMzvPzP4RHueqU4VI7dTmnq0uxcX5DlekYMRJNjcB\nPwL+NyyfAuwN/CKpoES2NLpnS6Ru4iSpvu6+Z9ryZDObkVRAIiIiKXGa+9abWZfUQni+PplwRERE\nNoiTpC4EXgkTIL4AvAT8Z7JhiUiS4l4j0/Uxybc408dPNrPuQI+w6iN3z5y6Q0Q2I3GvkeXi+ljc\niTJLOnVizvz5NZaTLUusXnohKU1LOBYRaYAKKWFK4YnT3CciIpIXSlIiIoGu1RWeGpv7zOxBonuk\nJrt7ktPGiwjQvBmxJ2Is6dyeOeWLEo6o4VDTY+GJc03qLuBM4KaQsO5290+SDUuk4fphNVRM71kD\n67E40ViUMCXf4vTuewZ4xsyKgOHAi2H22zuAB9x9bcIxikieFFLClIYp1jWpkKBOBk4F3gX+BhwA\naAp2ERFJTJxrUg8TTbt+H3CCu6duVLjPzKYnGZyIiDRsca5J3Q48n95pwsyauPtad987udBERKSh\ni9Pcd00VvfrKkghGREQi6g4fqfZMysy2B3YAtjKz3YFUF59WwNY5iE1EpMFSd/hItua+o4i6nhcD\nt6St/xa4LMmgREREIPv08XcBd5nZEHd/KIcxiYiIANmb+4a5+wPADmZ2QeZ2d78x0chERKTBy9Zx\noij8bAdsV8VDRCRnUqNfxOpMUNKhwcSypcvW3HdL+KnrTyKSd4U0+kUhxbKly9bcd0O2iu7+6/oP\nR0REamNLH18xW+++D3IWhYiI1MmWflaXrbnvzlwGIiIihalLcTHlCxbUWK6kUyfmzJ9fY7nayNbc\nd727/8bMHgc2mkfK3Y+v10hERKQg5fPG4mzNfQ+GnzfV+1FFRERiyNbcVxZ+vmBmTYGdic6oZmsO\nKRERyYU4U3UcSTQS+lyi8fuKzexsd5+cdHAiItKwxZmq4y/AT919FoCZdQeeBHokGZiIiEicqTpW\nphIUQHi+KrmQREREItl69x0bnpaZ2QTgIaJrUicCb+YgNhERaeCyNfedmPb8a+CI8PxboGViEYmI\niATZevdkHn/QAAAUxUlEQVSdmstAREREMsXp3dccOAPoCbRIrXf3XyQXloiISLyOE/cAXYCjia5F\n7QR8n2BMIiKyGUpiCpM4XdC7u/tJZnaUu99pZvcAr2zSKxERkS1OEoPdxjmTWhN+rjCzHkSdJraP\nF4aIiEjdxTmTutPMioAxwLPA1sDliUYlIiJCjCTl7n8LT18EOicbjoiIyAY1NveZWZGZ/dnMyszs\nTTO7LpxZiYiIJCrONal/AN8Aw4FTiG7mfTBrDRERkXoQ55pUJ3cfk7Z8hZm9n1RAIiIiKXHOpF4w\ns5+nFszseOC55EISERGJVJukzOwrM1sOnAY8ZGarzWw18AhwetwDmNmRZjbTzGaZ2YXVlLnRzGab\n2TtmtldNdcN1sslm9rGZPWtmrdO2XRz29ZGZHZ62/sWwr+lmNs3M2sV9DSIikh/ZzqTaAduFn02B\nrcKjaVhfIzNrRDT9/BFEwyoNM7NdM8oMAHZy952BkcBtMepeBDzv7rsAU4CLQ53dgCFEc10NAG4x\nM0s73DB339vde7v70jivQURE8qfaJOXu61IPokRxVXj8LKyLYz+i6ebL3X0NUSeMQRllBhENvYS7\nvwm0NrP2NdQdBIwLz8cBx4XnxwL/cPe17j4HmB32U+PrFRGRwhOnC/pVwH8Bn4XHf5nZlTH33wmY\nl7Y8P6yLUyZb3fbuvhjA3RexYQSMzDoLMo53d2jquzRm/CIikkdxevcdA+ydOnsys/8FpgFJfdBb\nzUU24jHKnOzuX5jZj4DHzOwUd7+3DscSEZEciZOkAFoBX4XntZnwcAGVR6koDusyy+xYRZlmWeou\nMrP27r7YzDoAS2rYF+7+Rfi5yszuJ2oGrDJJjR07tuJ5aWkppaWl2V6jiIjUwtSy6Gf6Z2114iSp\na4FpZvYC0VlOKXBZzFjeArqZWQnwBTAUGJZRZgJwPvCgmfUFVoTkszRL3QlEc1xdQ9TT8Mm09feZ\n2Z+Jmvm6AWVm1hho4+7LzKwp0bQj1Xajj/PGiYhI3ZSGngKpz9orrrii2rJZk1ToGfcC0bh9+4fV\nl7t75tlQldx9nZmNAiYTXf+6090/MrOR0Wa/3d0nmdlAM/sEWAWMyFY37Poaom7xZwLlRD36cPcP\nzewh4EOi0dvPc3cPEzc+a2ZNgMbA88AdcV6DiIjkT9YkFT7gn3P3XsBjdTmAuz8D7JKx7m8Zy6Pi\n1g3rlwM/rabO1cDVGeu+A/rUKnAREcm7OF2y3zGzvROPREREJEOca1J7A2+Z2adEzXFGdJLVO9HI\nRESkwYuTpI5NPAoREZEqVJukQmeDs4l6yL0H3F2LkSZEREQ2WbZrUncD/YmGFjoOuC4XAYmIiKRk\na+7r5e67A5jZ7cCbuQlJREQkku1Mak3qSRjgVUREJKeynUntGeaTgqhHX8uwnOrd1zbx6EREpEHL\nlqSa5SwKERGRKlSbpNSTT0RE8k2TAIqISMFSkhIRkYKlJCUiIgUr24gTX1H1jLfq3SciIjmRrXdf\nu5xFISIiUoXYvfvMrC3QIm3VwqSCEhERgRjXpMzsKDObBcwnGhppPjAl6cBERETidJy4CjgQ+Njd\ndwSOAF5JNCoRERHiJam17v4l0MjMzN2fA/ZLOC4REZFYkx5+bWbbAP8C7jGzJcD/JRuWiIhIvDOp\n44iS0q+AqcAC4OgEYxIREQHiJamL3X2du69x9zvd/Qbg10kHJiIiEidJHVnFuqPqOxAREZFM2Uac\nGAmcA3Q3s2lpm1oC/046MBERkWwdJx4CXgCuBi5KW/+tuy9JNCoRERGyjzjxFfAVcKKZ9QQOCpte\nAZSkREQkcXFGnDgfeBjoHB4Pmdl5SQcmIiIS5z6pkcB+7r4SwMz+ALwG3JJkYCIiInF69xmwOm15\nTVgnIiKSqGy9+5q4+1pgPPCmmT0aNg0GxuUiOBERadiynUmVAbj7tURNft+Fxznufl0OYssbM6vx\nkQvFJV0KJhYRkXzIdk2q4tPP3csISashuHra6hrLXNy7WeJxLJhbXjCxiIjkQ7YktZ2ZVTv8URge\nSRqI4pIuLJhbnu8wRKSByZakGgPboE4SQmGd1RVSwlQsVSukWGTzli1JfeHu/52zSERiKqSEuTnG\nAsnHU0ixKGFu3mJdkxIR2VwpYVatkGLJJluSOixnUYiINACFlDALKZZsqu2C7u7LcxmIiIhIpjgj\nToiIiOSFkpSIiBQsJSkRESlYSlIiIlKwlKRERKRgKUmJiEjBUpISEZGCpSQlIiIFS0lKREQKlpKU\niIgULCUpEREpWEpSIiJSsBJPUmZ2pJnNNLNZZnZhNWVuNLPZZvaOme1VU10zKzKzyWb2sZk9a2at\n07ZdHPb1kZkdnra+t5m9G/b1l6Rer4iI1J9Ek5SZNQJuAo4AegLDzGzXjDIDgJ3cfWdgJHBbjLoX\nAc+7+y7AFODiUGc3YAjQAxgA3GJmqXmxbgXOcvfuQHczOyKZVy0iIvUl6TOp/YDZ7l7u7muAfwCD\nMsoMAu4BcPc3gdZm1r6GuoOAceH5OOC48PxY4B/uvtbd5wCzgf3MrAPQ0t3fCuXuSasjIiIFKukk\n1QmYl7Y8P6yLUyZb3fbuvhjA3RcB21ezrwVp+5pfQxwiIlJgCrHjRF2mrfd6j0JERPLP3RN7AH2B\nZ9KWLwIuzChzG3BS2vJMoH22usBHRGdTAB2Aj6raP/AMsH96mbB+KHBrNTG7HnrooYceuX1Ul0ea\nkKy3gG5mVgJ8QZQchmWUmQCcDzxoZn2BFe6+2MyWZqk7ATgDuAY4HXgybf19ZvZnoua8bkCZu7uZ\nfW1m+4WYTgNurCpgd6/LmZyIiCQg0STl7uvMbBQwmahp8U53/8jMRkab/XZ3n2RmA83sE2AVMCJb\n3bDra4CHzOxMoJyoRx/u/qGZPQR8CKwBzvNwekSUCO8GWgCT3P2ZJF+7iIhsOtvwGS4iIlJYCrHj\nRMGLc4NyjuK408wWm9m7+YohLZZiM5tiZh+Y2XtmdkEeY2luZm+a2fQQy5h8xZIWUyMzm2ZmE/Ic\nxxwzmxHem7I8x9LazB4ON95/YGb75ymO7uH9mBZ+fp3nv9/RZvZ+GHzgPjNrlsdYfhn+h/L2P60z\nqVoKNxnPAg4DFhJd4xrq7jPzEEt/YCVwj7vvkevjZ8TSAejg7u+Y2TbAv4FB+XhfQjxbu/t3ZtYY\neBW4wN3z9qFsZqOBfYBW7n5sHuP4DNjH3b/KVwxpsdwNvOTud5lZE2Brd/8mzzE1IrpFZX93n1dT\n+QSO3xH4F7Cru682sweBp939njzE0hN4ANgXWAv8EzjH3T/LZRw6k6q9ODco54S7/wvI+4cNgLsv\ncvd3wvOVRD0w83Yvmrt/F542J7r2mrdvY2ZWDAwE/p6vGNIYBfB/b2atgIPc/S6AcAN+XhNU8FPg\n03wkqDSNgR+lEjfRl+F86AG86e4/uPs64GXg+FwHkfc/1s1QnBuUGzQz6wLsBbyZxxgamdl0YBHw\nXNpoI/nwZ+A/yWOiTOPAc2b2lpmdncc4ugJLzeyu0Mx2u5ltlcd4Uk4iOnvIC3dfCFwPzCUajGCF\nuz+fp3DeBw4KY6VuTfRFa8dcB6EkJfUqNPU9AvwynFHlhbuvd/e9gWJg/zCuY86Z2VHA4nCWadTt\nZvX6dKC79yb6wDk/NBnnQxOgN3BziOc7ovsc88bMmhINrfZwHmNoQ9QyUwJ0BLYxs5PzEUtoqr8G\neA6YBEwH1uU6DiWp2lsAdE5bLg7rGrzQPPEIMN7dn6ypfC6EJqQXgSPzFMKBwLHhWtADwCFmlvPr\nCynu/kX4+SXwOFHzdT7MB+a5+9th+RGipJVPA4B/h/cmX34KfObuy0MT22PAAfkKxt3vcvc+7l4K\nrCC6Hp9TSlK1V3GDcuh1M5ToJuJ8KYRv5yn/C3zo7v+TzyDMrF1q+pbQhPQzopFMcs7df+fund39\nx0R/K1Pc/bR8xGJmW4czXczsR8DhRE06ORfG3pxnZt3DqsOI7m/Mp2HksakvmAv0NbMWYQaHw4iu\n7+aFmW0XfnYGBgP35zqGpEec2OLUcJNxTpnZ/UApsK2ZzQXGpC5E5yGWA4HhwHvhWpADv8vTTdM7\nAONCT61GwIPuPikPcRSa9sDjZuZE//v3ufvkPMZzAdEIMU2Bzwg38udDuObyU+AX+YoBwN3LzOwR\noqa1NeHn7XkM6VEza8uGwRFy3rlFXdBFRKRgqblPREQKlpKUiIgULCUpEREpWEpSIiJSsJSkRESk\nYClJiYhIwVKSakDMrG3alARfmNn8tOVa3TMXpgnZeRPj2drMXtyUfaTta3RtpzQws8PM7PEq1p9l\n0ezOORXnPTWz8Wa20SjqZtbVzE6qwzFvCNMw/CFj/e9rMzVD5vHNbG8zO6K28cQ81pAwtcc6M9sj\nY9ulZjbbzD40s8PS1u8bXucsM7s+bX1zi6YLmW1mr4bBgHOiur+/tO3tzezpXMVTqJSkGpAw1Mre\nYay0W4EbUsvuvraW+zrL3WdvYkj/D3hoE/dBmI7j10SzLtdWdTcK5vwGwk18T3ciGtEitjCiwQh3\n393df1fH41Z3/N4kNxTVu0Tj272avtLMdgeOA3YFjib6G0+5FTjd3bsDvdIS2C+AL9x9Z+AW4I8J\nxZyaBiRTtX9nYVSOpWa2b1IxbQ6UpBquSkMpmdl/hW+a74YRNTCznSyafO2B8M30H2bWPGx7JfUt\n1syOMrN/h7OyZ8K6Q83snXCW9nY1I1wPB54M5TuGfU4LMfQN608Jy++a2VVhXWMz+8rM/mxm7xCN\nML498IqZTQ5lBpjZa+HYD6SOH2KdaWZvk32KlS5mNtXMPjaz34W6V5nZ+Wnv2R/N7NyM9/EiMzsn\nPP+rmT0bnv/MovmTUpNmVhVb+ns6Mhz7dTO7w8xuSDvMoeFb/ydmlnoNVwOl4f0blRGTmdn14fc7\nw8xS0y08RTSA6bS0del6h+N/bGYjathX+vEvAC4HTk7t28y2NbMnQ51/WRjw16IztrvCa//czAaZ\n2XVh/xOr+mB395nu/gkbDwc2CHjA3deFOY/KzWyfcHbU3N2nhXLjiZJZqs648PwhYKOzvxp+p3eF\n53H+Tvet7u8vy//Lk8ApVfxuGg5316MBPoAxwK/D8/2Ihl9pBmxDNIZaT6Jvx+uBfUO5cUSTBwK8\nAuxBNNROOVAc1rcJPyel1duaMLpJ2vGbA/PTlv8L+M/w3EKdTsDnQBHRHDtTiUbvbhziGpRWfy7Q\nMjzfLpRtEZZ/RzTC9lZE06x0CesfAR6r4r05K5RrFeL4ILzWnYCyUKYR8CnQOqPugUTDDUE0ed0b\n4fX8N9GwP1XGlvGeFhMNE9SKaPiiV4nOeiH6gE3tf3fgo/D8sKpeS9g2hGjiPMLvay7QLryPy6up\n83vgbaBpiHle+FndviodP7yHN6Qt3wJcHJ7/DHgr7TgvhveoN7AKODRsmwAMzPI3/AqwR9ryrcCQ\ntOW7iUY13x+YlLa+NBUr0bh426dt+5xoYsra/E5j/Z2S5e+Pav5fiAaznpbvz4t8PnQmJQD9gUfd\nfbVH02s8ARwUtn3mG+ZiujeUTdePaNDU+QDuviKsfxW4MXyrb+3hPy7N9sDytOW3gP9nZpcBu3s0\naeH+wAvu/pVHI0LfD/wklP/BK4+0nj7Q7gHAbsBrFo0jeDLQJaz72N3nhHL3ZXlPnnX3b0IcTwD9\n3f1T4BuLZiwdQDQh3NcZ9d4i+sbcmmjW5LeIZuQ9iOhDtarYSjL2kXrd33jUDPtIxvYnANz9PaLp\nHGrSnzBwqkdNSK8AfcK2bIMTP+HuazwaFfwloi8z2fZVUwzjQ73ngB3SzhYmhb+P96LNPiWsf4/o\n95ZLVb0fNf1O4/6dZvv7q+7/ZQnRWJQNlpKU1FZVbegb/WO7+1XA2URnZm+Y2U4ZRf6PtGtI7v4i\n0TfcL4gGhx1W3b7T6lfHgH96dK1tb3fv5e7n1LC/jV5CNct3En17HkE06nvlQu6riWZSPY3oW/cr\nRGcZnX1DE1VmbOdm7qeGOH+IWa466XWyXXtL32ZEZwXZ9pVNtuOkXs96YHXa+vXUbhDsBVSelC81\njU626XUq6lg00O1GU9jH+J1C/L/TKstl+X9pUcU+GhQlKYHon26wRT2dtiFqK38lbOtqZvuE5yen\nrU95jehaRGcAMysKP3/s7u+7+x+BacAu6ZXcfSmwlYVehaH+Ynf/O1Ezzd5EM/uWWjQzaBOiC/NT\nwy4y/9m/IWoeS8V0sJl1Dfve2sy6ETVjpqZZMaKpGapzuJm1smh07PSL9I8BxwB7evUzpr4C/JZo\nuu1/AecTNZ1liy1dWXjdrcIHZ7Ypu1Pvw7dAyyzxDA3Xk9oTnc2l4smWZI4zs6YWTdfQP9Spbl/f\nsuH9p4rlVwjXVszsp8ACd6/qw7e2STe9/ARgWIh5J6Ik8u9wlv99uD5lwKmEa6Ghzhnh+UlEsxtU\nJdvvNO7fabV/f1n+X7qTp+lUCoWSlBCa8x4g+qd7jWi21A/C5o+AX5vZh0Rt6nekqoW6S4BzgSdD\n89W9Yftvw8Xvd4g+sKr653+eDRO6HQbMMLNpRPPW/NXdFwCXETU1TQNe8w1Tf2R+M78DeN7MJoeY\n/h/wYDj+q8DO4UPxXOAZokSwMMvb8hbRB9h0ousR74bX+wPRB1W2eYdeIWrOfMOj6cBXhzqp9+us\nzNjSX5O7zwP+FGJ4meja19fpZdKklqcDTSzqvDIqo8wjRPNpvUv0exgdviRUtb9077PhQ/ny0OxX\n3b6mA43Tjj8F2NOiDjXHE3Wk6GdmM4CxbEgMmWrsVWlmPzezeUTNjM+Y2USA8Dt6guhv9imi33XK\neUTXVGcBH6R9wbidqOlxdihTXS/HbL/TWH+n4e/vHKr++6vu/+UQoEF3Q9dUHVKt8G30EY+mYU9i\n/32Ac939rCT2n4TQ22w60cXwOQke50fuvip8M38SuMXdG/SHVUNkZi8DR7n7t/mOJV90JiU1Sexb\njEdTh/8rqf3XNzPrBXxCdKF/TsKH+304q5wBzFSCanjMbHvg2oacoEBnUiIiUsB0JiUiIgVLSUpE\nRAqWkpSIiBQsJSkRESlYSlIiIlKwlKRERKRg/X9CuYVdWIuo+QAAAABJRU5ErkJggg==\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "a_top = np.sort([sum(tpm_low_gamma.get_topics(topic_ids=[i], num_words=100)['score']) for i in range(10)])[::-1]\n", + "b_top = np.sort([sum(topic_model.get_topics(topic_ids=[i], num_words=100)['score']) for i in range(10)])[::-1]\n", + "c_top = np.sort([sum(tpm_high_gamma.get_topics(topic_ids=[i], num_words=100)['score']) for i in range(10)])[::-1]\n", + "\n", + "a_bot = np.sort([sum(tpm_low_gamma.get_topics(topic_ids=[i], num_words=547462)[-1000:]['score']) for i in range(10)])[::-1]\n", + "b_bot = np.sort([sum(topic_model.get_topics(topic_ids=[i], num_words=547462)[-1000:]['score']) for i in range(10)])[::-1]\n", + "c_bot = np.sort([sum(tpm_high_gamma.get_topics(topic_ids=[i], num_words=547462)[-1000:]['score']) for i in range(10)])[::-1]\n", + "\n", + "ind = np.arange(len(a))\n", + "width = 0.3\n", + " \n", + "param_bar_plot(a_top, b_top, c_top, ind, width, ylim=0.6, param='gamma',\n", + " xlab='Topics (sorted by weight of top 100 words)', \n", + " ylab='Total Probability of Top 100 Words')\n", + "\n", + "param_bar_plot(a_bot, b_bot, c_bot, ind, width, ylim=0.0002, param='gamma',\n", + " xlab='Topics (sorted by weight of bottom 1000 words)',\n", + " ylab='Total Probability of Bottom 1000 Words')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "From these two plots we can see that the low gamma model results in higher weight placed on the top words and lower weight placed on the bottom words for each topic, while the high gamma model places relatively less weight on the top words and more weight on the bottom words. Thus increasing gamma results in topics that have a smoother distribution of weight across all the words in the vocabulary." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ For each topic of the **low gamma model**, compute the number of words required to make a list with total probability 0.5. What is the average number of words required across all topics? (HINT: use the get_topics() function from GraphLab Create with the _cdf_\\__cutoff_ argument)." + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def calculate_avg_words(model, num_words=547462, cdf_cutoff=0.5, num_topics=10):\n", + " avg_num_of_words = []\n", + " for i in range(num_topics):\n", + " avg_num_of_words.append(len(model.get_topics(topic_ids=[i], num_words=547462, cdf_cutoff=.5)))\n", + " avg_num_of_words = np.mean(avg_num_of_words)\n", + " return avg_num_of_words" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "252.40000000000001" + ] + }, + "execution_count": 73, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "calculate_avg_words(tpm_low_gamma)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Quiz Question:__ For each topic of the **high gamma model**, compute the number of words required to make a list with total probability 0.5. What is the average number of words required across all topics? (HINT: use the get_topics() function from GraphLab Create with the _cdf_\\__cutoff_ argument)." + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "576.20000000000005" + ] + }, + "execution_count": 74, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "calculate_avg_words(tpm_high_gamma)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have now seen how the hyperparameters alpha and gamma influence the characteristics of our LDA topic model, but we haven't said anything about what settings of alpha or gamma are best. We know that these parameters are responsible for controlling the smoothness of the topic distributions for documents and word distributions for topics, but there's no simple conversion between smoothness of these distributions and quality of the topic model. In reality, there is no universally \"best\" choice for these parameters. Instead, finding a good topic model requires that we be able to both explore the output (as we did by looking at the topics and checking some topic predictions for documents) and understand the impact of hyperparameter settings (as we have in this section)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + }, + "toc": { + "toc_cell": false, + "toc_number_sections": false, + "toc_threshold": "8", + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.04 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.04 PM.png new file mode 100644 index 0000000..2103c74 Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.04 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.11 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.11 PM.png new file mode 100644 index 0000000..9c3c535 Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.11 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.16 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.16 PM.png new file mode 100644 index 0000000..c6709d3 Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.16 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.20 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.20 PM.png new file mode 100644 index 0000000..66eebdd Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.20 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.26 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.26 PM.png new file mode 100644 index 0000000..13aca9b Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.26 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.30 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.30 PM.png new file mode 100644 index 0000000..11fd8ef Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.30 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.34 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.34 PM.png new file mode 100644 index 0000000..24e4714 Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.34 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.39 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.39 PM.png new file mode 100644 index 0000000..b6fd355 Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.39 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.44 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.44 PM.png new file mode 100644 index 0000000..bf2d8d2 Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.44 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.49 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.49 PM.png new file mode 100644 index 0000000..a265d66 Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.49 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.53 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.53 PM.png new file mode 100644 index 0000000..dab5bbb Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.53 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.57 PM.png b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.57 PM.png new file mode 100644 index 0000000..f34cc74 Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/images/Screen Shot 2016-07-31 at 7.12.57 PM.png differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/quiz-week5-assignment.ipynb b/machine_learning/4_clustering_and_retrieval/assigment/week5/quiz-week5-assignment.ipynb new file mode 100644 index 0000000..1bead40 --- /dev/null +++ b/machine_learning/4_clustering_and_retrieval/assigment/week5/quiz-week5-assignment.ipynb @@ -0,0 +1,194 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Modeling text topics with Latent Dirichlet Allocation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 1\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 2\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 3\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 4\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 5\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 6\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 7\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 8\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 9\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 10\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 11\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 12\n", + "\n", + "\n", + "\n", + "*Screenshot taken from [Coursera](https://www.coursera.org/learn/ml-clustering-and-retrieval/exam/LXSnc/modeling-text-topics-with-latent-dirichlet-allocation)*\n", + "\n", + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + }, + "toc": { + "toc_cell": false, + "toc_number_sections": false, + "toc_threshold": "8", + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/topic_models.zip b/machine_learning/4_clustering_and_retrieval/assigment/week5/topic_models.zip new file mode 100644 index 0000000..23d415f Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/topic_models.zip differ diff --git a/machine_learning/4_clustering_and_retrieval/assigment/week5/week5-assignment1.pdf b/machine_learning/4_clustering_and_retrieval/assigment/week5/week5-assignment1.pdf new file mode 100644 index 0000000..6ab0c27 Binary files /dev/null and b/machine_learning/4_clustering_and_retrieval/assigment/week5/week5-assignment1.pdf differ