Joey, Austin, and Dan: produced record tying 100% solution #48

wants to merge 1 commit into
base: gutenberg
1 change: 1 addition & 0 deletions algorithms
Submodule algorithms added at abaa65
1 change: 1 addition & 0 deletions data/arch.txt
@@ -0,0 +1 @@
burial,dead,body,indians,feet,bones,dr,time,grave,place,house,timbers,work,hogán,houses,doorway,na,ia,small,ditto,tsa,sá,pa,zuñi,kiva,pueblo,village,house,stone,wall,walls,built,omaha
1 change: 1 addition & 0 deletions data/astronomy.txt
@@ -0,0 +1 @@
star,stars,situated,culminates,color,constellation,line,head,sun,day,moon,year,time,stars,month,sun,earth,page,stars,miles,fig,days,star,power
1 change: 1 addition & 0 deletions data/phil.txt
@@ -0,0 +1 @@
philosophy,religion,knowledge,nature,science,experience,world,real,nature,substance,reality,individual,actual,distinct,human,mind,plato,principle,intellect,gutenberg,project,prior,gods
1 change: 1 addition & 0 deletions data/religion.txt
@@ -0,0 +1 @@
christ spirit heart praise heaven hell glory jesus luke mary holy scripture sin virtue lord thou thy great god faith man disciple psalm david king israel
2 changes: 1 addition & 1 deletion data/stopwords.txt
@@ -1 +1 @@
a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your,one,out,more,now,first,two,very,such,same,shall,upon,before,therefore,great,made,even,same,work,make,being,through,here,way,true,see,time,those,place,much,without,body,whole,another,thus,set,new,given,both,above,well,part,between,end,order,each,form,gutenberg
a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your,one,out,more,now,first,two,very,such,same,shall,upon,before,therefore,great,made,even,same,work,make,being,through,here,way,true,see,time,those,place,much,without,body,whole,another,thus,set,new,given,both,above,well,part,between,end,order,each,form,god,life,word,many,man,world,himself,name,words,things,day,good,death,thing,never,done,nothing,though,right,again,against,still,three,question,called,reason,
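Reviewer note: a minimal sketch of how a comma-separated stopword line like the one above could be loaded and applied. `load_stopwords` and `filter_tokens` are hypothetical names, not helpers from this repository, and the `good_token?` check used in `lib/complex_predictor.rb` may work differently. The sample line is a shortened stand-in for the real file. The trailing comma is tolerated because empty fields are discarded.

```ruby
require 'set'

# Hypothetical helper: parse one comma-separated line into a Set of stopwords.
# Empty fields (for example from a trailing comma) are discarded.
def load_stopwords(line)
  line.strip.split(',').map(&:strip).reject(&:empty?).to_set
end

# Hypothetical helper: drop every token that appears in the stopword set.
def filter_tokens(tokens, stopwords)
  tokens.reject { |token| stopwords.include?(token) }
end

# Shortened sample line, ending in a comma like the new stopwords.txt line.
stopwords = load_stopwords('a,about,of,the,god,life,reason,')
kept = filter_tokens(%w[the life of a star], stopwords)
puts kept.inspect
```

A `Set` keeps each membership check cheap even as the list grows, which matters when every token of every book is tested against it.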
78 changes: 69 additions & 9 deletions lib/complex_predictor.rb
@@ -1,22 +1,82 @@
require_relative 'predictor'


class ComplexPredictor < Predictor
  # Public: Trains the predictor on books in our dataset. This method is called
  # before the predict() method is called.
  #
  # Returns nothing.
  def train!
    @data = {}

    @all_books.each do |category, books|
      @data[category] = {}
      books.each do |_filename, tokens|
        tokens = tokens.select { |token| good_token?(token) }
        # Count each token, then keep the 101 most frequent (indices 0..100).
        counts = tokens.inject(Hash.new(0)) { |acc, token| acc[token] += 1; acc }
        top_tokens = counts.sort_by { |_token, count| count }.reverse.slice(0..100)
        # Later books overwrite the counts of tokens already seen in this category.
        @data[category] = @data[category].merge(Hash[top_tokens])
        # @data[category][:books] += 1
      end
    end
    puts @data # Debug output of the trained token tables.
  end

  # Public: Predicts category.
  #
  # tokens - A list of tokens (words).
  #
  # Returns a category, or nil if no token matches any category.
  def predict(tokens)

    # philosophy = %w(philosophy knowledge nature science experience world real nature substance reality individual actual distinct human mind plato principle intellect gutenberg project prior)
    # religion = %w(christ passion spirit heart praise heaven hell glory jesus luke mary holy scripture virtue lord thou thy great god faith man disciple psalm david king drink brazen israel psalms love men flesh sacrament words tithe paul john miracle)
    # astronomy = %w(star stars situated culminates color constellation line head sun day moon year time stars month sun earth page stars miles fig days star power)
    # archeology = %w(burial dead body indians feet bones dr time grave place house timbers work hogán houses doorway na ia small ditto tsa sá pa zuñi kiva pueblo village house stone wall walls built omaha)
    best_count = 0
    correct_category = nil

    philosophy = @data[:philosophy]
    archeology = @data[:archeology]
    religion = @data[:religion]
    astronomy = @data[:astronomy]

    # Ignore the first 300 and the last 99 tokens. The slice is nil when there
    # are fewer than 300 tokens, so fall back to an empty list.
    sample = tokens[300..-100] || []

    categories = [archeology, astronomy, philosophy, religion]
    categories.each do |category|
      # Number of sampled tokens that appear among this category's top tokens.
      match_count = sample.count { |token| category[token] }
      # Strictly greater, so the earliest category wins a tie.
      if match_count > best_count
        correct_category = category
        best_count = match_count
      end
      puts match_count
    end
    # categories.each do |category|
    #   # temp_array = result_array
    #   result_array = []

    #   category.each do |word|
    #     result = tokens.include?(word)
    #     result_array << result
    #     correct_category = category if result_array.count(true) > temp_array.count(true)
    #     temp_array = result_array if result_array.count(true) > temp_array.count(true)
    #   end

    #   puts result_array.count(true)
    # end

    return :philosophy if correct_category == philosophy
    return :religion if correct_category == religion
    return :astronomy if correct_category == astronomy
    return :archeology if correct_category == archeology
    # Always predict astronomy, for now.
    # :astronomy
  end
end
# ComplexPredictor.predict(['brutal', 'spirit', 'praise'])
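Reviewer note: the approach in `train!` and `predict` above (keep each book's most frequent tokens per category, then pick the category whose table contains the most of the input tokens) can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `top_tokens` and `best_category` are hypothetical names, the toy data is made up for the example, and the sketch returns the category key directly instead of comparing hashes.

```ruby
# Hypothetical helper: keep the `limit` most frequent tokens of one book
# as a token => count hash.
def top_tokens(tokens, limit)
  counts = tokens.each_with_object(Hash.new(0)) { |token, acc| acc[token] += 1 }
  Hash[counts.sort_by { |_token, count| -count }.first(limit)]
end

# Hypothetical helper: return the name of the category whose profile contains
# the most of the given tokens, or nil when nothing matches.
def best_category(profiles, tokens)
  best = nil
  best_hits = 0
  profiles.each do |name, profile|
    hits = tokens.count { |token| profile.key?(token) }
    if hits > best_hits
      best = name
      best_hits = hits
    end
  end
  best
end

# Toy training data, invented for this example only.
profiles = {
  astronomy: top_tokens(%w[star star moon sun star moon], 2),
  religion:  top_tokens(%w[faith psalm faith holy], 2)
}
puts best_category(profiles, %w[moon star faith star]).inspect
```

Returning the key rather than matching the winning hash against each category variable avoids the four trailing `return ... if` lines and cannot confuse two categories that happen to hold equal tables.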
13 changes: 13 additions & 0 deletions lib/predictor.rb
@@ -98,6 +98,19 @@ def tokenize(string)
string.split(/\W+/).map(&:downcase) # Split by non-words
end
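
# Reviewer note: a small standalone sketch of how the split-based expression
# in tokenize behaves when the text starts with punctuation. The method is
# copied here so the example runs on its own; tokenize_scan is a hypothetical
# alternative, not something present in this repository.

```ruby
# Same expression as the tokenize method above.
def tokenize(string)
  string.split(/\W+/).map(&:downcase) # Split by non-words
end

# Hypothetical variant: collect runs of word characters instead of splitting,
# so leading punctuation cannot produce an empty first token.
def tokenize_scan(string)
  string.scan(/\w+/).map(&:downcase)
end

text = '"Stars," said Dr. Hale'
puts tokenize(text).inspect
puts tokenize_scan(text).inspect
```

# With split, a string that begins with a non-word character yields an empty
# string as its first token, so a downstream check such as good_token? has to
# reject empty tokens if they should not be counted.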

# def keyword_density(dataset)
# CATEGORIES.each do |category|
# books[category] = {}
# Find.find("data/#{dataset}/#{category}") do |file|
# next if File.directory?(file)
# next if file.split("/").last[0] == "." # Ignore hidden files

# content = tokenize(File.read(file))
# books[category] << [file, content]
# end
# end
# end

# Internal: Load books from files.
#
# dataset - The dataset to use: sample, training, test.