-
Notifications
You must be signed in to change notification settings - Fork 1
/
index.html
216 lines (183 loc) · 11.4 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
<!DOCTYPE html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>HiCAL - A System for Efficient High-Recall Retrieval</title>
<!-- Bootstrap & Font Awesome CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" integrity="sha384-rwoIResjU2yc3z8GV/NPeZWAv56rSmLldC3R/AZzGRnGxQQKnKkoFVhFQhNUwEyJ" crossorigin="anonymous">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css">
<link rel="stylesheet" href="static/style.css">
<link rel="shortcut icon" href="images/favicon.ico" type="image/x-icon">
<link rel="icon" href="images/favicon.ico" type="image/x-icon">
<script src="https://code.jquery.com/jquery-3.1.1.slim.min.js" integrity="sha384-A7FZj7v+d/sdmMqp/nOQwliLvUsJfDHW+k9Omg/a/EheAdgtzNs3hpfag6Ed950n" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/tether/1.4.0/js/tether.min.js" integrity="sha384-DztdAPBWPRXSA/3eYEEUWrWCy7G5KFbe8fFjk5JAIxUYHKkDx6Qin1DkWx51bBrb" crossorigin="anonymous"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-alpha.6/js/bootstrap.min.js" integrity="sha384-vBWWzlZJ8ea9aCX4pEW3rVHjgjt7zpkNpZk+02D9phzyeVkE+jo0ieGizqPLForn" crossorigin="anonymous"></script>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-122235632-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-122235632-1');
</script>
</head>
<body>
<div class="navbar-container">
<nav class="navbar navbar-toggleable-md navbar-light ml-4 mr-4">
<a class="navbar-brand" href="index.html"><img src="images/hical.png" width="100px"></a>
<span class="navbar-text mb-3 mr-5">
<span class=" mr-1"></span> A System for Efficient High-Recall Retrieval.
</span>
<div class=" navbar-collapse-right" id="navbarTogglerDemo03">
<ul class="navbar-nav mr-auto mb-3">
<li class="nav-item active">
<a class="nav-link" href="#">Install<span class="sr-only">(current)</span></a>
</li>
<li class="nav-item">
<a class="nav-link" href="usage.html">Usage</a>
</li>
<li class="nav-item">
<a class="nav-link" href="documentation.html">Documentation</a>
</li>
<li class="nav-item">
<a class="nav-link" href="paper.html">Paper</a>
</li>
<li class="nav-item">
<a class="nav-link" href="https://github.com/hical">Github</a>
</li>
</ul>
</div>
</nav>
</div>
<div class="content">
<h2>Want to find all relevant documents?</h2>
<p>HiCAL is a system for efficient high-recall retrieval.
The system allows retrieving and assessing relevant documents and provides high data processing performance and a user-friendly document assessment interface.
</p>
<hr>
<h2>Installation</h2>
<h4><a name="requirements">Requirements</a></h4>
<p>The system is dockerized into different docker images. Make sure your machine has these installed </p>
<ul>
<li><code>docker</code>: Refer to this <a href="https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-16-04">installation guide</a> if you would like to install it on Ubuntu 16.04</li>
<li><code>docker-compose</code>
</ul>
<h4><a name="corpus">Corpus and How to Install</a></h4>
<p>Before you start installing, you need to make sure your corpus is ready and in the right format.
This step is necessary to be able run the system. We will work with a toy corpus from <a href="https://github.com/hical/sample-dataset">https://github.com/hical/sample-dataset</a></p>
<pre><code>
# Checkout the repo
git clone https://github.com/HiCAL/HiCAL.git
cd HiCAL
# Checkout the sample dataset
git clone https://github.com/hical/sample-dataset.git
cd sample-dataset
python process.py athome4_sample.tgz
# Create the data directory which is mounted to the docker containers
mkdir ../data
cp athome4_sample.tgz athome4_sample_para.tgz ../data/
cd ..
</code></pre>
<p><a href="https://github.com/hical/sample-dataset/blob/master/process.py"><code>process.py</code></a> is an example of how one might clean the corpus and generate excerpts. We will use the <code>athome4_sample.tgz</code> and the newly generated <code>athome4_sample_para.tgz</code> to generate document features.</p>
<pre><code>
# Build and access the shell from the cal container
docker-compose -f HiCAL.yml run cal bash
root@container-id:/# cd src && make corpus_parser
# Generate features
root@container-id:/# ./corpus_parser --in /data/athome4_sample.tgz --out /data/athome4_sample.bin --para-in /data/athome4_sample_para.tgz --para-out /data/athome4_para_sample.bin
# Exit the shell with Ctrl+D
</code></pre>
<p>We will now generate the document and paragraph files which will be showed to the assessors. Access and parsing logic for a corpus is decribed in the <a href="https://github.com/hical/sample-dataset/blob/master/functions.py"><code>functions.py</code></a> file. HiCAL by default supports the xml documents from the New York Times corpus. We have provided an alternative <a href="https://github.com/hical/sample-dataset/blob/master/functions.py"><code>functions.py</code></a> which works for <code>athome4</code>.</p>
<pre><code>
# Extract the tgz files
cd data
tar xvzf athome4_sample.tgz
mv athome4_test docs
tar xvzf athome4_sample_para.tgz
mv athome4_test para
cd ..
# Use the modified functions.py
cp sample-dataset/functions.py HiCALWeb/hicalweb/interfaces/DocumentSnippetEngine/functions.py
# We are all set! Lets fire up the containers
DOC_BIN=/data/athome4_sample.bin PARA_BIN=/data/athome4_para_sample.bin docker-compose -f HiCAL.yml up -d
# Visit localhost:9000
</code></pre>
<p>If you get a <code>502 Bad Gateway</code> error, please wait few seconds while the containers finish processing.</p>
<p>Port <code>9001</code> and <code>9000</code> will be used by system. Make sure these ports are not being used by other applications in your machine. If you would like to change these ports, please read the configuration section below.</p>
<p>We strongly recommend installing via docker. If you would like to not use docker, please look at the documentation page for more details.</p>
<h4>How to run</h4>
<p>Once your docker images are up and running (you can verify by running <span class="code-inline">docker-compose -f HiCAL.yml ps</span>), open your browser to <a href="http://localhost:9000/" class="code-inline">http://localhost:9000/</a>.
You should be able to access system's web interface. If you are still unable to view the web interface, try replacing <span class="code-inline">http://localhost</span> with the ip address of your docker machine (you can get the ip by running <span class="code-inline">docker-machine ip</span>)
</p>
<div class="row">
<div class="col-md-4">
<figure class="figure">
<img src="images/login.png" class="figure-img img-fluid rounded p-1 " alt="Login page of HiCAL.">
<figcaption class="figure-caption ">Figure 1: Login page of HiCAL. You can also use the practice account by clicking on the "click to practice" button.</figcaption>
</figure>
</div>
<div class="col-md-4">
<figure class="figure">
<img src="images/home_notopic.png" class="figure-img img-fluid rounded p-1 " alt="Homepage HiCAL.">
<figcaption class="figure-caption ">Figure 2: After logging in to HiCAL, create or select a topic of search. This will start a session and train the Machine learning model.</figcaption>
</figure>
</div>
<div class="col-md-4">
<figure class="figure">
<img src="images/home_topicinit.png" class="figure-img img-fluid rounded p-1 " alt="Homepage HiCAL.">
<figcaption class="figure-caption ">Figure 3: After initiating a topic, you can select any of the retrieval component (CAL or Search) from navigation bar.</figcaption>
</figure>
</div>
</div>
<h4>Configuration</h4>
<p>Most of the configuration can be performed through these two files:</p>
<ul>
<li><span class="code-inline">config/nginx/nginx.conf</span> :</li>
<li style="list-style-type: none;">
<p>This file controls the nginx server. By default, the CAL is accessed through port <code>9001</code> and the web interface is accessed through port <code>9000</code>. These ports are exposed to the outside world by the docker (specified in <span class="code-inline">HiCAL.yml</span>).
Our system uses the nginx instance to serve document and paragraphs to the web interface.</p>
</li>
<li><span class="code-inline">HiCAL.yml</span> :</li>
<ul>
<li>CAL:</li>
<li style="list-style-type: none;">
<p>
<code>./data</code> is mounted to the volume <span class="code-inline">/data</span> which is meant to be used the <span class="code-inline">bmi_fcgi</span>. Keep document features and related files over there. Modify <span class="code-inline">command</span> field if required.</li>
</p>
</li>
<li>nginx:</li>
<li style="list-style-type: none;">
<p>
the container uses <span class="code-inline">config/nginx/nginx.conf</span> as the config. Make changes to the volumes as required.
</p>
</li>
</ul>
</ul>
<h5>Configuring Search</h5>
<p>
The system allows integration of a search engine. Any search engine can be integrated with minimal effort.
<p>To integrate a search engine, please read the <a href="documentation.html#IntegratingSearch">documentation</a>.</p>
</p>
<h2 class="mt-5">Bugs, issues or feature requests?</h2>
<p>Please report <a href="https://www.github.com/hical/hical/issues">here</a>.</p>
<h2 class="mt-5">LICENSE</h2>
<a href="http://www.gnu.org/licenses/gpl.html"><img src="images/gpl.png"></a>
</div>
<div class="content">
<ul class="sm" style="list-style: none;">
<li>
<a href="https://github.com/HiCAL" target="_blank"><i class="fa fa-github"></i></a> <span class="ml-3">/</span>
</li>
<li class="small mt-1">
GitHub code.
</li>
<li class="small mt-1">
GNU General Public License v3.0
</li>
</ul>
<p class="copy"></p>
</div>
</body>
</html>