In this exercise, I build a simple web crawler that scans a list of URLs, records thread performance, and stores the image URLs in a database (only the URL, not the image itself).
Requirements for my exercise:
- Input: a text file containing one URL per line. The program must check that the file exists; if it doesn't, it displays an error message and exits. The file may be empty, in which case the program does nothing.
- The program receives 4 mandatory arguments: (a) a pool size (a positive, non-zero number; see below), (b) a delay between retries (positive, non-zero, in milliseconds), (c) a number of retries, and (d) a file name.
- If any argument is missing or invalid, the program displays a usage message and exits (see the argument-checking sketch after this list).
- The program creates one thread per URL to be processed.
- The number of simultaneous threads is limited to the pool size, so the URLs must be processed through a thread pool (as explained in class; see the sketches after this list).
- Each URL is analyzed, and only URLs of images are inserted into the database. If the image URL is already in the database, it must not be inserted again (no duplicates).
- When connecting to a URL, in case of a connection or read failure the thread should sleep() for the retry delay (the 2nd program argument), then try again, up to X times (X being the 3rd program argument); see the task sketch below.
- If the URL is simply malformed there is no point in trying again (you give up); the retries are used only on valid URLs with connection problems.

Program arguments: The main class of your program is named Crawl. For example, java Crawl 5 1000 3 urls.txt means the program will use a pool of 5 threads, a retry delay of one second, a maximum of 3 retries, and the file containing the URLs is named urls.txt.
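A minimal sketch of the argument checks described above (the usage text and the choice to require the retry count to be strictly positive are my own; the spec only says "a number of retries"):

```java
import java.io.File;

public class Crawl {
    public static void main(String[] args) {
        // Exactly four arguments: pool size, retry delay (ms), retries, file name.
        if (args.length != 4) usage();
        int poolSize = parsePositive(args[0]);
        int delayMs  = parsePositive(args[1]);
        int retries  = parsePositive(args[2]);
        File file = new File(args[3]);
        if (!file.exists()) {
            System.err.println("Error: no such file: " + args[3]);
            System.exit(1);
        }
        // ... read the URLs and hand them to the thread pool (see the next sketch)
    }

    // Parses a strictly positive integer; on any violation prints usage and exits.
    static int parsePositive(String s) {
        try {
            int v = Integer.parseInt(s);
            if (v > 0) return v;
        } catch (NumberFormatException e) {
            // fall through to usage()
        }
        usage();
        return -1; // never reached: usage() exits
    }

    static void usage() {
        System.err.println("Usage: java Crawl <poolSize> <retryDelayMs> <retries> <urlFile>");
        System.exit(1);
    }
}
```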
Output: The program records the performance of each thread (elapsed time measured with System.currentTimeMillis()) and prints the results in the order the URLs appear in the input file (which is not necessarily the order in which the threads finish):

http://a.b.c/logo.jpeg : 1002 ms
http://a.b.c/abc : 45 ms
http://a.b.c/def : timeout
x.com/gh : failed
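To produce output in input order even though threads finish at different times, one common approach is to collect the Futures in submission order and iterate them sequentially. A minimal sketch, assuming the CrawlTask.crawl(...) method sketched after the case list below:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OrderedResults {
    static void run(Path urlFile, int poolSize, int delayMs, int retries)
            throws IOException, InterruptedException, ExecutionException {
        List<String> urls = Files.readAllLines(urlFile); // empty file: nothing happens
        ExecutorService pool = Executors.newFixedThreadPool(poolSize); // bounded pool
        List<Future<String>> results = new ArrayList<>();
        for (String url : urls) {
            // Futures are kept in submission order, i.e. the order of the input file.
            results.add(pool.submit(() -> CrawlTask.crawl(url, delayMs, retries)));
        }
        for (Future<String> f : results) {
            // get() blocks until that particular task is done, so the output
            // follows the input order even if threads finish in another order.
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```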
I noted the possible cases:
• Success opening the URL: display the thread duration
• Malformed URL: display "failed"
• Tried the allowed number of retries without success: display "timeout"
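The per-URL logic could look like this sketch: a malformed URL fails immediately with no retries, connection or read errors trigger up to the given number of retries with a sleep in between, and the elapsed time is measured with System.currentTimeMillis(). The method name and return format are my own choices:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class CrawlTask {
    // Returns "url : N ms", "url : failed", or "url : timeout",
    // matching the three cases listed above.
    static String crawl(String spec, int delayMs, int retries) {
        long start = System.currentTimeMillis();
        URL url;
        try {
            url = new URL(spec);
        } catch (MalformedURLException e) {
            return spec + " : failed"; // malformed: give up immediately, no retries
        }
        for (int attempt = 0; attempt <= retries; attempt++) { // first try + retries
            try {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.connect();
                String contentType = conn.getContentType(); // e.g. "image/jpeg"
                conn.disconnect();
                long elapsed = System.currentTimeMillis() - start;
                // isImage decides whether the URL goes into the database (see below).
                boolean isImage = contentType != null && contentType.startsWith("image/");
                return spec + " : " + elapsed + " ms";
            } catch (IOException e) {
                if (attempt == retries) break; // attempts exhausted
                try {
                    Thread.sleep(delayMs); // wait before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return spec + " : timeout";
    }
}
```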
The program also displays the whole content of the database after all URLs are processed (it performs a "SELECT *" on the table after all threads have finished).
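Since the actual schema is not shown in this excerpt, the table and column names below (images, url) are placeholders. A minimal JDBC sketch of the duplicate check and the final dump, assuming a Connection to ex2 obtained elsewhere (e.g. via DriverManager.getConnection):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Database {
    // Inserts the URL only if it is not already present (no duplicates).
    // synchronized makes the check-then-insert atomic across the crawler threads.
    static synchronized void insertIfAbsent(Connection conn, String url) throws SQLException {
        try (PreparedStatement check =
                 conn.prepareStatement("SELECT 1 FROM images WHERE url = ?")) {
            check.setString(1, url);
            try (ResultSet rs = check.executeQuery()) {
                if (rs.next()) return; // already stored
            }
        }
        try (PreparedStatement insert =
                 conn.prepareStatement("INSERT INTO images (url) VALUES (?)")) {
            insert.setString(1, url);
            insert.executeUpdate();
        }
    }

    // Dumps the whole table once all threads have finished ("SELECT *").
    static void dumpAll(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM images")) {
            while (rs.next()) {
                System.out.println(rs.getString("url"));
            }
        }
    }
}
```

Because several threads insert concurrently, the check-and-insert must be made atomic; the synchronized keyword above is the simplest route, and a UNIQUE constraint on the URL column would be a more robust alternative.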
The database: The database, named ex2, is defined as follows. I assume it already exists.