Apache Any23 2.0
Release Notes
03/02/2017 (dd/mm/yyyy)
Sub-task
[ANY23-243] - Overhaul and update README.txt
Bug
[ANY23-79] - No execute permissions in command line tool
[ANY23-92] - NQuadsParser does not require whitespace between elements
[ANY23-99] - NQuadsWriter should force ASCII in OutputStream constructor
[ANY23-153] - Automatically Generate EARL reports for Any23 RDF Parsers
[ANY23-176] - DOC: Apache Any23 Installation Guide
[ANY23-200] - Build revision is not correctly defined
[ANY23-219] - rover does not work with -f nquads option
[ANY23-235] - NQuads links broken on Supported Formats Page
[ANY23-236] - Port Any23 site to Apache CMS
[ANY23-248] - NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame dependencies to 2.7.14
[ANY23-252] - JSON-LD format MIME type is not detected
[ANY23-253] - JSON-LD cannot be processed by Rover
[ANY23-255] - apache-any23-quads dependency should not be <scope> test in core pom.xml
[ANY23-265] - ThreadSafety issue in ItemPropValue
[ANY23-272] - Service fails to start with any23server.bat
[ANY23-277] - Any23 master branch will not build due to lacking maven-assembly-plugin
[ANY23-279] - Fix EmbeddedJSONLDExtractor ExtractorDescription getDescription() implementation
[ANY23-296] - Tar complains about groupid value being too big
[ANY23-302] - rover JSON output is not valid
Improvement
[ANY23-80] - Split out command line tools into a separate module
[ANY23-163] - VocabPrinter tool broken with No writer factory available for RDF format N-Quads (mimeTypes=text/x-nquads; ext=nq)
[ANY23-185] - Add missing <meta> element attributes to HTMLMetaExtractor
[ANY23-207] - Implement Microformats2
[ANY23-246] - Add Open Graph Protocol and Facebook prefixes to popular.prefixes
[ANY23-247] - FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.
[ANY23-250] - Upgrade to Tika 1.7
[ANY23-261] - Tiny typo in Data Extraction documentation source example
[ANY23-263] - Upgrade to Tika 1.14
[ANY23-274] - Change any23.microdata.ns.default configuration value to http://schema.org
[ANY23-276] - Upgrade sesame dependencies to RDF4J
[ANY23-278] - Upgrade all Maven plugin versions in parent pom.xml
[ANY23-293] - Package log4j configuration with core appassembler
[ANY23-297] - Any23 doesn't build under JDK1.8
[ANY23-299] - Missing YAML to RDF parser
[ANY23-300] - Ignore NetBeans configuration files
Task
[ANY23-141] - Upgrade OpenRDF Sesame to 2.7.0
[ANY23-242] - Address issues with 1.1 #1 RC
Wish
[ANY23-19] - Abstract away any specific RDF APIs
[ANY23-226] - Extract JSON-LD embedded in HTML
Apache Any23 1.1
Release Notes
15/10/2014 (dd/mm/yyyy)
Bug
[ANY23-205] - Remove xrefs from Any23 site and replace with Git(hub) links
[ANY23-220] - Run crawler plugin on Apache Any23 site
[ANY23-234] - No writer factory available for RDF format N-Quads (mimeTypes=text/x-nquads; ext=nq)
Improvement
[ANY23-157] - Update Any23 site to accommodate move to Git.
[ANY23-197] - Extract embedded json-ld from html documents
[ANY23-204] - fix url encoding problem : PR#3
[ANY23-209] - Bug in site generation
[ANY23-221] - Enable JSON-LD as an input format for the WebService at any23.org
[ANY23-238] - Fix generation of BNode name for microdata when 'itemid' is given without a value.
New Feature
[ANY23-7] - Performance test suite
[ANY23-160] - [SECURITY] Frame injection vulnerability in published Javadoc
Task
[ANY23-222] - Push 1.1-SNAPSHOT artifacts to the Any23 website
Apache Any23 1.0
Release Notes
09/05/2014 (dd/mm/yyyy)
Sub-task
[ANY23-148] - Programmes Ontology
Bug
[ANY23-100] - Issue with RDFa extractor while processing nested properties
[ANY23-135] - Any23 RDFa Extractor ignores multiple prefix and property statements
[ANY23-136] - Some RDFa tests have incorrect expected results
[ANY23-168] - RDFa properties in <meta> elements not picked up
[ANY23-170] - Dependency error org.apache.commons:commons-csv:1.0-SNAPSHOT-rev1148315
[ANY23-172] - Fix minor issues with Any23 0.9.0 RC
[ANY23-173] - Please delete old releases from mirroring system
[ANY23-174] - Incorrect RDFa extractions
[ANY23-203] - Update version revisions from 0.9.1 to 1.0
Improvement
[ANY23-65] - Update to RDFa extraction stylesheet
[ANY23-128] - html-rdfa11 extractor fails on mailto: anchors
[ANY23-130] - Improve aesthetics of the output format when straying from default java.io.PrintStream
[ANY23-137] - RDFa parser implementation proposal
[ANY23-179] - Improve Javadoc and throwing of IllegalArgumentException in Any23#createDocumentSource
[ANY23-180] - Create an Apache hosted jail running an Any23 service instance
[ANY23-181] - Upgrade NekoHTML to 1.9.20
New Feature
[ANY23-134] - Create o.a.a.extractor.tika Parser and Extractor implementations
[ANY23-177] - Add support for JSON-LD
Task
[ANY23-162] - Add package.java for all LKIFCore classes
Apache Any23 0.9.0
Release Notes
28/10/2013 (dd/mm/yyyy)
Sub-task
[ANY23-142] - LKIF-Core Vocabulary
[ANY23-143] - LRICore Vocabulary
Bug
[ANY23-111] - Any23 raises an unmanaged exception from the Microdata parser
[ANY23-115] - Empty spans seem to break ANY23
[ANY23-161] - Fix service file generation
[ANY23-165] - "Invalid content" error if TITLE precedes encoding declaration in the document
[ANY23-171] - form.html not in correct location in service.
Improvement
[ANY23-47] - Migrate basic-crawler classes to org.apache.nutch
[ANY23-164] - office-scraper ExcelExtractorFactory.java to accept application/x-tika-ooxml and application/x-tika-msoffice formats
New Feature
[ANY23-120] - Split CLI tools out into a new module
Task
[ANY23-122] - Cleanup Distribution Mirrors
Apache Any23 0.8.0
Release Notes
01/05/2013 (dd/mm/yyyy)
Sub-task
[ANY23-109] - Missing tika-config.xml in o.a.a.mime
[ANY23-110] - DOAP Vocabulary
Bug
[ANY23-44] - error when parsing a document from http://www.afdsi.org/docs/test/html/RDFa/_food-stream_.htm
[ANY23-78] - Download page links are broken
[ANY23-108] - Broken schema.org microdata extraction
[ANY23-112] - Fix incubation disclaimer
[ANY23-113] - Remove dependencies from parent pom.xml file
[ANY23-116] - Empty values are skipped when reading tab separated CSV.
[ANY23-156] - Add logging dependencies to plugins and service
Improvement
[ANY23-2] - Add support for hreview-aggregate microformat.
[ANY23-26] - Upgrade dependency to Apache Tika 1.2
[ANY23-46] - Update Any23 web service
[ANY23-83] - Remove hardcoded formats throughout Any23 to make it useful as a library
[ANY23-101] - Use RDFFormat.NQUADS in nquads module
[ANY23-139] - Simplify site deploy plugging the maven-scm-publish-plugin
[ANY23-144] - Implement comprehensive naming of o.a.a.api.vocab classes
New Feature
[ANY23-4] - Integrate W3C's RDFa test suite and pass all tests
[ANY23-85] - Split NQuads out into its own module
[ANY23-96] - Add user agent string to basic-crawler
[ANY23-117] - Split Mime type detection out into its own module
[ANY23-118] - Split Encoding detection out into its own module
Task
[ANY23-41] - Write basic-crawler plugin documentation
[ANY23-125] - Drop the Incubating DISCLAIMER
Apache Any23 0.7.0-incubating
Release Notes
25/06/2012 (dd/mm/yyyy)
Sub-task
[ANY23-25] - Update all Maven POM's in trunk
[ANY23-31] - Move any23 site documentation out of trunk and into its own SVN directory
[ANY23-53] - Bad Web Service documentation
Bug
[ANY23-14] - Add support for Extractor sub results
[ANY23-20] - The Any23 PluginManager fails handling resource paths containing spaces.
[ANY23-34] - Plugin Integration Test Fails
[ANY23-37] - LGPL'ed components cannot be included in distribution packages
[ANY23-42] - Fix issue in RDFa11Parser.java not resolving relative URIs correctly
[ANY23-49] - N3/NQ parsers ignoring stopAtFirstError flag
[ANY23-58] - HCardExtractor infinite loop and memory exhaustion
[ANY23-62] - ExtractionResultImpl loses all issues generated by sub extractions
[ANY23-73] - The ToolRunner CLI driver -p (--plugins-dir) option doesn't work because parsed after the Tool list loading
[ANY23-77] - Facing an infinite loop problem in version 0.6.1 - Verify
[ANY23-78] - Download page links are broken
[ANY23-87] - Bogus argument in o.a.a.cli.CrawlerTest
[ANY23-88] - any23 script -v or --version option doesn't display actual version
[ANY23-94] - The Microdata CLI tool doesn't work anymore
[ANY23-95] - Activate the IgnoreAccidentalRDFa filter for the Any23 Service instance
[ANY23-97] - The test suite was not running all tests, minor regressions occurred
Improvement
[ANY23-18] - Add a new extractor for RDFa using java-rdfa
[ANY23-28] - Document munging of Any23 history to CHANGES.txt
[ANY23-32] - Replace hardcoded bash script with one generated via appassembler
[ANY23-33] - Replace proprietary SUN imports from Any23 classes.
[ANY23-45] - Improve issue verification support in Extractor tests
[ANY23-50] - Simplify plugin loading avoiding the classpath scanning
[ANY23-56] - Change repo-ext to Any23 SVN mirror repo.
[ANY23-63] - The Any23 web service doesn't return the Issue Report generated by activated Extractors, hiding major metadata issues
[ANY23-64] - Improve CLI usage aesthetics
[ANY23-70] - Establish searchable list archives
[ANY23-71] - improve the current CLI engine
[ANY23-74] - Disable domain triple generation in default configuration
[ANY23-75] - Improve runtime of the Microdata extractor on documents with many relations.
[ANY23-76] - Improve runtime of the Microformat extractor on documents with many relations.
[ANY23-82] - Don't use explicit reference to Log4j classes
[ANY23-86] - Better logging in SiteCrawlerTest
New Feature
[ANY23-9] - Prepare a dedicated homepage for Any23
[ANY23-29] - Migrate code base to ASF infrastructure
[ANY23-57] - Create Any23 History documentation and add to site
[ANY23-59] - Create KEYS file for Any23
[ANY23-68] - Create Powered By documentation/page
[ANY23-102] - Any23 DOAP file
Task
[ANY23-21] - Migrate all packages and classes to ORG.APACHE.ANY23
[ANY23-27] - Import revisions r1547 to r1607 from Google Code SVN to ASF SVN
[ANY23-36] - Merge GCode specific CHANGES.txt report in main changes.xml
[ANY23-39] - Write Down Overall Architecture Document to help new developers maintaining the Any23 core
[ANY23-48] - Update Documentation (Site + READMEs) to reflect changes in shell script usage
[ANY23-52] - Remove non ASF logos from Any23 Service page
[ANY23-66] - Fix Javadoc
==========================================================================
Apache Any23 0.6.1
Release Notes
Fixes
* Improved MIMEType detection for CSV input. [172, 176]
==========================================================================
Apache Any23 0.6.0
Release Notes
Fixes
* Fixed several bugs. [151, 153, 154, 155, 156, 164, 168]
* Removed unused Apache Any23 dependencies. [162]
* Introduced parent POM dependencyManagement. [163]
* Minor code refactoring. [142]
* Updated project documentation. [161]
Enhancements
* Added support for Microdata. [114, 141, 144, 145, 152, 157]
* Added RDFa 1.1 support for new prefix specification. [143]
* Added CSV Extractor (RDFizer). [150, 165]
* Added HTML/META Extractor. [148, 149]
* Improved Configuration programmatic management. [147]
* Added several flags to control metadata triples generation. [146]
* Improved nesting relationship explicitation in Microformat extractors. [80]
* Major Extractor interface refactoring. [160, 167]
* Improved TagSoup Extractor based error reporting. [159]
* Added command-line tool to print out the Apache Any23 declared vocabularies. [114]
==========================================================================
Apache Any23 0.6.0-M2
Release Notes
The release 0.6.0-M2 introduces major fixes to the M1 milestone
[154, 155, 156] and improves Configuration [147] and Microdata
error management [157].
==========================================================================
Apache Any23 0.6.0-M1
Release Notes
The release 0.6.0-M1 is an early preview of the
Microdata support. [114]
==========================================================================
Apache Any23 0.5.0
Release Notes
Fixes
* Fixed wrong conversion of a generic XML file to RDF. [131]
* Fixed usage of 'base' tag when resolving relative URIs
in RDFa. [75]
* Fixed error parsing Turtle data. [87]
* Fixed issue with escaping in NQuads parser. [126]
* Fixed XML DTD validation attempt. [95]
* Fixed concurrent modification exception in
ExtractionContentBlocker filter. [86]
* Fixed mime type detection of direct input when source
contains blank chars. [83, 90]
* Fixed reporting when producing no triples. [79]
* Fixed any23-service packaging, added profile for excluding
embedded dependencies. [113]
Enhancements
* Improved extraction report: added list of
activated extractors. [89]
* Improved extraction of HTML link element. [133]
* Added XPath HTML extractor. [124]
* Added HRecipe Microformat extractor. [103]
* Added plugin support for Apache Any23. [111]
* Implemented HTML Scraper Plugin. [123]
* Upgraded to Sesame 2.4.0. [136]
* Upgraded to Jetty 8.0.0. [138]
* Upgraded maven-site-plugin. [85]
* Added flags to exclude metadata triples. [134]
* Added removal of CSS related triples. [135]
* Improved overall documentation. [130]
* Overall POM refactoring. [125]
==========================================================================
Apache Any23 0.4.0
Release Notes
* The any23-service module has been separated from the any23-core module;
the Ant build system has been dropped. [Issue 44]
* Added support for HTML metadata (RDFa / Microformats) validation
and correction (validator). [Issue 77]
* Added flag to disable the nesting relationship property
enrichment. [Issue 67]
* Improved coverage of Microformats tests. [Issue 65]
* Improved documentation. [Issue 44]
* Various code consolidation. [Issues 68, 69, 70, 71, 72, 73, 74, 77]
==========================================================================
Apache Any23 0.3.0
Release Notes
* Added detection and enrichment of nested microformats. [Issue #61]
* Added detection and support of N-Quads as input and output format. [Issue #7]
* General Improvements in RDFa extraction. [Issue #12, Issue #14]
* Added support of Turtle embedded in HTML script tag. [Issue #62]
* Improvement in encoding support. [Issue #43]
* Improvement in Core API. [Issue #27]
* Improved support for Species Microformat. [Issue #63]
* General Code prettification.
==========================================================================
Apache Any23 0.2.2
Release Notes
* Fixed dependency management on Maven. A second-level dependency of Xerces
introduced a conflict on the javax.xml.transform API, causing incorrect XSLT
transformations within the RDFa extractor.
==========================================================================
Apache Any23 0.2.1
Release Notes
* Major fix on Tika configuration management. This fix solves the
auto detection of the main Semantic Web related formats.
==========================================================================
Apache Any23 0.2
Release Notes
============
Introduction
============
This release features a redesigned API and incorporates enhancements and
bug fixes that have accumulated since the 0.1 release.
Apart from some new or changed dependencies on the underlying libraries,
this version comes with improved unit test coverage and other features
such as automatic charset encoding detection and improved documentation.
A Maven build system has been introduced.
==================================
Summary of major changes since 0.1
==================================
* Redesigned Java API
- Input from string, stream, file, or URI
- Allow choosing which extractors to use
- Report origin of triples (document/extractor) to client processors
- Various processors/serializers for extracted triples
* Added flexible command-line tool for easy testing
* Vastly improved website and documentation
* Media type and encoding detection via Apache Tika
* Switched RDF library from Jena to Sesame
* Added Maven build
* Better RDF extraction from Microformats
* Extractors now come with an example file to document typical in- and output
* Major refactoring
* Lots and lots of bugfixes
=================
Supported formats
=================
* RDF/XML
* Notation3 and Turtle
* N-Triples
* RDFa
* Various microformats; see http://sindice.com/developers/microformat for Sindice Microformats support.
===================
Dependency Upgrade
===================
The CyberNeko HTML parser has been upgraded to 1.9.14.
Apache Tika 0.3 has been replaced with 0.6, which adds
support for automatic encoding detection.
EOF