java.io.UTFDataFormatException: encoded string too long #57

kvndrsslr-zz · 2016-03-19T17:33:25Z

Hello,

when attempting to generate links between two files of Linked Geospatial Data with SILK singlemachine version 2.7.0, I get the exception java.io.UTFDataFormatException: encoded string too long: 83821 bytes.
Indeed, the string literals in these files are very long WKT serializations of polygons and multipolygons, which to my understanding should be supported by SILK, as it contains plugins dedicated to geospatial relations and distances.
The problem seems to lie in the core silk code rather than the plugins.
For reproducibility I have created 3 pastebins containing the log with the exception's stack trace, the configuration file and the very small source dataset used in the linking task. The target dataset can be downloaded from datahub.io.
This article shows why this exception occurs and how it can be fixed.
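
To illustrate the limit independently of SILK, here is a minimal sketch that probes java.io.DataOutputStream.writeUTF with strings of different sizes. The class name WriteUtfLimitDemo and the helper canWriteUtf are made up for this example and are not part of SILK; whether SILK hits the limit through exactly this call is an assumption based on the exception type.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical demo class, not part of SILK.
public class WriteUtfLimitDemo {

    /** Returns true if writeUTF accepts a string of `length` ASCII characters. */
    public static boolean canWriteUtf(int length) throws IOException {
        char[] chars = new char[length];
        Arrays.fill(chars, 'x'); // each ASCII char encodes to exactly one byte
        String value = new String(chars);
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            // writeUTF stores the encoded length in an unsigned 16-bit prefix,
            // so anything above 65535 encoded bytes is rejected.
            out.writeUTF(value);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canWriteUtf(65535)); // prints "true"
        System.out.println(canWriteUtf(83821)); // prints "false" (size from the report above)
    }
}
```

Note that the limit applies to the encoded byte length, so a string containing non-ASCII characters can fail with fewer than 65535 characters.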

Edit: As a temporary workaround I ran awk 'length($0) < 65536' file > new_file on the datasets, which drops every line of 65536 characters or more (Java's DataOutputStream.writeUTF cannot write a string whose encoded form is longer than 65535 bytes).
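
Filtering throws away the long geometries, so a fix in the code would be preferable. One common way around the writeUTF limit is to write the string as a 4-byte length followed by its UTF-8 bytes. The sketch below shows that idea only; the class LongStringIO and its methods are hypothetical, this is not necessarily how SILK should or does fix it, and data written this way cannot be read back with readUTF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of SILK.
public class LongStringIO {

    /** Writes a 4-byte length prefix followed by the UTF-8 bytes of the string. */
    public static void writeLongString(DataOutput out, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    /** Reads a string previously written by writeLongString. */
    public static String readLongString(DataInput in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes); // blocks until all bytes are read or throws EOFException
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        char[] chars = new char[83821]; // size taken from the exception message
        Arrays.fill(chars, 'x');
        String wkt = new String(chars);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeLongString(new DataOutputStream(buffer), wkt);

        String restored = readLongString(
            new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        System.out.println(restored.equals(wkt)); // prints "true"
    }
}
```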
