This repository has been archived by the owner on Aug 27, 2024. It is now read-only.

Time series naming: Encoding issue with Umlaut in results of x13/requests #13

Open
suweg opened this issue May 8, 2018 · 3 comments


@suweg

suweg commented May 8, 2018

Hi,

This issue is related to the bug fixed in #10. In 2.2.1, the time series are no longer renamed to "series1" ... Thanks for that fix.

However, there seems to be a problem with the encoding of umlauts in the time series names in the resulting XML file.

This is a snippet of the x13requests.xml I am sending. The encoding is correctly set to UTF-8 in the header:

<x13:Series name="Auftragsbestand Inland / in jeweiligen Preisen / Deutschland / Konsumgüter">

The umlaut ü is correctly represented as u'\u00fc'.

And this is what I receive from the web service (WS):
<tss:item name="Auftragsbestand Inland / in jeweiligen Preisen / Deutschland / Konsumg端ter">
The ü has turned into u'\u7aef' (端), a Chinese character.
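The two characters are related at the byte level. A minimal sketch, assuming the name travels as UTF-8 (as the XML declaration states) and is then decoded on the client with some other codec: ü is sent as the two bytes C3 BC, and each codec turns those bytes into something different. Which codec was actually applied is not established in this thread; Latin-1 and EUC-JP are only examples, and `decode_with` is a hypothetical helper.

```python
def decode_with(name: str, codec_names: tuple[str, ...]) -> dict[str, str]:
    """Encode `name` as UTF-8, then decode the same bytes with each codec."""
    raw = name.encode("utf-8")
    return {codec: raw.decode(codec) for codec in codec_names}


# "ü" is the code point U+00FC; in UTF-8 it becomes the bytes C3 BC.
print("Konsumgüter".encode("utf-8"))  # b'Konsumg\xc3\xbcter'

for codec, text in decode_with("Konsumgüter", ("utf-8", "latin-1", "euc_jp")).items():
    # Only the utf-8 line reproduces the original name.
    print(f"{codec:8} {text}")
```

With Latin-1, the two bytes become two separate characters ("Ã¼"); with a multi-byte East Asian codec, they can collapse into a single CJK character, which matches the shape of the reported symptom.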

I am fetching the results directly. When I try to print the response, I get a UnicodeEncodeError.
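The UnicodeEncodeError is probably a second, client-side effect rather than part of the service bug: print() has to encode the text for the console, and a console using a legacy code page cannot represent a CJK character. A sketch, assuming stdout uses cp1252 with strict error handling (the thread does not say which console was used); both helper names are hypothetical:

```python
def fits_console(text: str, console_encoding: str = "cp1252") -> bool:
    """True if `text` can be written to a console using `console_encoding`."""
    try:
        # Strict encoding: this is the step that raises UnicodeEncodeError
        # when print() writes to such a console.
        text.encode(console_encoding)
    except UnicodeEncodeError:
        return False
    return True


def console_safe(text: str, console_encoding: str = "cp1252") -> str:
    """Replace unrepresentable characters with \\uXXXX escapes instead of failing."""
    return text.encode(console_encoding, errors="backslashreplace").decode(console_encoding)


print(fits_console("Konsumgüter"))        # ü exists in cp1252
print(fits_console("Konsumg\u7aefter"))   # 端 does not
print(console_safe("Konsumg\u7aefter"))
```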

Any idea about this?

Regards,
Susanne

@maggima
Collaborator

maggima commented Apr 9, 2019

Dear Susanne,

Could you please give me more information about which call you are making to the web service?
Is the request made from Java or another language? What is the calling source code? Are you adding any special headers?

I don't have any encoding problems when calling the service locally on my machine.
However, I have a possible solution in mind: adding a filter to the service that adds the charset to the Content-Type header before the response is returned:

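package ec.nbb.ws.filters;

import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerResponseContext;
import javax.ws.rs.container.ContainerResponseFilter;
import javax.ws.rs.core.HttpHeaders;
import javax.ws.rs.core.MediaType;

// Appends charset=utf-8 to the response Content-Type when no charset is set,
// so that clients do not have to guess how to decode the body.
public class CharsetResponseFilter implements ContainerResponseFilter {

    @Override
    public void filter(ContainerRequestContext request, ContainerResponseContext response) {
        MediaType type = response.getMediaType();
        // Leave responses without a media type, or with an explicit charset, untouched.
        if (type != null && !type.getParameters().containsKey(MediaType.CHARSET_PARAMETER)) {
            MediaType typeWithCharset = type.withCharset("utf-8");
            response.getHeaders().putSingle(HttpHeaders.CONTENT_TYPE, typeWithCharset);
        }
    }
}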

You can then register it in ApplicationConfig.java:

resources.add(ec.nbb.ws.filters.CharsetResponseFilter.class);
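On the client side, such a filter matters because a Content-Type without a charset parameter leaves the decoding choice to the client. Python's requests library, for instance, falls back to ISO-8859-1 for text/* types and otherwise guesses an encoding (apparent_encoding) when response.text is read. A standard-library sketch of reading the charset parameter; `charset_of` is a hypothetical helper, and the header values are illustrative since the service's actual media type is not shown in this thread:

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the (lower-cased) charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Without the filter: no charset, so the client must fall back or guess.
print(charset_of("application/xml"))
# With the filter: the charset is stated explicitly.
print(charset_of("application/xml;charset=utf-8"))
```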

@suweg
Author

suweg commented Apr 11, 2019

Dear Mats,

I send the request from a Python script with requests.post and then save the response to a file:

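    import codecs
    import requests

    response = requests.post(url=ws_server_url + x13_api_point, headers=headers, data=x13reqFile)
    # response.text decodes the body with the charset from Content-Type, or a guessed one if none is given
    with codecs.open(response_filename, 'w', 'utf-8') as f:
        f.write(response.text)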
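A possible client-side workaround, sketched under the assumption that the response body is UTF-8 XML with a correct XML declaration: pass the raw bytes (response.content in requests) to an XML parser, or save them unchanged, instead of going through response.text, so that no guessed encoding is applied. Alternatively, setting response.encoding = "utf-8" before reading response.text forces that decoding. The function names are hypothetical, and the sample only imitates the shape of the result file (the wrapper element and namespace URI are made up).

```python
import xml.etree.ElementTree as ET


def save_raw(body: bytes, path: str) -> None:
    # Write the bytes exactly as received; the XML declaration inside
    # the file still states the encoding for any later reader.
    with open(path, "wb") as f:
        f.write(body)


def series_names(body: bytes) -> list[str]:
    # ElementTree decodes bytes according to the XML declaration
    # (UTF-8 if there is none), so no encoding guess is involved.
    root = ET.fromstring(body)
    return [el.get("name") for el in root.iter() if el.get("name") is not None]


# Usage with requests (not executed here):
#   response = requests.post(url=..., headers=headers, data=x13reqFile)
#   save_raw(response.content, response_filename)
#   print(series_names(response.content))

sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<tss:items xmlns:tss="urn:example:tss">'
    '<tss:item name="Konsumgüter"/>'
    '</tss:items>'
).encode("utf-8")
print(series_names(sample))
```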

We generate the requests (x13reqFile) with the CLI tools, and they only contain the standard header:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Actually, we haven't encountered this issue since. It was indirectly fixed by "Fixed ts naming in requests. #5". Our time series names no longer contain non-ASCII characters.

However, I ran a test, and we would otherwise still have the problem:
I sent a request with the time series name " pdp310fä" and when I opened the result in Notepad++ (set to UTF-8, of course), it looked like this: "pdp310fä". No error was thrown though.

Does this answer your questions?

@maggima
Collaborator

maggima commented Apr 30, 2019

It would be really useful to get one of the files that cause the problem, so that I can better determine during which operation the encoding issue occurs (request, response, reading the file, ...).
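One way to narrow down the stage, sketched in Python as an assumption about how such a check could look: search for the exact UTF-8 bytes of the series name in the request body, the raw response body and the saved file. The first stage where those bytes are missing is where the encoding goes wrong. `locate_name` is a hypothetical helper, and the sample bytes are illustrative, not the files from this issue.

```python
def locate_name(name: str, stages: dict[str, bytes]) -> dict[str, str]:
    """For each stage, report how `name` appears in its raw bytes."""
    utf8 = name.encode("utf-8")
    # The bytes left behind when UTF-8 is decoded as Latin-1 and then
    # written back out as UTF-8 (a common form of double encoding).
    double = utf8.decode("latin-1").encode("utf-8")
    report = {}
    for stage, data in stages.items():
        if utf8 in data:
            report[stage] = "ok (UTF-8)"
        elif double in data:
            report[stage] = "double-encoded (UTF-8 read as Latin-1)"
        else:
            report[stage] = "not found (some other transformation)"
    return report


# Illustrative bytes only; in practice read each file with open(path, "rb").read().
stages = {
    "request": '<x13:Series name="Konsumgüter">'.encode("utf-8"),
    "saved response": '<tss:item name="KonsumgÃ¼ter">'.encode("utf-8"),
}
for stage, verdict in locate_name("Konsumgüter", stages).items():
    print(f"{stage}: {verdict}")
```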

Development

No branches or pull requests

3 participants