Support for timezone conversions in DateFormatTagger? #75

Open
ronjakoi opened this issue Feb 19, 2018 · 2 comments

Comments

@ronjakoi

I have to crawl an intranet site that provides the last modified timestamps of articles in a meta tag like this: <meta name="LASTMODIFIED" content="19.02.2018 12:40">

This is easily handled by DateFormatTagger. However, there is a problem with timezones: the intranet provides the time in local time, while Solr expects it in UTC.

Can you please add support for timezone conversions in DateFormatTagger? In the meantime, is there a workaround for my problem, other than using ScriptTagger to manipulate the date after DateFormatTagger?

@essiembre
Contributor

Good suggestion. I am making this a feature request.

In the meantime, here are a few workarounds you can try:

  • Modify the launch script to add the following argument to the java command it executes (replace GMT with another timezone if needed):
java -Duser.timezone=GMT
  • Modify the launch script to set the TZ environment variable. On Linux it could look like this:
export TZ=UTC
# or export TZ=UTC+4:00 (or whatever offset applies; in POSIX TZ strings a positive offset means west of UTC)
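
To illustrate why the JVM default timezone matters here, the following is a minimal Java sketch, not the DateFormatTagger implementation. It assumes the date parsing goes through a formatter that falls back to the JVM default zone when the input carries no zone, as `SimpleDateFormat` does. The class name `DefaultZoneSketch`, the method `parseMillis`, and the `Europe/Helsinki` zone are hypothetical and chosen for illustration only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class DefaultZoneSketch {

    // Parses a zone-less timestamp while the JVM default zone is temporarily
    // set to zoneId, and returns the resulting epoch milliseconds.
    // Changing the default in code mimics what -Duser.timezone or TZ does at startup.
    static long parseMillis(String value, String zoneId) {
        TimeZone previous = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            // A SimpleDateFormat created now picks up the current default zone.
            return new SimpleDateFormat("dd.MM.yyyy HH:mm").parse(value).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        } finally {
            // Restore the original default so other code is unaffected.
            TimeZone.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        String value = "19.02.2018 12:40";
        long asGmt = parseMillis(value, "GMT");
        long asHelsinki = parseMillis(value, "Europe/Helsinki"); // assumed zone
        // The same text maps to two different instants depending on the default zone.
        System.out.println("Difference in hours: " + (asGmt - asHelsinki) / 3_600_000L);
    }
}
```

The point of the sketch is that the default zone decides which instant a zone-less string is taken to mean, so the value to set is the zone the source timestamps are actually written in.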

@ronjakoi
Author

I did this:

<!-- meta-lastmod -->
<tagger class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
    fromField="LASTMODIFIED"
    toField="meta-lastmod"
    toFormat="yyyy-MM-dd'T'HH:mm:ss" >
    <fromFormat>dd.MM.yyyy HH:mm</fromFormat>
</tagger>

<!-- meta-published -->
<tagger class="com.norconex.importer.handler.tagger.impl.DateFormatTagger"
    fromField="PUBLISHED"
    toField="meta-published"
    toFormat="yyyy-MM-dd'T'HH:mm:ss" >
    <fromFormat>dd.MM.yyyy HH:mm</fromFormat>
</tagger>

<tagger class="com.norconex.importer.handler.tagger.impl.ScriptTagger">
    <script><![CDATA[
        var date_fields = ['meta-lastmod', 'meta-published'];
        date_fields.forEach(function(df) {
            if(metadata[df]) {
                var d = new Date(metadata[df][0]);
                // Date.toISOString() always returns UTC time
                metadata.setString(df, d.toISOString());
            }
        });
    ]]></script>
</tagger>
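
For comparison, the local-to-UTC conversion that the configuration above aims for can be sketched in plain Java with `java.time`. This is only an illustration under stated assumptions, not Norconex code: the class name `TimezoneSketch`, the method `toUtc`, and the source zone `Europe/Helsinki` are hypothetical, and the real zone of the intranet is not given in this thread.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimezoneSketch {

    // Same pattern as the <fromFormat> used in the taggers above.
    private static final DateTimeFormatter FROM =
            DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm");

    // Interprets a zone-less timestamp as local time in zoneId and
    // returns it as an ISO-8601 UTC string ending in 'Z'.
    static String toUtc(String localValue, String zoneId) {
        LocalDateTime local = LocalDateTime.parse(localValue, FROM);
        Instant instant = local.atZone(ZoneId.of(zoneId)).toInstant();
        return instant.toString();
    }

    public static void main(String[] args) {
        // Europe/Helsinki is an assumed zone (UTC+2 in February).
        System.out.println(toUtc("19.02.2018 12:40", "Europe/Helsinki"));
        // prints 2018-02-19T10:40:00Z
    }
}
```

Naming the source zone explicitly keeps the result independent of the machine's default timezone, which is the main difference from relying on how the script engine parses a zone-less string.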
