Is there a way to get better recovery after failures? I have about 35K unique lat/lon points I'm fetching census blocks for; at 1 second per request that's about 9 hours. After about 7 hours I got the failure below. The script didn't recover, and all of the data collected up to that point was lost. Is there a more graceful way for the code to recover, or at least to keep what was already done? (A sketch of one possible retry-and-checkpoint approach follows the log.)
[01:53:45] [INFO] [dku.utils] - 17651 - processing: (40.0309365,-105.2930896)
[01:53:46] [INFO] [dku.utils] - 17652 - processing: (40.0309393,-105.2643413)
[01:55:43] [INFO] [dku.utils] - *************** Recipe code failed **************
[01:55:43] [INFO] [dku.utils] - Begin Python stack
[01:55:43] [INFO] [dku.utils] - Traceback (most recent call last):
[01:55:43] [INFO] [dku.utils]  - File "/Users/jeffers/Library/DataScienceStudio/dss_home/jobs/BOULDERCOUNTYSOURCETRANSFORMS/Build_census_blocks_2017-09-13T00-37-50.653/compute_census_blocks_NP/custompyrecipehdMLSMuNyO49/python-exec-wrapper.py", line 3, in <module>
[01:55:43] [INFO] [dku.utils] - execfile(sys.argv[1])
[01:55:43] [INFO] [dku.utils]  - File "/Users/jeffers/Library/DataScienceStudio/dss_home/jobs/BOULDERCOUNTYSOURCETRANSFORMS/Build_census_blocks_2017-09-13T00-37-50.653/compute_census_blocks_NP/custompyrecipehdMLSMuNyO49/script.py", line 68, in <module>
[01:55:43] [INFO] [dku.utils] - 'showall': 'true'
[01:55:43] [INFO] [dku.utils] - File "/Applications/DataScienceStudio.app/Contents/Resources/kit/python.packages/requests/api.py", line 70, in get
[01:55:43] [INFO] [dku.utils] - return request('get', url, params=params, **kwargs)
[01:55:43] [INFO] [dku.utils] - File "/Applications/DataScienceStudio.app/Contents/Resources/kit/python.packages/requests/api.py", line 56, in request
[01:55:43] [INFO] [dku.utils] - return session.request(method=method, url=url, **kwargs)
[01:55:43] [INFO] [dku.utils] - File "/Applications/DataScienceStudio.app/Contents/Resources/kit/python.packages/requests/sessions.py", line 488, in request
[01:55:43] [INFO] [dku.utils] - resp = self.send(prep, **send_kwargs)
[01:55:43] [INFO] [dku.utils] - File "/Applications/DataScienceStudio.app/Contents/Resources/kit/python.packages/requests/sessions.py", line 609, in send
[01:55:43] [INFO] [dku.utils] - r = adapter.send(request, **kwargs)
[01:55:43] [INFO] [dku.utils] - File "/Applications/DataScienceStudio.app/Contents/Resources/kit/python.packages/requests/adapters.py", line 499, in send
[01:55:43] [INFO] [dku.utils] - raise ReadTimeout(e, request=request)
[01:55:43] [INFO] [dku.utils] - ReadTimeout: HTTPConnectionPool(host='data.fcc.gov', port=80): Read timed out. (read timeout=None)
[01:55:43] [INFO] [dku.utils] - End Python stack
[01:55:43] [INFO] [com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner] - Error file found, trying to throw it: /Users/jeffers/Library/DataScienceStudio/dss_home/jobs/BOULDERCOUNTYSOURCETRANSFORMS/Build_census_blocks_2017-09-13T00-37-50.653/compute_census_blocks_NP/custompyrecipehdMLSMuNyO49/error.json
[01:55:43] [ERROR] [com.dataiku.dip.dataflow.streaming.DatasetWritingService] - Wait session error: null
org.eclipse.jetty.io.EofException
at org.eclipse.jetty.server.HttpInput$3.noContent(HttpInput.java:464)
at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:124)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at com.dataiku.dip.input.stream.InputStreamLineReader.readLine(InputStreamLineReader.java:30)
at com.dataiku.dip.input.formats.csv.RFC4180CSVParser.next(RFC4180CSVParser.java:21)
at com.dataiku.dip.dataflow.streaming.DatasetWriter.appendFromCSVStream(DatasetWriter.java:139)
at com.dataiku.dip.dataflow.streaming.DatasetWritingService.pushData(DatasetWritingService.java:255)
at com.dataiku.dip.dataflow.kernel.slave.KernelSession.pushData(KernelSession.java:237)
at com.dataiku.dip.dataflow.kernel.slave.KernelServlet.service(KernelServlet.java:199)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:738)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:462)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536)
at java.lang.Thread.run(Thread.java:745)
[01:55:43] [INFO] [dku.flow.activity] - Run thread failed for activity compute_census_blocks_NP
com.dataiku.common.server.APIError$SerializedErrorException: Error in Python process: <class 'requests.exceptions.ReadTimeout'>: HTTPConnectionPool(host='data.fcc.gov', port=80): Read timed out. (read timeout=None)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:304)
at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:79)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
[01:55:43] [ERROR] [com.dataiku.dip.dataflow.streaming.DatasetWritingService] - Push data error during streaming:null
org.eclipse.jetty.io.EofException
at org.eclipse.jetty.server.HttpInput$3.noContent(HttpInput.java:464)
at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:124)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at com.dataiku.dip.input.stream.InputStreamLineReader.readLine(InputStreamLineReader.java:30)
at com.dataiku.dip.input.formats.csv.RFC4180CSVParser.next(RFC4180CSVParser.java:21)
at com.dataiku.dip.dataflow.streaming.DatasetWriter.appendFromCSVStream(DatasetWriter.java:139)
at com.dataiku.dip.dataflow.streaming.DatasetWritingService.pushData(DatasetWritingService.java:255)
at com.dataiku.dip.dataflow.kernel.slave.KernelSession.pushData(KernelSession.java:237)
at com.dataiku.dip.dataflow.kernel.slave.KernelServlet.service(KernelServlet.java:199)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:738)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:462)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536)
at java.lang.Thread.run(Thread.java:745)
[01:55:43] [DEBUG] [dku.jobs] - Command /tintercom/datasets/push-data processed in 26269806ms
[01:55:43] [DEBUG] [dku.jobs] - Command /tintercom/datasets/wait-write-session processed in 26269807ms
[01:55:43] [INFO] [dku.flow.activity] running compute_census_blocks_NP - activity is finished
[01:55:43] [ERROR] [dku.flow.activity] running compute_census_blocks_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in Python process: <class 'requests.exceptions.ReadTimeout'>: HTTPConnectionPool(host='data.fcc.gov', port=80): Read timed out. (read timeout=None)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:304)
at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:79)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
[01:55:43] [INFO] [dku.flow.activity] running compute_census_blocks_NP - Executing default post-activity lifecycle hook
[01:55:43] [INFO] [dku.flow.activity] running compute_census_blocks_NP - Removing samples for BOULDERCOUNTYSOURCETRANSFORMS.census_blocks
[01:55:43] [INFO] [dku.flow.activity] running compute_census_blocks_NP - Done post-activity tasks
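One pattern that would address both problems visible in the traceback (the request hanging with `read timeout=None`, and all prior work being lost on failure) is to set an explicit timeout on each request, retry transient errors with backoff, and checkpoint each result to disk as it arrives, so a rerun skips points that are already done. The sketch below is a hypothetical illustration, not the recipe's actual code: the endpoint URL, parameter names, response shape (`Block.FIPS`), the `id`/`lat`/`lon` column names, and the checkpoint filename are all assumptions.

```python
# Hypothetical sketch of a retry-and-checkpoint loop; not the actual recipe code.
# Assumes each input point is a dict with 'id', 'lat', and 'lon' keys, and that
# results are appended to a local CSV so a crash loses at most one in-flight request.
import csv
import os
import time

import requests

FCC_URL = "https://geo.fcc.gov/api/census/block/find"  # assumed endpoint
CHECKPOINT = "census_blocks_checkpoint.csv"            # assumed filename

def load_done_ids(path):
    """Return the set of point ids already written to the checkpoint file."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {row["id"] for row in csv.DictReader(f)}

def fetch_block(lat, lon, retries=5, backoff=2.0):
    """GET with an explicit timeout and exponential backoff on transient failures."""
    for attempt in range(retries):
        try:
            resp = requests.get(
                FCC_URL,
                params={"latitude": lat, "longitude": lon,
                        "showall": "true", "format": "json"},
                timeout=30,  # fail fast instead of hanging (the log shows timeout=None)
            )
            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.RequestException:
            if attempt == retries - 1:
                raise  # give up after the last retry
            time.sleep(backoff * 2 ** attempt)  # 2s, 4s, 8s, ...

def run(points):
    done = load_done_ids(CHECKPOINT)
    new_file = not os.path.exists(CHECKPOINT)
    with open(CHECKPOINT, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "lat", "lon", "block_fips"])
        if new_file:
            writer.writeheader()
        for point in points:
            if point["id"] in done:
                continue  # already fetched in a previous run
            data = fetch_block(point["lat"], point["lon"])
            writer.writerow({
                "id": point["id"],
                "lat": point["lat"],
                "lon": point["lon"],
                "block_fips": data.get("Block", {}).get("FIPS"),  # assumed response shape
            })
            f.flush()  # persist each row immediately
            time.sleep(1)  # keep the 1 request/second pace
```

With this shape, a timeout or crash loses at most the in-flight request: rerunning the recipe resumes from the checkpoint file rather than starting over, and the final output dataset can be built from the completed CSV in a separate, fast step.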