Bill 1900145 Parsing Error Due to Crawling Error #34

hunkim · 2015-12-01T23:58:11Z

I got an error while running html2json, so I looked into what happened:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/ubuntu/crawlers/bills/specific/html2json.py", line 242, in parse_page
    d = extract_specifics(assembly_id, bill_id, meta)
  File "/home/ubuntu/crawlers/bills/specific/html2json.py", line 166, in extract_specifics
    table       = utils.get_elems(page, X['spec_table'])[1]
IndexError: list index out of range
<Greenlet at 0x7f27e79417d0: parse_page(19, '1900145',        bill_id  status                            , u'./json/19')> failed with IndexError
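The crash itself comes from indexing the result of `utils.get_elems(page, X['spec_table'])` with `[1]` without checking how many elements came back. A minimal sketch of a guard, assuming a hypothetical helper `nth_elem` and exception `UnexpectedPageError` (neither is part of the crawler), that replaces the bare `IndexError` with an error naming the bill:

```python
class UnexpectedPageError(Exception):
    """Hypothetical: raised when a saved bill page lacks the expected layout."""


def nth_elem(elems, n, bill_id, what="spec_table"):
    """Return elems[n], or raise an error naming the bill and the missing element.

    Hypothetical helper: the original code does `get_elems(...)[1]` directly,
    which fails with a bare IndexError when an error page was saved instead.
    """
    if len(elems) <= n:
        raise UnexpectedPageError(
            "bill %s: expected at least %d '%s' element(s), found %d"
            % (bill_id, n + 1, what, len(elems))
        )
    return elems[n]
```

In `extract_specifics` the failing line would then read something like `table = nth_elem(utils.get_elems(page, X['spec_table']), 1, bill_id)`, so the greenlet failure points at bill 1900145 directly.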

It looks like an error occurred when the file sources/specifics/19/1900145.html was downloaded. The saved file contains:

^M
^M
^M
<SCRIPT LANGUAGE="javascript">^M
<!--^M
        function onLoad() {^M
                alert(document.all["MSG"].innerText);^M
        }^M
-->^M
</SCRIPT>^M
^M
^M
^M
<HTML>^M
<BODY ONLOAD="javascript:onLoad()">^M
        <TEXTAREA ID="MSG" STYLE="display:none">[SQLException] Code[24757] Msg[ORA-24757: duplicate transaction identifier
ORA-02063: preceding line from NALAW_LINK
][Database error]</TEXTAREA> ^M
</BODY>^M
</HTML>
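Since the saved file is a server error page rather than a bill page, the parser (or the crawler) could recognize it before trying to extract tables. A minimal sketch, assuming the error text always sits in a hidden `<TEXTAREA ID="MSG">` and contains `SQLException`, as in the single example above; `server_error_message` is a hypothetical name:

```python
import re

# Assumed pattern, based only on the one error page in this issue: the
# server's message is placed in a hidden <TEXTAREA ID="MSG">.
_MSG_RE = re.compile(
    r'<TEXTAREA\s+ID="MSG"[^>]*>(.*?)</TEXTAREA>',
    re.IGNORECASE | re.DOTALL,
)


def server_error_message(html):
    """Return the server's error text if `html` is an error page, else None."""
    m = _MSG_RE.search(html)
    if m and "SQLException" in m.group(1):
        return m.group(1).strip()
    return None
```

Checking this before calling `extract_specifics` would let the pipeline skip such files, or queue them for re-download, instead of failing inside a greenlet.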

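How should we handle cases like this? The server returned an SQLException instead of the bill page, so it seems the crawler needs a feature to re-download pages in such cases.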
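If the server error is transient, re-requesting the page may succeed. A minimal sketch of such a retry, assuming the crawler's download step can be wrapped as a callable `fetch(url)` that returns the page as a string. `fetch_with_retry` and `looks_like_error_page` are hypothetical names, and the `[SQLException]` check is a heuristic based only on the page shown in this issue:

```python
import time


def looks_like_error_page(html):
    # Assumed heuristic: the error page in this issue carries an
    # "[SQLException]" marker instead of bill content.
    return "[SQLException]" in html


def fetch_with_retry(fetch, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a non-error page or retries run out.

    `fetch` stands in for the crawler's real download function (hypothetical).
    Waits delay, 2*delay, 4*delay, ... between attempts (exponential backoff).
    """
    for attempt in range(1, retries + 1):
        html = fetch(url)
        if not looks_like_error_page(html):
            return html
        if attempt < retries:
            sleep(delay * 2 ** (attempt - 1))
    raise RuntimeError(
        "still got an error page for %s after %d attempts" % (url, retries)
    )
```

Raising after the last attempt, rather than saving the error page, keeps bad files out of `sources/specifics/` so that html2json never sees them; the caller can log the bill ID and try again in a later crawl.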

e9t added the bug label Dec 3, 2015

hunkim commented Dec 1, 2015 • edited by e9t

Bill 1900145 Parsing Error Due to Crawling Error #34

Bill 1900145 Parsing Error Due to Crawling Error #34

Comments

hunkim commented Dec 1, 2015 • edited by e9t Loading

hunkim commented Dec 1, 2015 •

edited by e9t

Loading