Explore big dataset disk_stories_full in mongo db #197

YanLiang1102 opened this issue Sep 30, 2017 · 10 comments

Comments

@YanLiang1102
Contributor

I plan to use C++ directly to gain efficiency.
Here is one record from the collection, which shows the schema:
{ "_id" : ObjectId("572c0953172ab83173aaf011"), "news_source" : "Associated Press International", "position_section" : "SPORTS NEWS", "word_count" : "577", "states" : [ "OHIO, USA" ], "id_type" : "DOC-ID", "date_added" : ISODate("2016-05-06T03:02:43.316Z"), "cities" : [ "OAKLAND, CA, USA" ], "article_title" : "Raiders go QB route for Fresno State's Carr", "article_body" : "NEW YORK (AP) - Derek Carr, the brother of former No.1 draft pick David Carr, was selected by the Oakland Raiders with selection 36 of the 2014 edition on Friday. A day after the first-round selections were completed, teams went about filling their rosters with further selections on Friday. Derek Carr is a quarterback like his brother, went to Fresno State, like his brother, and enters the league with a wife and child, like his brother, but is hoping the similarities end there, given David Carr's disappointing career after being drafted by Houston with much fanfare back in 2002. "I learned everything that he did right and everything that he did wrong," Derek Carr said. "He told me that if he could do anything, he hopes he made the path smoother for me as I transition into the NFL." Derek Carr rewrote the record book in his time in college, throwing for more than 10,000 yards and 100 touchdown passes, leading Fresno State to consecutive Mountain West Conference titles. Oakland has veteran Matt Schaub earmarked to be its starting quarterback, but he will get a serious push from Carr. In other picks Friday: - Houston used the first pick of the second round on UCLA guard Xavier Su'a-Filo, who joins the first overall pick, Jadeveon Clowney, in a upgraded defensive line. The 6-foot-4, 307-pound Su'a-Filo, who went on a Mormon mission while in college, also has played tackle. - The Cowboys took Boise State defensive end Demarcus Lawrence, who they hope will emulate their departed sacks leader with the same first name, DeMarcus Ware, now with Denver. "I'm my own Demarcus," Lawrence said. 
"I don't like to try to be nobody else. I'm going to be me, and I'm going to do it well." - Cleveland added a protector for new quarterback Johnny Manziel by grabbing guard Joel Bitonio of Nevada, who also can play tackle or center. The Browns caused the biggest stir on opening night when they traded up to No. 22 to get 'Johnny Football'. "He's a heck of a quarterback," Bitonio said. "Hopefully, he comes in and he's ready to compete and just ready to work and do well for the Cleveland Browns." Cleveland did not choose any receivers even though Josh Gordon is reportedly facing suspension by the NFL for violating the league's drug policy again. Gordon was suspended for the first two games of 2013, but still led the league with 1,646 yards receiving. - Eastern Illinois quarterback Jimmy Garoppolo went to New England near the end of the second round, and will be a backup to his favorite player Tom Brady. "Whether I was coming in as the starter or as the backup, I'm going to go in and approach it the same way," Garoppolo said. "I'm going to go out there and try to get better each and every day. That's what good football players do." - Washington may have got a bargain by selecting Virginia tackle Morgan Moses at pick 66. Moses had been earmarked as a potential first-round pick by many pundits but never received a call. "I thought my phone was broken," Moses quipped. It took 54 selections, a draft record, for a running back to go. Bishop Sankey of Washington was chosen by Tennessee, who cut Chris Johnson this spring. Two more went in the next three selections: Jeremy Hill of Louisiana State to Cincinnati, and Carlos Hyde of Ohio State to San Francisco. ___ AP College Football Writer Ralph D. Russo and Sports Writers Simmi Buttar, Schuyler Dixon and Josh Dubow contributed to this story. 
___ AP NFL website: www.pro32.ap.org and www.twitter.com/AP_NFL", "language" : "english", "stanford" : 0, "countries" : [ "UNITED STATES" ], "publication_date_raw" : "May 10, 2014 Saturday", "doc_id" : "TOPNEWSa6f3d910954c4d228015df686134c54d", "parsed" : 1, "queue_added" : 0 }

@YanLiang1102
Contributor Author

YanLiang1102 commented Oct 16, 2017

77300000 finished processing
Traceback (most recent call last):
File "distinct.py", line 18, in <module>
for i in largestory.find():
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/cursor.py", line 1132, in next
if len(self.__data) or self._refresh():
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/cursor.py", line 1075, in _refresh
self.__max_await_time_ms))
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/cursor.py", line 892, in __send_message
**kwargs)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/mongo_client.py", line 950, in _send_message_with_response
exhaust)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/mongo_client.py", line 961, in _reset_on_error
return func(*args, **kwargs)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/server.py", line 136, in send_message_with_response
response_data = sock_info.receive_message(1, request_id)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/pool.py", line 510, in receive_message
self._raise_connection_failure(error)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/pool.py", line 610, in _raise_connection_failure
raise error
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/pool.py", line 508, in receive_message
self.sock, operation, request_id, self.max_message_size)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/network.py", line 137, in receive_message
header = _receive_data_on_socket(sock, 16)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/network.py", line 170, in _receive_data_on_socket
raise AutoReconnect("connection closed")
pymongo.errors.AutoReconnect: connection closed

MongoDB just died in the middle of the run; I'm not sure why.
2017-10-16T09:37:45.768-0500 W NETWORK [thread1] Failed to connect to 127.0.0.1:23755, in(checking socket for error after poll), reason: Connection refused
2017-10-16T09:37:45.776-0500 E QUERY [thread1] Error: couldn't connect to server localhost:23755, connection attempt failed :
connect@src/mongo/shell/mongo.js:237:13
@(connect):1:6
exception: connect failed

@cegme
Member

cegme commented Oct 17, 2017

Try and use the with statement to open the mongo document. Can you also add a timer? The connection may be timing out. Also, use JSON, stop using pickle.
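
The timer and JSON suggestions above could look roughly like the following. This is a minimal sketch, not the code from distinct.py: the helper names `timer`, `save_counts`, and `load_counts` are hypothetical, and it assumes the script accumulates per-value counts in a dict-like object.

```python
import json
import time
from collections import Counter
from contextlib import contextmanager


@contextmanager
def timer(label, log=print):
    """Report how long the wrapped block ran, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log("{} took {:.2f}s".format(label, time.perf_counter() - start))


def save_counts(counts, path):
    """Write the counts as JSON; the with statement closes the file for us."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(counts, fh, ensure_ascii=False, indent=2, sort_keys=True)


def load_counts(path):
    """Read the counts back into a Counter."""
    with open(path, encoding="utf-8") as fh:
        return Counter(json.load(fh))
```

Wrapping the main loop in `with timer("full scan"):` would show how long the scan ran before the connection closed, and the JSON file stays readable from other tools, unlike a pickle.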

@YanLiang1102
Contributor Author

By mongo file, do you mean accessing that table? It is but directly accessing the file on disk; it gets the data using pymongo through MongoDB.

@YanLiang1102
Contributor Author

*it is not directly

@cegme
Member

cegme commented Oct 17, 2017

@YanLiang1102 by mongo document, I mean the MongoClient.
In any case, I think it is our bad network complaining. Just catch this error, add a sleep, and try again.

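import logging
import time

import pymongo.errors

attempt = 1  # exponent for the backoff delay
done = False
while not done:
    try:
        # your code HERE
        done = True
    except pymongo.errors.AutoReconnect:
        delay = 2 ** attempt  # exponential backoff: 2, 4, 8, ... seconds
        logging.info("Error connecting, sleeping for {} s".format(delay))
        time.sleep(delay)
        attempt += 1
        logging.info("retrying...")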

@YanLiang1102
Contributor Author

YanLiang1102 commented Oct 17, 2017

So you suspect our network times out when a connection is kept alive for too long, like 3 or 4 hours? I kind of think it is a Mongo client issue; it might only keep the DB connection open for a certain amount of time. @cegme
Since I run that code locally on portland, is the network still going to affect it?

@cegme
Member

cegme commented Oct 17, 2017

The network can go out and interrupt a TCP connection at any time. A local run shouldn't have a network problem; still, adding a sleep can be a remedy. Add socketKeepAlive=True to the Mongo client connection.

@YanLiang1102
Contributor Author

@cegme I don't think the code will work, Dr. Grant. The way you wrote it, the cursor is recreated on every retry, so it will loop from the beginning.
I also noticed that each time this program dies, MongoDB is down, and sometimes I cannot even restart it and need to reboot to get it running again. It must be some bad query that brings the server down. I can take a look tonight and see if there is a solution for that.
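
One way around the restart-from-the-beginning problem is to remember the last `_id` seen and reopen the cursor just past it. The sketch below is only an illustration under stated assumptions: `iter_resumable` is a hypothetical helper, it assumes `_id` values are comparable and never change, and it falls back to the built-in `ConnectionError` when pymongo is not installed so the snippet can run on its own.

```python
import time

try:
    from pymongo.errors import AutoReconnect
except ImportError:  # assumption: stand-in so the sketch runs without pymongo
    AutoReconnect = ConnectionError


def iter_resumable(collection, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Yield every document in _id order, resuming after the last _id seen."""
    last_id = None
    retries = 0
    while True:
        # After a failure, ask only for documents past the last one yielded.
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        try:
            for doc in collection.find(query).sort("_id", 1):
                last_id = doc["_id"]
                retries = 0  # progress was made, so reset the backoff
                yield doc
            return  # cursor exhausted normally
        except AutoReconnect:
            retries += 1
            if retries > max_retries:
                raise
            sleep(base_delay * 2 ** retries)  # exponential backoff
```

With this shape, `for i in iter_resumable(largestory):` would replace `for i in largestory.find():`. It does not help if the mongod process itself stays down, which is the second problem described above.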

@YanLiang1102
Contributor Author

YanLiang1102 commented Oct 21, 2017

File "1021.py", line 20, in <module>
for i in largestory.find():
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/cursor.py", line 1132, in next
if len(self.__data) or self._refresh():
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/cursor.py", line 1075, in _refresh
self.__max_await_time_ms))
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/cursor.py", line 892, in __send_message
**kwargs)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/mongo_client.py", line 950, in _send_message_with_response
exhaust)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/mongo_client.py", line 961, in _reset_on_error
return func(*args, **kwargs)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/server.py", line 136, in send_message_with_response
response_data = sock_info.receive_message(1, request_id)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/pool.py", line 510, in receive_message
self._raise_connection_failure(error)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/pool.py", line 610, in _raise_connection_failure
raise error
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/pool.py", line 508, in receive_message
self.sock, operation, request_id, self.max_message_size)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/network.py", line 137, in receive_message
header = _receive_data_on_socket(sock, 16)
File "/home/yan/.local/lib/python3.5/site-packages/pymongo/network.py", line 170, in _receive_data_on_socket
raise AutoReconnect("connection closed")
pymongo.errors.AutoReconnect: connection closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "1021.py", line 35, in <module>
except pymongo.errors.AutoReconnect:

NameError: name 'pymongo' is not defined
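
The NameError above shows up only after the connection fails because Python looks up the name in an except clause when an exception actually reaches it, not when the script starts. A small sketch of that behavior, using a hypothetical undefined name `missing_module` in place of the un-imported `pymongo`:

```python
def run(raise_error):
    """Return "ok" on success; the except clause is evaluated only on failure."""
    try:
        if raise_error:
            raise ConnectionError("connection closed")
        return "ok"
    except missing_module.SomeError:  # hypothetical name, never imported
        return "handled"
```

Calling `run(False)` succeeds, while `run(True)` raises NameError during handling of the ConnectionError. In the script itself, adding `import pymongo.errors` at the top should make the except clause resolve.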

@YanLiang1102
Contributor Author

Putting a limit on the pool size does help, but the mongo daemon still goes down.
Here is nohup-style code to make it restart by itself, just like what we do for the website:
import os
import time

while True:
    time.sleep(30)  # check every 30 seconds
    try:
        # nc exits non-zero when nothing is listening on the port
        returncode = os.system("nc -zvv localhost port for mongodb")
        if returncode != 0:
            os.system("sudo ***** restart mongo db command that we are using --port *****")
    except Exception:
        os.system("forever start mongodb!!")
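
The same watchdog idea can be sketched without shelling out to `nc`, by probing the port with the standard `socket` module. This is a rough sketch only: `port_is_open` and `watch` are hypothetical names, and `restart_cmd` is a placeholder for whatever restart command is actually in use.

```python
import socket
import subprocess
import time


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, port, restart_cmd, interval=30, checks=None,
          sleep=time.sleep, run=subprocess.call):
    """Probe the port every `interval` seconds and run restart_cmd when it is closed.

    `checks` limits the number of probes (None means loop forever).
    Returns the number of restarts attempted.
    """
    probes = 0
    restarts = 0
    while checks is None or probes < checks:
        sleep(interval)
        if not port_is_open(host, port):
            run(restart_cmd)  # restart_cmd is an argument list, e.g. ["sudo", ...]
            restarts += 1
        probes += 1
    return restarts
```

For the server in the log above, the call would be something like `watch("localhost", 23755, restart_cmd)`, with `restart_cmd` filled in locally.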

YanLiang1102 added a commit that referenced this issue Oct 24, 2017
…on problem, since teh previous implemenation when redo teh cursor will restart again so it will next ends
YanLiang1102 added a commit that referenced this issue Oct 30, 2017
YanLiang1102 added a commit that referenced this issue Oct 31, 2017