Data structure
robertogithub edited this page Jul 30, 2013 · 11 revisions
users

    {
      "_id" : ObjectId("51dad1aad8c6cd4700000001"),
      "created_at" : ISODate("2013-07-08T14:50:18.146Z"),
      "current_sign_in_at" : ISODate("2013-07-17T21:03:40.785Z"),
      "current_sign_in_ip" : "127.0.0.1",
      "email" : "[email protected]",
      "encrypted_password" : "$2a$10$TFi0BTBPbfK4igvFGe6yqODDzAgCtprrZTw/mfSXcQsof8z6zLg4a",
      "last_sign_in_at" : ISODate("2013-07-09T10:35:11.378Z"),
      "last_sign_in_ip" : "127.0.0.1",
      "name" : "Roberto Bartolome",
      "sign_in_count" : 4,
      "updated_at" : ISODate("2013-07-17T21:03:40.936Z")
    }

searches

    {
      _id: ObjectId("51e7095fd8c6cdef2f000002"),
      user_id: ObjectId("51dad1aad8c6cd4700000001"),
      body: "coca cola barcelona"
    }

tweets

    {
      _id: ObjectId("51e70943d8c6cd67c8000001"),
      search_id: ObjectId("51e7095fd8c6cdef2f000002"),
      (and the rest of the tweet's info...)
    }
Some interesting commands to know:
    $ db.tweets.ensureIndex({search_id: 1})
    $ db.tweets.find({search_id: ObjectId("51e7095fd8c6cdef2f000002")}).explain()
    $ tweet = {author: "the name",...}
    $ db.tweets.save(tweet)
    $ db.tweets.find()
    $ db.books.find({tags: "comic"})
    $ db.books.findAndModify({
        query: {inprogress: false},
        sort: {priority: -1},
        update: {$set: {inprogress: true, started: new Date()}}
      })

Note that explain() is called on a find() query, not on ensureIndex(): it reports whether the query actually uses the search_id index.
According to this site, we need to keep the document size in mind, even if it is not a pressing concern right now. The documentation states: "The maximum BSON document size is 16 megabytes." So if we embed tons of tweets inside a single document, that limit will eventually be exceeded.
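To get a feel for the limit, here is a rough back-of-the-envelope sketch in plain JavaScript. The average tweet size used here is an assumption for illustration only, not a measured value, and the estimate ignores the other fields of the parent document.

```javascript
// Rough estimate of how many embedded tweets fit in one document.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // the 16 MB limit quoted above

// ASSUMPTION: average size of one stored tweet. Measure real data to replace it.
const ASSUMED_TWEET_BYTES = 2 * 1024;

// Number of whole tweets that fit below the limit.
function maxEmbeddedTweets(maxBytes, tweetBytes) {
  return Math.floor(maxBytes / tweetBytes);
}

console.log(maxEmbeddedTweets(MAX_BSON_BYTES, ASSUMED_TWEET_BYTES)); // 8192
```

Under that assumption, a single search with more than a few thousand embedded tweets would no longer fit in one document.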
So we should split the data into three collections: users, searches and tweets, and use 'referencing' instead of 'embedding'. The structure should be like the one above. What do you think?
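A minimal sketch of how referencing works, using plain JavaScript arrays as stand-ins for the three collections (no MongoDB involved). The string ids, the tweet texts and the helper name tweetsForUser are hypothetical and only there for illustration.

```javascript
// In-memory stand-ins for the three collections (plain arrays, not MongoDB).
const users = [{ _id: "u1", name: "Roberto Bartolome" }];
const searches = [{ _id: "s1", user_id: "u1", body: "coca cola barcelona" }];
const tweets = [
  { _id: "t1", search_id: "s1", text: "hypothetical tweet text 1" },
  { _id: "t2", search_id: "s1", text: "hypothetical tweet text 2" },
];

// Follow the references: user -> searches -> tweets.
function tweetsForUser(userId) {
  const searchIds = new Set(
    searches.filter((s) => s.user_id === userId).map((s) => s._id)
  );
  return tweets.filter((t) => searchIds.has(t.search_id));
}

console.log(tweetsForUser("u1").length); // 2
```

Each tweet is its own small document, so no single document grows with the number of tweets. The price is an extra lookup per level, which is why an index on search_id matters.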
About the user and tweet:
- name (e.g. Diego Sanchez)
- screenname (e.g. @Dieguitson)
- date and time
- language
- city
- country
About the tweet content:
- tweet text
- hashtags: list of keywords
- links
About the interaction:
- RT (if it is a retweet, get the original source if possible)
- in_reply_to_screen_name
- favorited
About the relevance:
- followers_count (number of followers)
- friends_count (number of accounts followed)
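The field list above could be turned into a stored document roughly as follows. This is only a sketch: it assumes the raw tweet has the shape returned by the Twitter REST API v1.1 (user, place, entities, retweeted_status), and the function name toTweetDocument and the output field names are hypothetical choices, not something already agreed on.

```javascript
// Map a raw tweet (ASSUMED: Twitter REST API v1.1 shape) to the fields listed above.
function toTweetDocument(raw, searchId) {
  const user = raw.user || {};
  const place = raw.place || {};
  const entities = raw.entities || {};
  const original = raw.retweeted_status || null;

  return {
    search_id: searchId,
    // About the user and tweet
    name: user.name,
    screen_name: user.screen_name,
    created_at: raw.created_at,
    language: raw.lang,
    city: place.name || null, // ASSUMPTION: place.name holds the city
    country: place.country || null,
    // About the tweet content
    text: raw.text,
    hashtags: (entities.hashtags || []).map((h) => h.text),
    links: (entities.urls || []).map((u) => u.expanded_url),
    // About the interaction
    is_retweet: original !== null,
    retweet_source:
      original && original.user ? original.user.screen_name : null,
    in_reply_to_screen_name: raw.in_reply_to_screen_name || null,
    favorited: Boolean(raw.favorited),
    // About the relevance
    followers_count: user.followers_count,
    friends_count: user.friends_count,
  };
}

// Hypothetical sample input, made up for illustration.
const sample = {
  text: "hello #cocacola",
  lang: "es",
  created_at: "Wed Jul 17 21:03:40 +0000 2013",
  user: { name: "Diego Sanchez", screen_name: "Dieguitson",
          followers_count: 10, friends_count: 20 },
  entities: { hashtags: [{ text: "cocacola" }], urls: [] },
  favorited: false,
};

const doc = toTweetDocument(sample, "s1");
console.log(doc.hashtags); // [ 'cocacola' ]
```

Missing optional parts (place, retweeted_status) are stored as null so that every tweet document has the same set of keys.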