
Data structure

robertogithub edited this page Jul 30, 2013 · 11 revisions
users
{
	"_id" : ObjectId("51dad1aad8c6cd4700000001"),
	"created_at" : ISODate("2013-07-08T14:50:18.146Z"),
	"current_sign_in_at" : ISODate("2013-07-17T21:03:40.785Z"),
	"current_sign_in_ip" : "127.0.0.1",
	"email" : "[email protected]",
	"encrypted_password" : "$2a$10$TFi0BTBPbfK4igvFGe6yqODDzAgCtprrZTw/mfSXcQsof8z6zLg4a",
	"last_sign_in_at" : ISODate("2013-07-09T10:35:11.378Z"),
	"last_sign_in_ip" : "127.0.0.1",
	"name" : "Roberto Bartolome",
	"sign_in_count" : 4,
	"updated_at" : ISODate("2013-07-17T21:03:40.936Z")
}

searches
{
   _id: ObjectId("51e7095fd8c6cdef2f000002"),
   user_id: ObjectId("51dad1aad8c6cd4700000001"),
   body: "coca cola barcelona"
}

tweets
{
   _id: ObjectId("51e70943d8c6cd67c8000001"),
   search_id: ObjectId("51e7095fd8c6cdef2f000002"),
(and the rest of the tweet's info...)
}

Some interesting commands to know:

$ db.tweets.ensureIndex({search_id: 1})
$ db.tweets.find({search_id: ObjectId("51e7095fd8c6cdef2f000002")}).explain()
$ tweet = {author: "the name",...}
$ db.tweets.save(tweet)
$ db.tweets.find()
$ db.books.find({tags: "comic"})
$ db.books.findAndModify({
query:{inprogress: false},
sort:{priority: -1},
update:{ $set: {inprogress: true, started: new Date()}}
})

Paulo Fidalgo's concerns:

According to this site, we need to keep the document size in mind, even if it is not a pressing concern yet. The documentation states: "The maximum BSON document size is 16 megabytes." If tweets are embedded in a single document and we collect a large number of them, that limit will be exceeded.
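
To get a rough sense of when the limit would bite, the sketch below estimates how many tweets could be embedded in one document. The 16 MB figure comes from the documentation quoted above; the average tweet size of 2,500 bytes and the helper name `maxEmbeddedTweets` are assumptions for illustration, not measurements.

```javascript
// Maximum BSON document size quoted in the MongoDB documentation: 16 megabytes.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: how many tweets of a given average size fit in one
// document. avgTweetBytes is an assumed figure; measure real stored tweets
// before relying on it. Overhead of the parent document is ignored.
function maxEmbeddedTweets(avgTweetBytes) {
  if (!(avgTweetBytes > 0)) {
    throw new RangeError("avgTweetBytes must be a positive number");
  }
  return Math.floor(MAX_BSON_BYTES / avgTweetBytes);
}

// With an assumed 2,500 bytes per tweet, one document holds a few thousand tweets.
console.log(maxEmbeddedTweets(2500)); // prints 6710
```

Under that assumption, a single popular search could hit the limit quickly, which is the case the concern above describes.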

Roberto's comments:

So we should split the data into three collections: Users, Searches and Tweets, and use 'referencing' instead of 'embedding'. The structure should look like the one above. What do you think?
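
As a minimal sketch of what 'referencing' means in practice, the plain JavaScript below follows the `user_id` and `search_id` references across in-memory stand-ins for the three collections. The short string ids, the `text` field, and the function name `tweetsForUser` are assumptions for illustration; the comments show the mongo shell queries this is meant to mirror.

```javascript
// In-memory stand-ins for the three collections. Ids are short strings
// here instead of real ObjectId values.
const users = [{ _id: "u1", name: "Roberto Bartolome" }];
const searches = [
  { _id: "s1", user_id: "u1", body: "coca cola barcelona" },
];
const tweets = [
  { _id: "t1", search_id: "s1", text: "first tweet" },
  { _id: "t2", search_id: "s1", text: "second tweet" },
];

// Follow the references user -> searches -> tweets.
// In the mongo shell this would take two queries:
//   db.searches.find({user_id: userId})
//   db.tweets.find({search_id: {$in: searchIds}})
function tweetsForUser(userId) {
  const searchIds = searches
    .filter((s) => s.user_id === userId)
    .map((s) => s._id);
  return tweets.filter((t) => searchIds.includes(t.search_id));
}

console.log(tweetsForUser(users[0]._id).map((t) => t._id));
```

The cost of referencing is the extra query; the benefit is that no single document grows with the number of tweets.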

Information to be stored

About the user and tweet:

  • name (e.g. Diego Sanchez)
  • screenname (e.g. @Dieguitson)
  • date and time
  • language
  • city
  • country

About the tweet content:

  • tweet text
  • hashtags: list of keywords
  • links

About the interaction:

  • RT (if it is a retweet, get the original source if possible)
  • in_reply_to_screen_name
  • favorited

About the relevance:

  • followers_count (number of followers)
  • friends_count (number of accounts the user follows)
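
The field list above could be filled from a Twitter status object roughly as follows. This is a minimal sketch, not a settled schema: the input is assumed to follow the Twitter REST API v1.1 status format, `toTweetDocument` and the output keys `is_retweet`, `retweet_of`, `links` and `city` are hypothetical names, and treating `place.name` as the city is a simplification. The sample status is made-up illustration data.

```javascript
// Hypothetical mapper: picks the fields listed above out of a status object
// shaped like a Twitter REST API v1.1 response (assumed shape).
function toTweetDocument(status, searchId) {
  const user = status.user || {};
  const place = status.place || {};
  const entities = status.entities || {};
  const original = status.retweeted_status || null;
  return {
    search_id: searchId,
    // About the user and tweet
    name: user.name,
    screen_name: user.screen_name,
    created_at: status.created_at,
    lang: status.lang,
    city: place.name, // assumption: the place is a city
    country: place.country,
    // About the tweet content
    text: status.text,
    hashtags: (entities.hashtags || []).map((h) => h.text),
    links: (entities.urls || []).map((u) => u.expanded_url),
    // About the interaction
    is_retweet: original !== null,
    retweet_of: original ? original.id_str : null,
    in_reply_to_screen_name: status.in_reply_to_screen_name || null,
    favorited: Boolean(status.favorited),
    // About the relevance
    followers_count: user.followers_count,
    friends_count: user.friends_count,
  };
}

// Made-up sample input, for illustration only.
const sampleStatus = {
  created_at: "Wed Jul 17 21:03:40 +0000 2013",
  text: "coca cola barcelona #cocacola",
  lang: "es",
  favorited: false,
  in_reply_to_screen_name: null,
  user: {
    name: "Diego Sanchez",
    screen_name: "Dieguitson",
    followers_count: 120,
    friends_count: 80,
  },
  place: { name: "Barcelona", country: "Spain" },
  entities: { hashtags: [{ text: "cocacola" }], urls: [] },
};

const sampleDoc = toTweetDocument(sampleStatus, "51e7095fd8c6cdef2f000002");
console.log(sampleDoc.hashtags, sampleDoc.city, sampleDoc.is_retweet);
```

In the mongo shell the result would be stored with `db.tweets.save(...)`, with `search_id` wrapped in `ObjectId(...)` rather than passed as a plain string.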