IvanQin/cse258_project

cse258_project

CSE 258 final project @ UCSD

Overall

This project uses Reddit submissions as its dataset, on which we run prediction and analysis tasks.

Group members

Task distribution

Task overview: (菲华)

  • Predict number_of_comments, number_of_upvotes, and number_of_downvotes with a regression of the form: predict = f(title information, user information, time, ...)
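The regression above can be sketched as an ordinary least squares fit. This is only an illustration under stated assumptions: the two features (title length in words, hour of posting) and all numbers are made up, and least squares is one possible choice of f, not necessarily the model used in the project.

```python
# Minimal sketch: ordinary least squares for predict = f(features).
# Feature choice and data are hypothetical, for illustration only.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Return theta minimizing ||X theta - y||^2 (X gets a leading bias column)."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    # Normal equations: (X^T X) theta = X^T y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(theta, row):
    """Apply the fitted linear model to one feature vector."""
    return theta[0] + sum(t * v for t, v in zip(theta[1:], row))


# Hypothetical features per submission: [title length in words, hour posted]
rows = [[3, 1], [5, 2], [8, 4], [2, 7], [6, 3]]
# Synthetic targets from a known linear rule, so the fit can be checked.
targets = [2 + 1.5 * a + 0.5 * b for a, b in rows]
theta = fit_least_squares(rows, targets)
```

The same fit would be run once per target (number_of_comments, number_of_upvotes, number_of_downvotes), each with its own theta.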

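Details: (佳琪, 嘉卓)

  • Analyze title information
  • Cluster the titles and derive a label for each cluster
  • Use the subreddit a title belongs to as one feature of the title information
  • Apply dimensionality reduction to the titles to obtain a set of keywords, used as binary "is word xxx in title?" features
  • ... (to be continued)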
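A minimal sketch of the binary keyword and subreddit features follows. It assumes keywords are chosen by raw frequency, which is a simple stand-in for the dimensionality reduction step and not the project's actual method; the function names and example titles are hypothetical.

```python
from collections import Counter


def top_words(titles, k):
    """Pick the k most frequent lowercase words across all titles.

    Assumption: frequency ranking stands in for a real dimensionality
    reduction step. Ties are broken alphabetically for determinism.
    """
    counts = Counter(w for t in titles for w in t.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def title_features(title, subreddit, keywords, subreddits):
    """Build binary features: one per keyword, then a one-hot subreddit."""
    words = set(title.lower().split())
    word_feats = [1 if w in words else 0 for w in keywords]  # "is word xxx in title?"
    sub_feats = [1 if subreddit == s else 0 for s in subreddits]
    return word_feats + sub_feats


# Hypothetical training titles and subreddit vocabulary.
titles = ["Funny cat picture", "Cat sleeps on keyboard", "New keyboard review"]
keywords = top_words(titles, 2)
subreddits = ["aww", "tech"]
features = title_features("My cat is funny", "aww", keywords, subreddits)
```

The resulting vector can be concatenated with user and time features before fitting the regression.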

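Analyze user information (一帆)

  • Analyze each user's weight (is the user a high-influence "big V" account?) and use it as a feature (user grouping)
  • Cluster the users to derive a user_label
  • ... (to be continued)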
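One way to turn user weight into a grouping feature is sketched below. It assumes a user's weight is the mean upvote count of their past submissions and that fixed thresholds separate the groups; both choices, the threshold values, and the usernames are hypothetical.

```python
from collections import defaultdict


def user_mean_score(submissions):
    """Map each user to the mean upvotes over their (user, upvotes) records."""
    totals = defaultdict(lambda: [0.0, 0])  # user -> [sum of upvotes, count]
    for user, upvotes in submissions:
        totals[user][0] += upvotes
        totals[user][1] += 1
    return {u: s / n for u, (s, n) in totals.items()}


def user_group(mean_score, thresholds=(10, 100)):
    """Return 0 (low), 1 (medium), or 2 (high influence).

    The thresholds are illustrative assumptions, not tuned values.
    """
    return sum(mean_score >= t for t in thresholds)


# Hypothetical training records.
submissions = [("alice", 500), ("alice", 300), ("bob", 5), ("carol", 40)]
means = user_mean_score(submissions)
groups = {u: user_group(m) for u, m in means.items()}
```

Users that never appear in the training data have no mean score; assigning them to group 0 is one reasonable default.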

Progress

  • The data-loading code is in load_data.py; the data file is submissions.csv
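For orientation, reading such a file with the standard library might look like the sketch below. This is not the contents of load_data.py: the function name is hypothetical, and the column names are an assumption based on the prediction targets named above, so the real header of submissions.csv may differ.

```python
import csv
import io

# Assumed numeric columns; the real CSV header may use other names.
NUMERIC_COLUMNS = ("number_of_comments", "number_of_upvotes", "number_of_downvotes")


def load_submissions(handle):
    """Read CSV rows into dicts, converting the assumed numeric columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        for col in NUMERIC_COLUMNS:
            if row.get(col) not in (None, ""):
                row[col] = int(row[col])
        rows.append(row)
    return rows


# In-memory sample so the sketch runs without the real file. For the actual
# data, pass open("submissions.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "title,subreddit,number_of_comments,number_of_upvotes,number_of_downvotes\n"
    "Funny cat picture,aww,12,340,15\n"
)
rows = load_submissions(sample)
```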

Report

Please follow this link to read our report.
