CitationRank (like PageRank, except for papers/citations) #18

Open
cegme opened this issue Sep 15, 2012 · 2 comments

Comments

@cegme
Owner

cegme commented Sep 15, 2012

Here is another problem for you @virup @clintpgeorge @supriyan

We want to calculate a global importance factor for all the papers in the data set.
This is similar to PageRank. The value CR(p) of a paper p should be the probability that I land on p when I am randomly looking for an important paper.

A paper with citations should have a higher value than a paper with no citations.

A paper with P citations should have a smaller value than a paper with G citations of citations, where |P| - |G| < sigma.

The references of a paper do not affect the paper's score (although we should have a self-citation penalty).
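
One possible reading of the self-citation penalty is sketched below. It assumes a self-citation means the citing and cited papers share at least one author, and that such a citation counts for a fraction of a normal one. The function name `self_citation_weights`, the `authors` mapping, and the `penalty` value of 0.5 are all hypothetical; nothing in this thread fixes them.

```python
def self_citation_weights(edges, authors, penalty=0.5):
    """Return {(citing, cited): weight} for a list of citation edges.

    edges:   iterable of (citing_id, cited_id) pairs
    authors: dict mapping paper_id -> set of author names (assumed input)
    penalty: weight given to a citation between papers sharing an author
    """
    weights = {}
    for citing, cited in edges:
        if citing == cited:
            # A paper citing itself contributes nothing.
            continue
        shared = authors.get(citing, set()) & authors.get(cited, set())
        weights[(citing, cited)] = penalty if shared else 1.0
    return weights
```

In a weighted variant of the ranking, each citing paper would split its score across its references in proportion to these weights rather than evenly.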

Also, can we compute these values using SQL?
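
As a rough illustration that the iteration can be expressed in SQL, here is a minimal sketch that runs a PageRank-style update with plain SQL statements through Python's `sqlite3` module. The table names (`cites`, `papers`, `scores`, `next_scores`), the damping factor of 0.85, and the fixed iteration count are assumptions for the example, not the schema or the method used in this repository. Paper ids are assumed to be integers, and the edge list is assumed to be non-empty.

```python
import sqlite3

def citation_rank_sql(edges, damping=0.85, iterations=50):
    """edges: iterable of (citing_id, cited_id) pairs. Returns {paper_id: score}."""
    edges = set(edges)
    papers = sorted({p for edge in edges for p in edge})
    n = len(papers)
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.executescript("""
        CREATE TABLE cites (src INTEGER, dst INTEGER);
        CREATE TABLE papers (id INTEGER PRIMARY KEY, outdeg INTEGER);
        CREATE TABLE scores (id INTEGER PRIMARY KEY, score REAL);
        CREATE TABLE next_scores (id INTEGER PRIMARY KEY, score REAL);
    """)
    # Drop edges where a paper cites itself; keep the paper itself.
    cur.executemany("INSERT INTO cites VALUES (?, ?)",
                    [(s, d) for s, d in edges if s != d])
    cur.executemany("INSERT INTO papers VALUES (?, 0)", [(p,) for p in papers])
    cur.execute("UPDATE papers SET outdeg = "
                "(SELECT COUNT(*) FROM cites WHERE cites.src = papers.id)")
    # Start from a uniform distribution over all papers.
    cur.execute("INSERT INTO scores SELECT id, 1.0 / ? FROM papers", (n,))
    for _ in range(iterations):
        # Score held by papers with no references is spread over all papers.
        (dangling,) = cur.execute("""
            SELECT COALESCE(SUM(s.score), 0.0)
            FROM scores s JOIN papers p ON p.id = s.id
            WHERE p.outdeg = 0""").fetchone()
        cur.execute("DELETE FROM next_scores")
        # Each citing paper passes an equal share of its score to every
        # paper it references; a paper's own references do not raise its score.
        cur.execute("""
            INSERT INTO next_scores
            SELECT p.id,
                   (1.0 - :d) / :n + :d * :dangling / :n
                   + :d * COALESCE((SELECT SUM(s.score / q.outdeg)
                                    FROM cites c
                                    JOIN scores s ON s.id = c.src
                                    JOIN papers q ON q.id = c.src
                                    WHERE c.dst = p.id), 0.0)
            FROM papers p""", {"d": damping, "n": n, "dangling": dangling})
        cur.execute("DELETE FROM scores")
        cur.execute("INSERT INTO scores SELECT id, score FROM next_scores")
    result = dict(cur.execute("SELECT id, score FROM scores"))
    con.close()
    return result
```

The scores sum to 1, so each can be read as the probability of landing on that paper. The loop itself stays in Python here; a database with recursive queries or stored procedures could move it into SQL as well.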

@supriyan
Collaborator

If this ranking has to be done purely on the basis of citation counts, then I
have done it. I can explain when we meet.
Just brainstorming: is the number of citations the only factor in a paper's
importance, or does it also depend on who cited it? For example:

Two papers both have 50 citations in DBLP, but one is cited by more
important papers.
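
A small sketch of that scenario, scaled down to two citations each: papers "A" and "B" have the same citation count, but A's citing papers are themselves cited. The function `citation_rank`, the damping factor of 0.85, and the toy graph are illustrative assumptions, not the code in this repository.

```python
from collections import defaultdict

def citation_rank(edges, damping=0.85, iterations=100):
    """PageRank-style score over (citing, cited) pairs. Returns {paper: score}."""
    edges = set(edges)
    papers = {p for edge in edges for p in edge}
    cited_by = defaultdict(list)  # cited paper -> list of citing papers
    out_degree = defaultdict(int)  # citing paper -> number of references
    for citing, cited in edges:
        if citing != cited:  # ignore a paper citing itself
            cited_by[cited].append(citing)
            out_degree[citing] += 1
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers with no references spread their score over all papers.
        dangling = sum(score[p] for p in papers if out_degree.get(p, 0) == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        score = {
            p: base + damping * sum(score[c] / out_degree[c]
                                    for c in cited_by.get(p, ()))
            for p in papers
        }
    return score

# Toy graph: A and B each have two citations, but A's citers are cited too.
toy_edges = [
    ("c1", "A"), ("c2", "A"),
    ("c3", "B"), ("c4", "B"),
    ("x1", "c1"), ("x2", "c1"), ("x3", "c2"),
]
toy_scores = citation_rank(toy_edges)
citation_counts = {p: sum(1 for _, d in toy_edges if d == p) for p in ("A", "B")}
```

Under these assumptions `citation_counts` is equal for A and B, while `toy_scores["A"]` comes out higher than `toy_scores["B"]`, which is the distinction a plain citation count cannot make.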

-Sup

Supriya Nirkhiwale

cegme added a commit that referenced this issue Sep 19, 2012
…l numbers are incremented by one from the python program. This is issue #18
@cegme
Owner Author

cegme commented Sep 19, 2012

OK, I have a solution that appears to solve the problem. It is in UDF/citation_count.sql. See the program UDF/citation_count.py for what the solution is supposed to produce.

@cegme cegme closed this as completed Sep 19, 2012
@cegme cegme reopened this Sep 19, 2012