Paper -> CoT pipeline: Algorithm for scoring a paper based on rubric data #60

Open
thehunmonkgroup opened this issue Oct 1, 2024 · 3 comments

@thehunmonkgroup
Collaborator

The pipeline for grading a paper produces a series of yes/no answers to questions meant to determine the quality of the paper for CoT extraction.

Requirements:

  • An algorithm that converts the simple yes/no rubric answers into an overall suitability score

Deliverable:

  • Said algorithm
  • Either
    • An SQL query suitable for SQLite that can apply the algorithm dynamically to the individual rubric answers, or
    • A script that runs the algorithm and writes the suitability score to the database

Example:

Simplest algorithm, SQL-based, sum of all criteria (1 for yes, 0 for no):

SELECT
    id,
    paper_url,
    paper_category,
    (COALESCE(criteria_clear_question, 0) +
     COALESCE(criteria_definitive_answer, 0) +
     COALESCE(criteria_complex_reasoning, 0) +
     COALESCE(criteria_coherent_structure, 0) +
     COALESCE(criteria_layperson_comprehensible, 0) +
     COALESCE(criteria_minimal_jargon, 0) +
     COALESCE(criteria_illustrative_examples, 0) +
     COALESCE(criteria_significant_insights, 0) +
     COALESCE(criteria_verifiable_steps, 0) +
     COALESCE(criteria_overall_suitability, 0)) AS total_criteria_score
FROM
    papers;
+----+----------------------------------------------+----------------+----------------------+
| id |                  paper_url                   | paper_category | total_criteria_score |
+----+----------------------------------------------+----------------+----------------------+
| 8  | https://export.arxiv.org/pdf/0704.3252v1.pdf | astro-ph.EP    | 10                   |
| 9  | https://export.arxiv.org/pdf/0710.0317v1.pdf | astro-ph.EP    | 10                   |
+----+----------------------------------------------+----------------+----------------------+
@thehunmonkgroup
Collaborator Author

The query at least produces some different grades when run over a larger set of papers:

SELECT
    id,
    paper_url,
    paper_category,
    (COALESCE(criteria_clear_question, 0) +
     COALESCE(criteria_definitive_answer, 0) +
     COALESCE(criteria_complex_reasoning, 0) +
     COALESCE(criteria_coherent_structure, 0) +
     COALESCE(criteria_layperson_comprehensible, 0) +
     COALESCE(criteria_minimal_jargon, 0) +
     COALESCE(criteria_illustrative_examples, 0) +
     COALESCE(criteria_significant_insights, 0) +
     COALESCE(criteria_verifiable_steps, 0) +
     COALESCE(criteria_overall_suitability, 0)) AS total_criteria_score
FROM
    papers
WHERE
    id IN (9, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31);
+----+----------------------------------------------+----------------+----------------------+
| id |                  paper_url                   | paper_category | total_criteria_score |
+----+----------------------------------------------+----------------+----------------------+
| 9  | https://export.arxiv.org/pdf/0710.0317v1.pdf | astro-ph.EP    | 10                   |
| 13 | https://export.arxiv.org/pdf/0805.1116v1.pdf | astro-ph.EP    | 9                    |
| 14 | https://export.arxiv.org/pdf/0807.0527v1.pdf | astro-ph.EP    | 7                    |
| 15 | https://export.arxiv.org/pdf/0807.1873v1.pdf | astro-ph.EP    | 10                   |
| 16 | https://export.arxiv.org/pdf/0809.4042v1.pdf | astro-ph.EP    | 3                    |
| 17 | https://export.arxiv.org/pdf/0809.4562v1.pdf | astro-ph.EP    | 3                    |
| 18 | https://export.arxiv.org/pdf/0810.5138v1.pdf | astro-ph.EP    | 10                   |
| 19 | https://export.arxiv.org/pdf/0901.0304v1.pdf | astro-ph.EP    | 9                    |
| 20 | https://export.arxiv.org/pdf/0901.0343v1.pdf | astro-ph.EP    | 4                    |
| 21 | https://export.arxiv.org/pdf/0901.0482v1.pdf | astro-ph.EP    | 10                   |
| 22 | https://export.arxiv.org/pdf/0901.0515v1.pdf | astro-ph.EP    | 10                   |
| 23 | https://export.arxiv.org/pdf/0901.0532v1.pdf | astro-ph.EP    | 10                   |
| 24 | https://export.arxiv.org/pdf/0901.0554v1.pdf | astro-ph.EP    | 10                   |
| 25 | https://export.arxiv.org/pdf/0901.0625v1.pdf | astro-ph.EP    | 2                    |
| 26 | https://export.arxiv.org/pdf/0901.0828v1.pdf | astro-ph.EP    | 10                   |
| 27 | https://export.arxiv.org/pdf/0901.0846v1.pdf | astro-ph.EP    | 10                   |
| 28 | https://export.arxiv.org/pdf/0901.0735v1.pdf | astro-ph.EP    | 9                    |
| 29 | https://export.arxiv.org/pdf/0901.1214v1.pdf | astro-ph.EP    | 10                   |
| 30 | https://export.arxiv.org/pdf/0901.1217v1.pdf | astro-ph.EP    | 10                   |
| 31 | https://export.arxiv.org/pdf/0901.1547v1.pdf | astro-ph.EP    | 10                   |
+----+----------------------------------------------+----------------+----------------------+

@thehunmonkgroup self-assigned this Oct 2, 2024
@thehunmonkgroup
Collaborator Author

Updated the scoring to be a tad smarter:

  • There are now three required rubric questions (clear question, definitive answer, complex reasoning) -- if any of these is a 'no', the score is zero
  • Otherwise, the score is a sum of all ten rubric questions (1 for yes, 0 for no)
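For the "dynamic SQL query" option, the two rules above could be sketched as a single SQLite CASE expression. This is a minimal sketch, not the project's actual implementation: the column names and the `papers` table come from the query earlier in this thread, while the helper names (`gated_score_expr`, `score_papers`, `make_demo_db`), the demo rows, and the choice to treat a missing answer as 'no' are assumptions.

```python
import sqlite3

# Column names come from the query earlier in this thread.
REQUIRED = (
    "criteria_clear_question",
    "criteria_definitive_answer",
    "criteria_complex_reasoning",
)
OTHER = (
    "criteria_coherent_structure",
    "criteria_layperson_comprehensible",
    "criteria_minimal_jargon",
    "criteria_illustrative_examples",
    "criteria_significant_insights",
    "criteria_verifiable_steps",
    "criteria_overall_suitability",
)
ALL_CRITERIA = REQUIRED + OTHER


def gated_score_expr() -> str:
    """Build a SQLite expression: the ten-way sum, or 0 if a required answer is not yes."""
    # Assumption: NULL counts as 'no', mirroring COALESCE(..., 0) in the original query.
    gate = " AND ".join(f"COALESCE({col}, 0) = 1" for col in REQUIRED)
    total = " + ".join(f"COALESCE({col}, 0)" for col in ALL_CRITERIA)
    return f"CASE WHEN {gate} THEN {total} ELSE 0 END"


def score_papers(conn: sqlite3.Connection) -> list:
    """Return (id, suitability_score) pairs computed on the fly."""
    sql = f"SELECT id, {gated_score_expr()} AS suitability_score FROM papers ORDER BY id"
    return conn.execute(sql).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory table holding only the columns this sketch needs."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f"{col} INTEGER" for col in ALL_CRITERIA)
    conn.execute(f"CREATE TABLE papers (id INTEGER PRIMARY KEY, {columns})")
    demo_rows = [
        (1, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),     # all yes
        (2, [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]),     # two non-required answers are no
        (3, [1, 0, 1, 1, 1, 1, 1, 1, 1, 1]),     # a required answer is no
        (4, [1, 1, None, 1, 1, 1, 1, 1, 1, 1]),  # a required answer is missing
    ]
    placeholders = ", ".join("?" * (len(ALL_CRITERIA) + 1))
    conn.executemany(
        f"INSERT INTO papers VALUES ({placeholders})",
        [(paper_id, *answers) for paper_id, answers in demo_rows],
    )
    return conn


if __name__ == "__main__":
    for paper_id, score in score_papers(make_demo_db()):
        print(paper_id, score)
```

With the made-up demo rows, papers 1 and 2 score 10 and 8, while papers 3 and 4 score 0 because a required answer is 'no' or missing. Keeping the gate inside the query means the score always reflects the current rubric answers, at the cost of repeating the expression wherever it is needed.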

Here's a scoring summary across 100 profiled papers:

sqlite> select paper_url, paper_category, suitability_score from papers where processing_status = 'scored' order by suitability_score desc;
+-----------------------------------------------+--------------------+-------------------+
|                   paper_url                   |   paper_category   | suitability_score |
+-----------------------------------------------+--------------------+-------------------+
| https://export.arxiv.org/pdf/0901.0735v1.pdf  | astro-ph.EP        | 10                |
| https://export.arxiv.org/pdf/1801.05595v1.pdf | astro-ph.EP        | 10                |
| https://export.arxiv.org/pdf/1507.03327v1.pdf | astro-ph.GA        | 10                |
| https://export.arxiv.org/pdf/1901.07266v1.pdf | astro-ph.GA        | 10                |
| https://export.arxiv.org/pdf/1910.09121v1.pdf | astro-ph.GA        | 10                |
| https://export.arxiv.org/pdf/1110.2656v2.pdf  | astro-ph.HE        | 10                |
| https://export.arxiv.org/pdf/1310.7588v1.pdf  | astro-ph.HE        | 10                |
| https://export.arxiv.org/pdf/1611.08508v1.pdf | astro-ph.HE        | 10                |
| https://export.arxiv.org/pdf/1710.09893v1.pdf | astro-ph.HE        | 10                |
| https://export.arxiv.org/pdf/1810.04324v3.pdf | astro-ph.HE        | 10                |
| https://export.arxiv.org/pdf/1307.3576v1.pdf  | cond-mat.mtrl-sci  | 10                |
| https://export.arxiv.org/pdf/1310.6949v1.pdf  | cond-mat.mtrl-sci  | 10                |
| https://export.arxiv.org/pdf/1409.0959v1.pdf  | cond-mat.mtrl-sci  | 10                |
| https://export.arxiv.org/pdf/1509.04762v1.pdf | cond-mat.mtrl-sci  | 10                |
| https://export.arxiv.org/pdf/1901.06620v2.pdf | cs.AI              | 10                |
| https://export.arxiv.org/pdf/1604.04372v2.pdf | cs.CV              | 10                |
| https://export.arxiv.org/pdf/1806.09158v1.pdf | cs.CV              | 10                |
| https://export.arxiv.org/pdf/1910.13340v1.pdf | cs.CV              | 10                |
| https://export.arxiv.org/pdf/1207.1387v1.pdf  | cs.LG              | 10                |
| https://export.arxiv.org/pdf/1307.3964v1.pdf  | cs.LG              | 10                |
| https://export.arxiv.org/pdf/1905.09538v2.pdf | cs.LG              | 10                |
| https://export.arxiv.org/pdf/1811.08973v1.pdf | cs.SE              | 10                |
| https://export.arxiv.org/pdf/1104.2747v1.pdf  | hep-ex             | 10                |
| https://export.arxiv.org/pdf/1407.6211v2.pdf  | hep-ex             | 10                |
| https://export.arxiv.org/pdf/1808.03987v3.pdf | hep-ex             | 10                |
| https://export.arxiv.org/pdf/1211.3270v1.pdf  | math.CA            | 10                |
| https://export.arxiv.org/pdf/1709.00705v2.pdf | math.CA            | 10                |
| https://export.arxiv.org/pdf/1805.00990v2.pdf | math.CO            | 10                |
| https://export.arxiv.org/pdf/1706.08709v1.pdf | math.IT            | 10                |
| https://export.arxiv.org/pdf/1804.02217v1.pdf | math.IT            | 10                |
| https://export.arxiv.org/pdf/1206.4819v2.pdf  | math.MP            | 10                |
| https://export.arxiv.org/pdf/1201.0101v1.pdf  | math.NA            | 10                |
| https://export.arxiv.org/pdf/1012.2726v1.pdf  | nlin.AO            | 10                |
| https://export.arxiv.org/pdf/1706.04252v1.pdf | nlin.AO            | 10                |
| https://export.arxiv.org/pdf/1310.4490v1.pdf  | nlin.PS            | 10                |
| https://export.arxiv.org/pdf/1405.7920v2.pdf  | nlin.PS            | 10                |
| https://export.arxiv.org/pdf/1806.04399v1.pdf | nlin.PS            | 10                |
| https://export.arxiv.org/pdf/0805.2603v1.pdf  | nucl-th            | 10                |
| https://export.arxiv.org/pdf/1208.3888v2.pdf  | nucl-th            | 10                |
| https://export.arxiv.org/pdf/1512.02771v1.pdf | nucl-th            | 10                |
| https://export.arxiv.org/pdf/1905.06163v1.pdf | physics.app-ph     | 10                |
| https://export.arxiv.org/pdf/1306.4661v7.pdf  | physics.atom-ph    | 10                |
| https://export.arxiv.org/pdf/1706.07114v2.pdf | physics.atom-ph    | 10                |
| https://export.arxiv.org/pdf/1906.00474v1.pdf | physics.atom-ph    | 10                |
| https://export.arxiv.org/pdf/0803.3901v1.pdf  | physics.chem-ph    | 10                |
| https://export.arxiv.org/pdf/1706.07534v1.pdf | physics.class-ph   | 10                |
| https://export.arxiv.org/pdf/1904.00493v1.pdf | physics.data-an    | 10                |
| https://export.arxiv.org/pdf/1802.06590v1.pdf | physics.space-ph   | 10                |
| https://export.arxiv.org/pdf/1103.0286v2.pdf  | quant-ph           | 10                |
| https://export.arxiv.org/pdf/1207.2485v3.pdf  | quant-ph           | 10                |
| https://export.arxiv.org/pdf/1601.07931v3.pdf | stat.AP            | 10                |
| https://export.arxiv.org/pdf/1304.4203v2.pdf  | astro-ph.HE        | 9                 |
| https://export.arxiv.org/pdf/1506.03177v2.pdf | cond-mat.mtrl-sci  | 9                 |
| https://export.arxiv.org/pdf/1702.08515v3.pdf | cond-mat.mtrl-sci  | 9                 |
| https://export.arxiv.org/pdf/1508.05025v4.pdf | cond-mat.stat-mech | 9                 |
| https://export.arxiv.org/pdf/1706.09347v2.pdf | cs.AI              | 9                 |
| https://export.arxiv.org/pdf/1711.09952v2.pdf | cs.CV              | 9                 |
| https://export.arxiv.org/pdf/1205.3773v3.pdf  | gr-qc              | 9                 |
| https://export.arxiv.org/pdf/0909.2753v2.pdf  | math.MP            | 9                 |
| https://export.arxiv.org/pdf/1404.0651v1.pdf  | math.NA            | 9                 |
| https://export.arxiv.org/pdf/1110.2527v1.pdf  | math.OC            | 9                 |
| https://export.arxiv.org/pdf/1403.5318v3.pdf  | nlin.PS            | 9                 |
| https://export.arxiv.org/pdf/1709.03402v4.pdf | physics.chem-ph    | 9                 |
| https://export.arxiv.org/pdf/1806.02251v1.pdf | physics.data-an    | 9                 |
| https://export.arxiv.org/pdf/1211.6462v3.pdf  | cs.SI              | 8                 |
| https://export.arxiv.org/pdf/1404.6585v1.pdf  | math.IT            | 8                 |
| https://export.arxiv.org/pdf/1604.05771v1.pdf | math.OC            | 8                 |
| https://export.arxiv.org/pdf/0708.0048v1.pdf  | math.NT            | 8                 |
| https://export.arxiv.org/pdf/1506.04980v1.pdf | math.NT            | 8                 |
| https://export.arxiv.org/pdf/1811.08906v1.pdf | astro-ph.HE        | 7                 |
| https://export.arxiv.org/pdf/1310.1622v2.pdf  | cs.LO              | 7                 |
| https://export.arxiv.org/pdf/0810.4634v1.pdf  | math.CO            | 7                 |
| https://export.arxiv.org/pdf/1101.5924v3.pdf  | math.CT            | 7                 |
| https://export.arxiv.org/pdf/1106.3102v4.pdf  | math.OC            | 7                 |
| https://export.arxiv.org/pdf/1009.1736v1.pdf  | nlin.SI            | 7                 |
| https://export.arxiv.org/pdf/1908.01260v1.pdf | stat.ME            | 7                 |
| https://export.arxiv.org/pdf/1609.03875v1.pdf | astro-ph.HE        | 0                 |
| https://export.arxiv.org/pdf/1407.4035v1.pdf  | cond-mat.dis-nn    | 0                 |
| https://export.arxiv.org/pdf/1706.00372v1.pdf | cond-mat.mtrl-sci  | 0                 |
| https://export.arxiv.org/pdf/0704.1394v1.pdf  | cs.AI              | 0                 |
| https://export.arxiv.org/pdf/1004.1230v1.pdf  | cs.AI              | 0                 |
| https://export.arxiv.org/pdf/1902.11114v2.pdf | cs.CV              | 0                 |
| https://export.arxiv.org/pdf/1312.0940v1.pdf  | cs.CY              | 0                 |
| https://export.arxiv.org/pdf/1605.01580v1.pdf | cs.CY              | 0                 |
| https://export.arxiv.org/pdf/1806.06230v1.pdf | cs.GT              | 0                 |
| https://export.arxiv.org/pdf/1901.11499v2.pdf | cs.SE              | 0                 |
| https://export.arxiv.org/pdf/1910.08359v1.pdf | eess.SP            | 0                 |
| https://export.arxiv.org/pdf/1204.1077v1.pdf  | gr-qc              | 0                 |
| https://export.arxiv.org/pdf/0902.4798v1.pdf  | hep-ex             | 0                 |
| https://export.arxiv.org/pdf/1912.07355v1.pdf | hep-ex             | 0                 |
| https://export.arxiv.org/pdf/1901.06292v1.pdf | math.CO            | 0                 |
| https://export.arxiv.org/pdf/1411.6503v2.pdf  | math.MP            | 0                 |
| https://export.arxiv.org/pdf/1610.03664v1.pdf | math.MP            | 0                 |
| https://export.arxiv.org/pdf/0711.1635v1.pdf  | nucl-th            | 0                 |
| https://export.arxiv.org/pdf/0906.4909v1.pdf  | nucl-th            | 0                 |
| https://export.arxiv.org/pdf/1603.09057v1.pdf | nucl-th            | 0                 |
| https://export.arxiv.org/pdf/1012.0862v1.pdf  | physics.gen-ph     | 0                 |
| https://export.arxiv.org/pdf/1504.03161v1.pdf | physics.soc-ph     | 0                 |
| https://export.arxiv.org/pdf/1506.06091v1.pdf | q-bio.NC           | 0                 |
| https://export.arxiv.org/pdf/0906.2684v2.pdf  | quant-ph           | 0                 |
+-----------------------------------------------+--------------------+-------------------+
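The summary query reads a stored `suitability_score` and filters on `processing_status = 'scored'`, which points at the "script" option from the deliverables. A minimal sketch of what such a script might look like follows. It is not the project's actual code: the helper names, the demo schema, and the choice to score only rows whose `suitability_score` is still NULL are assumptions made for illustration; the column names and the 'scored' status come from this thread.

```python
import sqlite3

# Column names come from the queries in this thread; the first three are required.
REQUIRED = (
    "criteria_clear_question",
    "criteria_definitive_answer",
    "criteria_complex_reasoning",
)
ALL_CRITERIA = REQUIRED + (
    "criteria_coherent_structure",
    "criteria_layperson_comprehensible",
    "criteria_minimal_jargon",
    "criteria_illustrative_examples",
    "criteria_significant_insights",
    "criteria_verifiable_steps",
    "criteria_overall_suitability",
)


def suitability_score(answers: dict) -> int:
    """Zero if any required answer is not yes, else the count of yes answers."""
    # Assumption: a missing (None) answer is treated as 'no'.
    values = {col: answers.get(col) or 0 for col in ALL_CRITERIA}
    if any(values[col] != 1 for col in REQUIRED):
        return 0
    return sum(values.values())


def write_scores(conn: sqlite3.Connection) -> int:
    """Score every paper that has no stored score yet; return how many were updated."""
    cols = ", ".join(ALL_CRITERIA)
    rows = conn.execute(
        f"SELECT id, {cols} FROM papers WHERE suitability_score IS NULL"
    ).fetchall()
    updates = [
        (suitability_score(dict(zip(ALL_CRITERIA, row[1:]))), row[0]) for row in rows
    ]
    with conn:  # commit all updates in one transaction
        conn.executemany(
            "UPDATE papers SET suitability_score = ?, processing_status = 'scored' "
            "WHERE id = ?",
            updates,
        )
    return len(updates)


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory schema with only the columns this sketch touches."""
    conn = sqlite3.connect(":memory:")
    criteria_cols = ", ".join(f"{col} INTEGER" for col in ALL_CRITERIA)
    conn.execute(
        "CREATE TABLE papers (id INTEGER PRIMARY KEY, processing_status TEXT, "
        f"suitability_score INTEGER, {criteria_cols})"
    )
    demo_rows = [
        (1, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),  # all yes
        (2, [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]),  # three non-required answers are no
        (3, [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]),  # a required answer is no
    ]
    placeholders = ", ".join("?" * (len(ALL_CRITERIA) + 1))
    conn.executemany(
        f"INSERT INTO papers (id, {', '.join(ALL_CRITERIA)}) VALUES ({placeholders})",
        [(paper_id, *answers) for paper_id, answers in demo_rows],
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(write_scores(db), "papers scored")
```

Storing the score makes queries like the summary above cheap and sortable, but the stored value goes stale if rubric answers change; re-running the script after clearing `suitability_score` would be one way to refresh it.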

@thehunmonkgroup
Collaborator Author

We've decided the current algorithm is sufficient until we need to run a larger number of papers; we'll need funding for that.
