
📚 A curated list of papers for Software Engineers


alikatgh/software-papers

 
 


Papers for Software Engineers

A curated list of papers that may be of interest to Software Engineering students or professionals. See the sources and selection criteria below.


List of papers by topic
  1. Von Neumann's First Computer Program. Knuth (1970).
    Computer History; Early Programming

  2. Computing Machinery and Intelligence. Turing (1950).
    Early Artificial Intelligence

    • Some Moral and Technical Consequences of Automation. Wiener (1960).
    • Steps towards Artificial Intelligence. Minsky (1960).
    • ELIZA—a computer program for the study of natural language communication between man and machine. Weizenbaum (1966).
    • A Theory of the Learnable. Valiant (1984).
  3. A Method for the Construction of Minimum-Redundancy Codes. Huffman (1952).
    Information Theory

  4. Engineering a Sort Function. Bentley, McIlroy (1993).
    Data Structures; Algorithms

  5. A Design Methodology for Reliable Software Systems. Liskov (1972).
    Software Design

  6. Programming with Abstract Data Types. Liskov, Zilles (1974).
    Abstract Data Types; Object-Oriented Programming

  7. Why Functional Programming Matters. Hughes (1990).
    Functional Programming

  8. An Incremental Approach to Compiler Construction. Ghuloum (2006).
    Language Design; Compilers

  9. No Silver Bullet: Essence and Accidents of Software Engineering. Brooks (1987).
    Software Engineering; Project Management

  10. Communicating sequential processes. Hoare (1978).
    Concurrency

  11. The UNIX Time-Sharing System. Ritchie, Thompson (1974).
    Operating Systems

  12. A Relational Model of Data for Large Shared Data Banks. Codd (1970).
    Databases

  13. A Protocol for Packet Network Intercommunication. Cerf, Kahn (1974).
    Networking

  14. New Directions in Cryptography. Diffie, Hellman (1976).
    Cryptography

  15. Time, Clocks, and the Ordering of Events in a Distributed System. Lamport (1978).
    Distributed Systems

  16. Designing for Usability: Key Principles and What Designers Think. Gould, Lewis (1985).
    Human-Computer Interaction; User Interfaces

  17. The anatomy of a large-scale hypertextual Web search engine. Brin, Page (1998).
    Information Retrieval; World-Wide Web

  18. Dynamo, Amazon’s Highly Available Key-value store. DeCandia et al (2007).
    Internet Scale Data Systems

  19. On Designing and Deploying Internet Scale Services. Hamilton (2007).
    Operations; Reliability; Fault-tolerance

  20. Thinking Methodically about Performance. Gregg (2012).
    Performance

  21. Bitcoin, A peer-to-peer electronic cash system. Nakamoto (2008).
    Cryptocurrencies

    • Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform. Buterin (2014).
  22. A Few Useful Things to Know About Machine Learning. Domingos (2012).
    Machine Learning


Top-level papers only
  1. Von Neumann's First Computer Program. Knuth (1970).
  2. Computing Machinery and Intelligence. Turing (1950).
  3. A Method for the Construction of Minimum-Redundancy Codes. Huffman (1952).
  4. Engineering a Sort Function. Bentley, McIlroy (1993).
  5. A Design Methodology for Reliable Software Systems. Liskov (1972).
  6. Programming with Abstract Data Types. Liskov, Zilles (1974).
  7. Why Functional Programming Matters. Hughes (1990).
  8. An Incremental Approach to Compiler Construction. Ghuloum (2006).
  9. No Silver Bullet: Essence and Accidents of Software Engineering. Brooks (1987).
  10. Communicating sequential processes. Hoare (1978).
  11. The UNIX Time-Sharing System. Ritchie, Thompson (1974).
  12. A Relational Model of Data for Large Shared Data Banks. Codd (1970).
  13. A Protocol for Packet Network Intercommunication. Cerf, Kahn (1974).
  14. New Directions in Cryptography. Diffie, Hellman (1976).
  15. Time, Clocks, and the Ordering of Events in a Distributed System. Lamport (1978).
  16. Designing for Usability: Key Principles and What Designers Think. Gould, Lewis (1985).
  17. The anatomy of a large-scale hypertextual Web search engine. Brin, Page (1998).
  18. Dynamo, Amazon’s Highly Available Key-value store. DeCandia et al (2007).
  19. On Designing and Deploying Internet Scale Services. Hamilton (2007).
  20. Thinking Methodically about Performance. Gregg (2012).
  21. Bitcoin, A peer-to-peer electronic cash system. Nakamoto (2008).
  22. A Few Useful Things to Know About Machine Learning. Domingos (2012).

All papers in chronological order
  1. As We May Think. Bush (1945).
  2. Computing Machinery and Intelligence. Turing (1950).
  3. The Education of a Computer. Hopper (1952).
  4. A Method for the Construction of Minimum-Redundancy Codes. Huffman (1952).
  5. On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Kruskal (1956).
  6. Man-Computer symbiosis. Licklider (1958).
  7. A Note on Two Problems in Connexion with Graphs. Dijkstra (1959).
  8. Recursive Programming. Dijkstra (1960).
  9. Some Moral and Technical Consequences of Automation. Wiener (1960).
  10. Steps towards Artificial Intelligence. Minsky (1960).
  11. Recursive Functions of Symbolic Expressions and Their Computation by Machine. McCarthy (1960).
  12. Quicksort. Hoare (1962).
  13. An Experimental Time-Sharing System. Corbató, Merwin Daggett, Daley (1962).
  14. Programming Considered as a Human Activity. Dijkstra (1965).
  15. Solution Of a Problem in Concurrent Program Control. Dijkstra (1965).
  16. Some Thoughts About the Social Implications of Accessible Computing. David, Fano (1965).
  17. ELIZA—a computer program for the study of natural language communication between man and machine. Weizenbaum (1966).
  18. The Next 700 Programming Languages. Landin (1966).
  19. Goto Statement Considered Harmful. Dijkstra (1968).
  20. How do committees invent? Conway (1968).
  21. The Structure of the "THE"-Multiprogramming System. Dijkstra (1968).
  22. Von Neumann's First Computer Program. Knuth (1970).
  23. Space/Time Trade-offs in Hash Coding with Allowable Errors. Bloom (1970).
  24. Managing the Development of Large Software Systems. Royce (1970).
  25. A Relational Model of Data for Large Shared Data Banks. Codd (1970).
  26. Program development by stepwise refinement. Wirth (1971).
  27. On the Criteria To Be Used in Decomposing Systems into Modules. Parnas (1971).
  28. The Humble Programmer. Dijkstra (1972).
  29. A Design Methodology for Reliable Software Systems. Liskov (1972).
  30. Information Distribution Aspects of Design Methodology. Parnas (1972).
  31. A Statistical Interpretation of Term Specificity in Retrieval. Spärck Jones (1972).
  32. Computer Programming as an Art. Knuth (1974).
  33. Programming with Abstract Data Types. Liskov, Zilles (1974).
  34. Monitors: An operating system structuring concept. Hoare (1974).
  35. The UNIX Time-Sharing System. Ritchie, Thompson (1974).
  36. A Protocol for Packet Network Intercommunication. Cerf, Kahn (1974).
  37. Self-stabilizing systems in spite of distributed control. Dijkstra (1974).
  38. The Mythical Man Month. Brooks (1975).
  39. Granularity of Locks and Degrees of Consistency in a Shared Data Base. Gray et al (1975).
  40. The Semantics of Predicate Logic as a Programming Language. Van Emden, Kowalski (1976).
  41. New Directions in Cryptography. Diffie, Hellman (1976).
  42. A Universal Algorithm for Sequential Data Compression. Ziv, Lempel (1977).
  43. The Smalltalk-76 Programming System Design and Implementation. Ingalls (1978).
  44. A Theory of Type Polymorphism in Programming. Milner (1978).
  45. Can Programming Be Liberated from the von Neumann Style? Backus (1978).
  46. Communicating sequential processes. Hoare (1978).
  47. On the Duality of Operating System Structures. Lauer, Needham (1978).
  48. Ethernet: Distributed packet switching for local computer networks. Metcalfe, Boggs (1978).
  49. A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Rivest, Shamir, Adleman (1978).
  50. Time, Clocks, and the Ordering of Events in a Distributed System. Lamport (1978).
  51. The paradigms of programming. Floyd (1979).
  52. The Ubiquitous B-Tree. Comer (1979).
  53. Designing Software for Ease of Extension and Contraction. Parnas (1979).
  54. Access Path Selection in a Relational Database Management System. Selinger et al (1979).
  55. How To Share A Secret. Shamir (1979).
  56. The Semantic Elegance of Applicative Languages. Turner (1981).
  57. The Transaction Concept: Virtues and Limitations. Gray (1981).
  58. Tutorials for the First-Time Computer User. Al-Awar, Chapanis, Ford (1981).
  59. The Byzantine Generals Problem. Lamport, Shostak, Pease (1982).
  60. The star user interface: an overview. Smith, Irby, Kimball (1982).
  61. Design Principles for Human-Computer Interfaces. Norman (1983).
  62. Ironies of Automation. Bainbridge (1983).
  63. Literate Programming. Knuth (1984).
  64. A Theory of the Learnable. Valiant (1984).
  65. Programming pearls: Algorithm design techniques. Bentley (1984).
  66. Programming pearls: The back of the envelope. Bentley (1984).
  67. Reflections on Trusting Trust. Thompson (1984).
  68. End-To-End Arguments in System Design. Saltzer, Reed, Clark (1984).
  69. Programming as Theory Building. Naur (1985).
  70. On understanding types, data abstraction, and polymorphism. Cardelli, Wegner (1985).
  71. An algorithm for distributed computation of a Spanning Tree in an Extended LAN. Perlman (1985).
  72. Impossibility of Distributed Consensus With One Faulty Process. Fischer, Lynch, Paterson (1985).
  73. Designing for Usability: Key Principles and What Designers Think. Gould, Lewis (1985).
  74. Why do computers stop and what can be done about it? Gray (1985).
  75. Making data structures persistent. Driscoll et al (1986).
  76. Programming pearls: little languages. Bentley (1986).
  77. The design of POSTGRES. Stonebraker, Rowe (1986).
  78. No Silver Bullet: Essence and Accidents of Software Engineering. Brooks (1987).
  79. A Digital Signature Based on a Conventional Encryption Function. Merkle (1987).
  80. The Design Philosophy of the DARPA Internet Protocols. Clark (1988).
  81. Why Functional Programming Matters. Hughes (1990).
  82. Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. Schneider (1990).
  83. SELF: The Power of Simplicity. Ungar, Smith (1991).
  84. On Building Systems That Will Fail. Corbató (1991).
  85. The Design and Implementation of a Log-Structured File System. Rosenblum, Ousterhout (1991).
  86. The essence of functional programming. Wadler (1992).
  87. World-Wide Web: Information Universe. Berners-Lee et al (1992).
  88. Engineering a Sort Function. Bentley, McIlroy (1993).
  89. The Essence of Compiling with Continuations. Flanagan et al (1993).
  90. Software Aging. Parnas (1994).
  91. Software Transactional Memory. Shavit, Touitou (1997).
  92. Human-Computer Interaction: Psychology as a Science of Design. Carroll (1997).
  93. Fifty Years of Shannon Theory. Verdú (1998).
  94. The Cathedral and the Bazaar. Raymond (1998).
  95. The anatomy of a large-scale hypertextual Web search engine. Brin, Page (1998).
  96. The PageRank Citation Ranking: Bringing Order to the Web. Page, Brin, Motwani (1998).
  97. Rules of Thumb in Data Engineering. Gray, Shenoy (1999).
  98. Practical Byzantine Fault Tolerance. Castro, Liskov (1999).
  99. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. Claessen, Hughes (2000).
  100. Paxos made simple. Lamport (2001).
  101. Statistical Modeling: The Two Cultures. Breiman (2001).
  102. Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies. Patterson et al (2002).
  103. A Brief History of Just-In-Time. Aycock (2003).
  104. The Google File System. Ghemawat, Gobioff, Leung (2003).
  105. Crash-Only Software. Candea, Fox (2003).
  106. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. Lattner, Adve (2004).
  107. A Unified Theory of Garbage Collection. Bacon, Cheng, Rajan (2004).
  108. TOR: The second generation onion router. Dingledine et al (2004).
  109. MapReduce: Simplified Data Processing on Large Clusters. Dean, Ghemawat (2004).
  110. A Nanopass Framework for Compiler Education. Sarkar, Waddell, Dybvig (2005).
  111. Church's Thesis and Functional Programming. Turner (2006).
  112. An Incremental Approach to Compiler Construction. Ghuloum (2006).
  113. Out of the Tar Pit. Moseley, Marks (2006).
  114. Why the Internet only just works. Handley (2006).
  115. Bigtable: A Distributed Storage System for Structured Data. Chang et al (2006).
  116. Performance Anti-Patterns. Smaalders (2006).
  117. The Salsa20 family of stream ciphers. Bernstein (2007).
  118. Paxos made live - An Engineering Perspective. Chandra, Griesemer, Redstone (2007).
  119. Dynamo, Amazon’s Highly Available Key-value store. DeCandia et al (2007).
  120. On Designing and Deploying Internet Scale Services. Hamilton (2007).
  121. Bitcoin, A peer-to-peer electronic cash system. Nakamoto (2008).
  122. Building on Quicksand. Helland, Campbell (2009).
  123. The Unreasonable Effectiveness of Data. Halevy, Norvig, Pereira (2009).
  124. ZooKeeper: wait-free coordination for internet scale systems. Hunt et al (2010).
  125. The Hadoop Distributed File System. Shvachko et al (2010).
  126. Thinking Clearly about Performance. Millsap (2010).
  127. Kafka: a Distributed Messaging System for Log Processing. Kreps, Narkhede, Rao (2011).
  128. CAP Twelve Years Later: How the "Rules" Have Changed. Brewer (2012).
  129. Thinking Methodically about Performance. Gregg (2012).
  130. A Few Useful Things to Know About Machine Learning. Domingos (2012).
  131. ImageNet Classification with Deep Convolutional Neural Networks. Krizhevsky, Sutskever, Hinton (2012).
  132. Playing Atari with Deep Reinforcement Learning. Mnih et al (2013).
  133. The Network is Reliable. Bailis, Kingsbury (2014).
  134. In Search of an Understandable Consensus Algorithm. Ongaro, Ousterhout (2014).
  135. Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform. Buterin (2014).
  136. Generative Adversarial Nets. Goodfellow et al (2014).
  137. Towards a Theory of Conceptual Design for Software. Jackson (2015).
  138. Deep Learning. LeCun, Bengio, Hinton (2015).
  139. Bringing the Web up to Speed with WebAssembly. Haas (2017).
  140. Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. Verbitski et al (2017).
  141. Attention Is All You Need. Vaswani et al (2017).

Sources

This list was inspired by (and draws from) several books and paper collections:

Meta reads

A few interesting resources about reading papers from Papers We Love and elsewhere:

Selection criteria

  1. The list should stay short. Let's say no more than 30 papers.
    • The idea is not to include every interesting paper that I come across but rather to keep a representative list that's possible to read from start to finish with a similar level of effort as reading a technical book from cover to cover.
    • I tried to include one paper per major topic and author. Along the way I found many noteworthy alternatives and related or follow-up papers; to keep track of those as well, I included them as sublist items.
  2. The papers shouldn't be too long. For the same reasons as the previous item, I try to avoid papers longer than 20 or 30 pages.
  3. They should be self-contained and readable enough to be approachable by the casual technical reader.
  4. They should be freely available online.
  5. Although historical relevance was taken into account, I omitted seminal papers in the cases where I found them hard to approach, when the main subject of the paper wasn't the thing that made them influential, etc.
    • Examples of this are classic works by Von Neumann, Turing and Shannon.
    • That being said, where possible I preferred the original paper on each subject over modern updates or survey papers.
  6. I tended to prefer topics that I can relate to my professional practice, typically papers that originated in industry or that describe innovations that later saw wide adoption.
    • Similarly, I tended to skip more theoretical papers, those focusing on mathematical foundations for Computer Science, electronic aspects of hardware, etc.
  7. I sorted the list by a mix of relatedness of topics and a vague chronological relevance, such that it makes sense to read it in the suggested order. For example, historical and seminal topics go first, contemporary internet-era developments last, networking precedes distributed systems, etc.

To download all the papers at once, run this script locally:

import os
import requests
from bs4 import BeautifulSoup

def sanitize_filename(filename):
    # Keep only letters, digits and spaces so the title is safe to use as a filename
    return "".join(c for c in filename if c.isalnum() or c == ' ').strip()


def download_article(url, title, save_folder):
    # Create save_folder if it doesn't exist
    os.makedirs(save_folder, exist_ok=True)
    try:
        response = requests.get(url, timeout=5)  # specify a timeout
    except requests.exceptions.Timeout:
        print(f'Timeout error for URL: {url}')
        return
    except requests.exceptions.TooManyRedirects:
        print(f'TooManyRedirects error occurred with {url}')
        return
    except requests.exceptions.RequestException as e:
        print(f'RequestException error occurred with {url}. Exception: {e}')
        return

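    if response.status_code == 200:
        # Sanitize title before using it as filename
        title = sanitize_filename(title)
        with open(os.path.join(save_folder, f'{title}.pdf'), 'wb') as f:
            f.write(response.content)
    else:
        # Report failed downloads instead of skipping them silently
        print(f'Skipping {url}: HTTP status {response.status_code}')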


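def extract_urls_from_webpage(webpage_url):
    # Time out instead of hanging, and raise on HTTP error responses
    response = requests.get(webpage_url, timeout=5)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    items = soup.find_all('li')

    articles = []
    for item in items:
        a_tag = item.find('a')
        if a_tag:
            url = a_tag.get('href')
            # If URL starts with '/', prepend the domain name
            if url and url.startswith('/'):
                url = 'https://github.com' + url
            # Check if URL is valid and is a pdf
            if url and url.lower().endswith('.pdf'):
                link_text = a_tag.get_text(strip=True)
                # Prefer any text that precedes the link; fall back to the
                # link text itself so the filename is never empty
                prefix = item.get_text().split(link_text)[0].strip() if link_text else ''
                title = prefix or link_text or os.path.basename(url)
                articles.append((url, title))

    return articles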


if __name__ == '__main__':
    webpage_url = input("Please enter the URL of the webpage: ")
    save_folder = '/your_local_folder'  # placeholder: replace with your download folder
    os.makedirs(save_folder, exist_ok=True)

    articles = extract_urls_from_webpage(webpage_url)
    for url, title in articles:
        download_article(url, title, save_folder)
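
The script above builds absolute URLs by prepending 'https://github.com' to relative links, and it decides whether a link is a PDF from the end of the raw URL string. A possible alternative, sketched below, resolves links against the page URL with the standard library and checks only the path component, so query strings and upper-case extensions do not hide a PDF. The helper name `resolve_pdf_url` is hypothetical and is not part of the original script.

```python
from urllib.parse import urljoin, urlparse


def resolve_pdf_url(page_url, href):
    """Return an absolute URL if href points at a PDF, otherwise None.

    Hypothetical helper: a sketch of one way to normalise links,
    not the approach the original script takes.
    """
    if not href:
        return None
    # urljoin leaves absolute links untouched and resolves relative ones
    # against the page they were found on.
    absolute = urljoin(page_url, href)
    # Look only at the path so '?raw=true' or '#page=2' do not hide the extension.
    if urlparse(absolute).path.lower().endswith('.pdf'):
        return absolute
    return None


if __name__ == '__main__':
    page = 'https://github.com/user/repo'
    print(resolve_pdf_url(page, '/user/repo/blob/main/paper.pdf'))
    print(resolve_pdf_url(page, 'https://example.org/paper.PDF?download=1'))
    print(resolve_pdf_url(page, 'https://example.org/index.html'))
```

If this helper were used inside `extract_urls_from_webpage`, the page URL entered by the user would take the place of the hard-coded domain.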
