- [ISCA2016] (http://isca2016.eecs.umich.edu/) [ISCA2015] (http://www.ece.cmu.edu/calcm/isca2015/)
- [Micro2016] (https://www.microarch.org/micro49/) [Micro2015] (https://www.microarch.org/micro48/)
- [ASPLOS2016] (https://www.ece.cmu.edu/calcm/asplos2016/)
- [HPCA2016] (http://hpca22.site.ac.upc.edu/index.php/program/conference-program/) [HPCA2015] (http://darksilicon.org/hpca/)
- [SC2016] (http://sc16.supercomputing.org/conference-components/technical-program-tues-fri/technical-papers) [SC2015] (http://sc15.supercomputing.org/program/technical-papers.html)
- [VLDB2016] (http://vldb2016.persistent.com/)
- [OSDI2016]
- [EuroSys2016]
- [ICDE2016]
- [SIGMOD2016]
- [FCCM2016] (http://fccm.org/2016/cfp.html) [FCCM2015] (http://fccm.org/2015/)
- [FPGA2016] (http://www.isfpga.org/fpga2016/index.html) [FPGA2015] (http://www.eecs.ucf.edu/isfpga/)
- [FPL2016] (http://www.fpl2016.org/) [FPL2015] (http://www.fpl2015.org/?page=tech_sessions#arch3)
- [FPT2016] (http://www.icfpt2016.org/index.jsp) [FPT2015] (http://fpt.massey.ac.nz/)
- [ASAP2016] (http://www.asap2016.org/) [ASAP2015] (http://www.eecg.toronto.edu/asap2015/)
- [GAP] (http://gap.cs.berkeley.edu/)
- [GPS] (http://infolab.stanford.edu/gps/)
- [Big Graph Mining] (http://datalab.snu.ac.kr/projects/big-graph-mining)
- [amplab: GraphX] (https://amplab.cs.berkeley.edu/projects/graphx/)
- [Gunrock] (http://gunrock.github.io/gunrock/doc/latest/index.html)
- [Ligra] (http://jshun.github.io/ligra/)
- [Galois] (http://iss.ices.utexas.edu/?p=projects/galois)
- [Trinity] (https://www.microsoft.com/en-us/research/project/trinity/)
- [GRASP] (http://grasp.cs.ucr.edu/)
The graph processing algorithms and frameworks are roughly classified based on the target computing platforms including many-core processors, distributed systems, GPUs, ASIC based Accelerators and FPGAs. Instead of targeting the graph processing framework, some of the work may particularly focus on one aspect of the graph processing such as graph compression, pre-processing, partition and load balancing. These work will be put in corresponding subsections as well.
-
McCune, Robert Ryan, Tim Weninger, and Greg Madey. "Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing." ACM Computing Surveys (CSUR) 48.2 (2015): 25.
-
Doekemeijer, Niels, and Ana Lucia Varbanescu. "A survey of parallel graph processing frameworks." Delft University of Technology (2014).
-
Shi, Xuanhua, J. Liang, X. Luo, S. Di, B. He, L. Lu, and Hai Jin. "Frog: Asynchronous graph processing on GPU with hybrid coloring model." Huazhong University of Science and Technology, Tech. Rep. HUSTCGCL-TR-402 (2015).
-
Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: vertex-centric graph processing on GPUs. In Proceedings of the 23rd international symposium on High-performance parallel and distributed computing (HPDC '14). ACM, New York, NY, USA, 239-252.
-
Fu, Zhisong, Michael Personick, and Bryan Thompson. "Mapgraph: A high level API for fast development of high performance graph analytics on GPUs." In Proceedings of Workshop on GRAph Data management Experiences and Systems pp. 1-6. ACM, 2014
-
Andrew Davidson, Sean Baxter, Michael Garland, and John D. Owens. 2014. Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths. In Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS '14). IEEE Computer Society, Washington, DC, USA, 349-359.
-
Merrill, Duane, Michael Garland, and Andrew Grimshaw. "Scalable GPU graph traversal." ACM SIGPLAN Notices. Vol. 47. No. 8. ACM, 2012.
-
Singh D P, Khare N. Modified Dijkstra’s Algorithm for Dense Graphs on GPU using CUDA[J]. Indian Journal of Science and Technology, 2016, 9(33).
-
Wang, Yangzihao; Davidson, Andrew; Pan, Yuechao; Wu, Yuduo; Riffel, Andy; & Owens, John D.(2016). Gunrock: A High-Performance Graph Processing Library on the GPU. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.
-
Singh DP, Khare N, Rasool A. Efficient Parallel Implementation of Single Source Shortest Path Algorithm on GPU Using CUDA. International Journal of Applied Engineering Research. 2016; 11(4):2560–7.
-
Bingsheng He, Jianlong Zhong, "Medusa: Simplified Graph Processing on GPUs", IEEE Transactions on Parallel & Distributed Systems
-
Hong, Sungpack, Sang Kyun Kim, Tayo Oguntebi, and Kunle Olukotun. "Accelerating CUDA graph algorithms at maximum warp." In ACM SIGPLAN Notices, vol. 46, no. 8, pp. 267-276. ACM, 2011.
-
Roy, Amitabha, Ivo Mihailovic, and Willy Zwaenepoel. "X-Stream: edge-centric graph processing using streaming partitions." In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 472-488. ACM, 2013.
-
Shang, Zechao, Feifei Li, Jeffrey Xu Yu, Zhiwei Zhang, and Hong Cheng. "Graph Analytics Through Fine-Grained Parallelism. SIGMOD, 2016"
-
Sundaram, Narayanan, et al. "GraphMat: High performance graph analytics made productive." Proceedings of the VLDB Endowment 8.11 (2015): 1214-1225.
-
Julian Shun. An Evaluation of Parallel Eccentricity Estimation Algorithms on Undirected Real-World Graphs. Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 1095-1104, 2015.
-
Delling, Daniel, et al. "Phast: Hardware-accelerated shortest path trees." Journal of Parallel and Distributed Computing 73.7 (2013): 940-952.
-
Meyer, Ulrich, and Peter Sanders. "Δ-stepping: a parallelizable shortest path algorithm." Journal of Algorithms 49.1 (2003): 114-152.
-
Kyrola, Aapo, Guy Blelloch, and Carlos Guestrin. "GraphChi: large-scale graph computation on just a PC." Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). 2012.
-
Julian Shun and Guy E. Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '13). ACM, New York, NY, USA, 135-146.
-
Yuze Chi, Guohao Dai, Yu Wang, Guangyu Sun, Guoliang Li, Huazhong Yang, "NXgraph: An Efficient Graph Processing System on a Single Machine", CoRR, 2015
-
Cheng, Raymond, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. "Kineograph: taking the pulse of a fast-changing and connected world." In Proceedings of the 7th ACM european conference on Computer Systems, pp. 85-98. ACM, 2012.
-
Geisberger, Robert, Peter Sanders, Dominik Schultes, and Daniel Delling. "Contraction hierarchies: Faster and simpler hierarchical routing in road networks." In International Workshop on Experimental and Efficient Algorithms, pp. 319-333. Springer Berlin Heidelberg, 2008.
-
Zheng, Da, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, and Alexander S. Szalay. "FlashGraph: Processing billion-node graphs on an array of commodity SSDs." In 13th USENIX Conference on File and Storage Technologies (FAST 15), pp. 45-58. 2015.
-
Yuan, Pingpeng, Wenya Zhang, Changfeng Xie, Hai Jin, Ling Liu, and Kisung Lee. "Fast iterative graph computation: A path centric approach." In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 401-412. IEEE Press, 2014.
-
Najeebullah, Kamran, Kifayat Ullah Khan, Waqas Nawaz, and Young-Koo Lee. "BPP: Large Graph Storage for Efficient Disk Based Processing." arXiv preprint arXiv:1401.2327 (2014).
-
Nilakant, Karthik, Valentin Dalibard, Amitabha Roy, and Eiko Yoneki. "PrefEdge: SSD prefetcher for large-scale graph traversal." In Proceedings of International Conference on Systems and Storage, pp. 1-12. ACM, 2014.
-
Nguyen, Donald, Andrew Lenharth, and Keshav Pingali. "A lightweight infrastructure for graph analytics." In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 456-471. ACM, 2013.
-
Venkataraman, Shivaram, Erik Bodzsar, Indrajit Roy, Alvin AuYoung, and Robert S. Schreiber. "Presto: distributed machine learning and graph processing with sparse matrices." In Proceedings of the 8th ACM European Conference on Computer Systems, pp. 197-210. ACM, 2013.
-
Gonzalez, Joseph E., et al. "Graphx: Graph processing in a distributed dataflow framework." 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 2014.
-
Salihoglu, Semih, and Jennifer Widom. "GPS: a graph processing system." Proceedings of the 25th International Conference on Scientific and Statistical Database Management. ACM, 2013.
-
Malewicz, Grzegorz, et al. "Pregel: a system for large-scale graph processing." Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 2010.
-
Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, and Ion Stoica. 2016. Time-evolving graph processing at scale. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems (GRADES '16). ACM, New York, NY, USA
-
Steinbauer, Matthias, and Gabriele Anderst-Kotsis. "DynamoGraph: extending the Pregel paradigm for large-scale temporal graph processing." International Journal of Grid and Utility Computing 7.2 (2016): 141-151.
-
Steinbauer, Matthias, and Gabriele Anderst-Kotsis. "DynamoGraph: A Distributed System for Large-scale, Temporal Graph Processing, its Implementation and First Observations." Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.
-
Khayyat, Zuhair, et al. "Mizan: a system for dynamic load balancing in large-scale graph processing." Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 2013.
-
Sengupta, Dipanjan, et al. "Graphin: An online high performance incremental graph processing framework." European Conference on Parallel Processing. Springer International Publishing, 2016.
-
Sabeur Aridhi, Alberto Montresor, and Yannis Velegrakis. 2016. BLADYG: A Novel Block-Centric Framework for the Analysis of Large Dynamic Graphs. In Proceedings of the ACM Workshop on High Performance Graph Processing (HPGP '16). ACM, New York, NY, USA
-
Umuroglu, Yaman, Donn Morrison, and Magnus Jahre. "Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform." In Field Programmable Logic and Applications (FPL), 2015 25th International Conference on, pp. 1-8. IEEE, 2015.
-
Oguntebi and Kunle Olukotun. 2016. GraphOps: A Dataflow Library for Graph Analytics Acceleration. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '16). ACM, New York, NY, USA, 111-117. DOI: http://dx.doi.org/10.1145/2847263.2847337
-
Nurvitadhi, Eriko, et al. "GraphGen: An FPGA framework for vertex-centric graph computation." Field-Programmable Custom Computing Machines (FCCM), 2014 IEEE 22nd Annual International Symposium on. IEEE, 2014.
-
U. Bondhugula, A. Devulapalli, J. Fernando, P. Wyckoff and P. Sadayappan, "Parallel FPGA-based all-pairs shortest-paths in a directed graph," Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006
-
Guohao Dai, Yuze Chi, Yu Wang, and Huazhong Yang. 2016. FPGP: Graph Processing Framework on FPGA A Case Study of Breadth-First Search. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
-
N. Engelhardt and H. K. H. So, "GraVF: A vertex-centric distributed graph processing framework on FPGAs," 2016 26th International Conference on Field Programmable Logic and Applications (FPL), Lausanne, Switzerland, 2016, pp. 1-4.
-
Kapre, Nachiket. "Custom FPGA-based soft-processors for sparse graph acceleration." In 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 9-16. IEEE, 2015.
-
Kapre, Nachiket, and Pradeep Moorthy. "A case for embedded FPGA-based socs in energy-efficient acceleration of graph problems." Supercomputing frontiers and innovations 2, no. 3 (2015): 76-86.
-
S. Zhou, C. Chelmis and V. K. Prasanna, "High-Throughput and Energy-Efficient Graph Processing on FPGA," 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Washington, DC, 2016, pp. 103-110.
-
Ham, Tae Jun, et al. "Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics."
-
Ozdal, Muhammet Mustafa, et al. "Energy efficient architecture for graph analytics accelerators." Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 2016.
-
Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA '15). ACM, New York, NY, USA, 105-117.
-
Chen, Rong, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. "Powerlyra: Differentiated graph computation and partitioning on skewed graphs." In Proceedings of the Tenth European Conference on Computer Systems, p. 1. ACM, 2015.
-
Vaquero, Luis, et al. "xDGP: A dynamic graph processing system with adaptive partitioning." arXiv preprint arXiv:1309.1049 (2013).
-
Julian Shun, Farbod Roosta-Khorasani, Kimon Fountoulakis and Michael Mahoney. Parallel Local Graph Clustering. Proceedings of the International Conference on Very Large Data Bases (VLDB), 2016.
-
A. Abdolrashidi and L. Ramaswamy, "Continual and Cost-Effective Partitioning of Dynamic Graphs for Optimizing Big Graph Processing Systems," 2016 IEEE International Congress on Big Data (BigData Congress), San Francisco, CA, USA, 2016
-
Andreas Beckmann, Ulrich Meyer and David, Veith, "An Implementation of I/O-Efficient Dynamic Breadth-First Search Using Level-Aligned Hierarchical Clustering", 21st Annual European Symposium of Algorithms (ESA), 2013.
-
Wu, Bo, Zhijia Zhao, Eddy Zheng Zhang, Yunlian Jiang, and Xipeng Shen. "Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU." In ACM SIGPLAN Notices, vol. 48, no. 8, pp. 57-68. ACM, 2013.
-
Khorasani, Farzad, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. "CuSha: vertex-centric graph processing on GPUs." In Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, pp. 239-252. ACM, 2014.
-
Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen. 2011. On-the-fly elimination of dynamic irregularities for GPU computing. In Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS XVI). ACM, New York, NY, USA, 369-380.
-
Sanders, Peter, Dominik Schultes, and Christian Vetter. "Mobile route planning." In European Symposium on Algorithms, pp. 732-743. Springer Berlin Heidelberg, 2008.
-
Zhou, Fang. "Graph compression." Department of Computer Science and Helsinki Institute for Information Technology HIIT (2015): 1-12.
-
S. Chen and J. H. Reif. 1996. Efficient Lossless Compression of Trees and Graphs. In Proceedings of the Conference on Data Compression (DCC '96). IEEE Computer Society, Washington
-
Sebastian Maneth and Fabian Peternek, "A Survey on Methods and Systems for Graph Compression", Journal of CoRR, 2015
-
Sparsh Mittal and Jeffrey S. Vetter. 2016. A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems. IEEE Trans. Parallel Distrib. Syst. 27, 5 (May 2016), 1524-1536.
-
Vito Giovanni Castellana, Marco Minutoli, Alessandro Morari, Antonino Tumeo, Marco Lattuada, and Fabrizio Ferrandi. 2015. High Level Synthesis of RDF Queries for Graph Analytics. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '15). IEEE Press, Piscataway, NJ, USA, 323-330.
-
Julian Shun, Laxman Dhulipala and Guy Blelloch. Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+. Proceedings of the IEEE Data Compression Conference (DCC), pp. 403-412, 2015
- Shang, Zechao, and Jeffrey Xu Yu. "Auto-approximation of graph computing." Proceedings of the VLDB Endowment 7, no. 14 (2014): 1833-1844.
-
Shi, Jiaxin, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. "Fast and concurrent rdf queries with rdma-based distributed graph exploration." In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)(Savannah, GA. 2016.
-
Xirogiannopoulos, Konstantinos, Udayan Khurana, and Amol Deshpande. "GraphGen: exploring interesting graphs in relational data." Proceedings of the VLDB Endowment 8.12 (2015): 2032-2035.
-
Morari, Alessandro, Jesse Weaver, Oreste Villa, David Haglin, Antonino Tumeo, Vito Giovanni Castellana, and John Feo. "High-Performance, Distributed Dictionary Encoding of RDF Datasets." In 2015 IEEE International Conference on Cluster Computing, pp. 250-253. IEEE, 2015.
-
Morari, Alessandro, Vito Giovanni Castellana, Oreste Villa, Jesse Weaver, Gregory Todd Williams, David J. Haglin, Antonino Tumeo, and John Feo. "GEMS: Graph Database Engine for Multithreaded Systems." (2015): 139-156.
- [Xtra Computing Group] (http://pdcc.ntu.edu.sg/xtra/)
- [amplab] (https://amplab.cs.berkeley.edu/projects/succinct-enabling-queries-on-compressed-data/)
-
M. Sadoghi, R. Javed, N. Tarafdar, H. Singh, R. Palaniappan and H. A. Jacobsen, "Multi-query Stream Processing on FPGAs," 2012 IEEE 28th International Conference on Data Engineering, Washington, DC, 2012, pp. 1229-1232.
-
Kocberber, Onur, Boris Grot, Javier Picorel, Babak Falsafi, Kevin Lim, and Parthasarathy Ranganathan. "Meet the walkers: Accelerating index traversals for in-memory databases." In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 468-479. ACM, 2013.
-
V. G. Castellana et al., "In-Memory Graph Databases for Web-Scale Data," in Computer, vol. 48, no. 3, pp. 24-35, Mar. 2015.
-
Zeng, Kai, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan Wang. "A distributed graph engine for web scale RDF data." In Proceedings of the VLDB Endowment, vol. 6, no. 4, pp. 265-276. VLDB Endowment, 2013.
-
Sukhwani, Bharat, et al. "A hardware/software approach for database query acceleration with fpgas." International Journal of Parallel Programming 43.6 (2015): 1129-1159.
-
Dennl, Christopher, Daniel Ziener, and Jurgen Teich. "On-the-fly composition of FPGA-based SQL query accelerators using a partially reconfigurable module library." Field-Programmable Custom Computing Machines (FCCM), 2012 IEEE 20th Annual International Symposium on. IEEE, 2012.
-
Wu, Lisa, et al. "The Q100 Database Processing Unit." IEEE Micro 35.3 (2015): 34-46.
-
Chung, Eric S., John D. Davis, and Jaewon Lee. "Linqits: Big data on little clients." ACM SIGARCH Computer Architecture News. Vol. 41. No. 3. ACM, 2013.
-
Halstead, Robert J., et al. "FPGA-based Multithreading for In-Memory Hash Joins." CIDR. 2015.
-
Chen, Ren, and Viktor K. Prasanna. "Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform."
-
Wang, Zeke, Bingsheng He, and Wei Zhang. "A study of data partitioning on OpenCL-based FPGAs." 2015 25th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 2015.
-
R. R. Bordawekar and M. Sadoghi, "Accelerating database workloads by software-hardware-system co-design," 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, 2016, pp. 1428-1431.
-
Guo, Cong and Martin Karsten. “Towards Adaptive Resource Allocation for Database Workloads.” ADMS@VLDB (2015).
-
Johns Paul, Jiong He, and Bingsheng He. 2016. GPL: A GPU-based Pipelined Query Processing Engine. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). ACM, New York, NY
-
Jared Casper and Kunle Olukotun. 2014. Hardware acceleration of database operations. In Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays (FPGA '14)
-
Gokul Soundararajan, Daniel Lupei, Saeed Ghanbari, Adrian Daniel Popescu, Jin Chen, and Cristiana Amza. 2009. Dynamic resource allocation for database servers running on virtual storage. In Proccedings of the 7th conference on File and storage technologies (FAST '09), Margo Seltzer and Ric Wheeler (Eds.). USENIX Association, Berkeley, CA, USA, 71-84.
-
Bingsheng He and Jeffrey Xu Yu. 2011. High-throughput transaction executions on graphics processors. Proc. VLDB Endow. 4, 5 (February 2011), 314-325.
-
Bharat Sukhwani, Hong Min, Mathew Thoennes, Parijat Dube, Bernard Brezzo, Sameh Asaad, Donna Eng Dillenberger, "Database Analytics: A Reconfigurable-Computing Approach", IEEE Micro vol. 34 no. 1, p. 19-29, Jan.-Feb., 2014
-
Shuang Chen, Shunning Jiang, Bingsheng He, and Xueyan Tang. 2016. A Study of Sorting Algorithms on Approximate Memory. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). ACM, New York, NY, USA, 647-662.
-
Gustavo Alonso, "Data Processing on the fast lane", Systems Group, Department of Computer Science, ETH Zurich, Switzerland, FPL keynote, 2016.
-
Zeke Wang, Huiyan Cheah, Johns Paul, Bingsheng He, and Wei Zhang. 2016. Accelerating Database Query Processing on OpenCL-based FPGAs (Abstract Only). In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '16). ACM, New York, NY
-
Barthels, Claude, Ingo Müller, Timo Schneider, Gustavo Alonso, and Torsten Hoefler. "Distributed Join Algorithms on Thousands of Cores." Proceedings of the VLDB Endowment 10, no. 5 (2017).
-
Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann and Alfons Kemper. “Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation.” SIGMOD Conference (2016).
-
Lin, Chunbin, Jianguo Wang, and Yannis Papakonstantinou. "Data Compression for Analytics over Large-scale In-memory Column Databases." arXiv preprint arXiv:1606.09315 (2016).
- [RIFF] (https://github.com/KastnerRG/riffa)
- Overclocking Arithmetic
- Some Highlights in FPGA 2016
- [Time is Precision] (https://constantinides.net/2016/12/12/time-is-precision/)
-
Ousterhout, John, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro et al. "The ramcloud storage system." ACM Transactions on Computer Systems (TOCS) 33, no. 3 (2015): 7.
-
Ho, Chen-Han, Sung Jin Kim, and Karthikeyan Sankaralingam. "Efficient execution of memory access phases using dataflow specialization." In ACM SIGARCH Computer Architecture News, vol. 43, no. 3, pp. 118-130. ACM, 2015.
-
Kumar, Snehasish, Arrvindh Shriraman, Vijayalakshmi Srinivasan, Dan Lin, and Jordon Phillips. "SQRL: hardware accelerator for collecting software data structures." In Proceedings of the 23rd international conference on Parallel architectures and compilation, pp. 475-476. ACM, 2014.
-
Schkufza, Eric, Rahul Sharma, and Alex Aiken. "Stochastic optimization of floating-point programs with tunable precision." ACM SIGPLAN Notices 49, no. 6 (2014): 53-64.
-
Guo, Qi, Tze-Meng Low, Nikolaos Alachiotis, Berkin Akin, Larry Pileggi, James C. Hoe, and Franz Franchetti. "Enabling portable energy efficiency with memory accelerated library." In Proceedings of the 48th International Symposium on Microarchitecture, pp. 750-761. ACM, 2015.
-
Appuswamy, Raja, Matthaios Olma, and Anastasia Ailamaki. "Scaling the Memory Power Wall With DRAM-Aware Data Management." In Proceedings of the 11th International Workshop on Data Management on New Hardware, p. 3. ACM, 2015.
-
Akın, Berkin, Franz Franchetti, and James C. Hoe. "Understanding the design space of dram-optimized hardware FFT accelerators." In 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors, pp. 248-255. IEEE, 2014.
-
Akin, Berkin, Franz Franchetti, and James C. Hoe. "Data reorganization in memory using 3d-stacked dram." In ACM SIGARCH Computer Architecture News, vol. 43, no. 3, pp. 131-143. ACM, 2015.
-
Hsieh, Kevin, Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, and Onur Mutlu. "Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation." In Computer Design (ICCD), 2016 IEEE 34th International Conference on, pp. 25-32. IEEE, 2016.
-
Jacobsen, M., Richmond, D., Hogains, M., and Kastner, R. “RIFFA 2.1: A reusable integration framework for FPGA accelerators.” ACM Transactions on Reconfigurable Technology and Systems (TRETS), September 2015.
-
C. Pham-Quoc, Z. Al-Ars and K. Bertels, "Automated Hybrid Interconnect Design for FPGA Accelerators Using Data Communication Profiling," Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International, Phoenix, AZ, 2014, pp. 151-160.
-
Niu, Xinyu, Wayne Luk, and Yu Wang. "EURECA: On-chip configuration generation for effective dynamic data access." Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2015.
-
Umuroglu, Yaman, and Magnus Jahre. "Random access schemes for efficient FPGA SpMV acceleration." Microprocessors and Microsystems (2016).
-
Dorrance, Richard, Fengbo Ren, and Dejan Marković. "A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs." In Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays, pp. 161-170. ACM, 2014.
-
Jamro, Ernest, Tomasz Pabiś, Paweł Russek, and Kazimierz Wiatr. "The algorithms for FPGA implementation of sparse matrices multiplication." Computing and Informatics 33, no. 3 (2015): 667-684.
-
Giefers, Heiner, Peter Staar, Costas Bekas, and Christoph Hagleitner. "Analyzing the energy-efficiency of sparse matrix multiplication on heterogeneous systems: A comparative study of GPU, Xeon Phi and FPGA." In Performance Analysis of Systems and Software (ISPASS), 2016 IEEE International Symposium on, pp. 46-56. IEEE, 2016.
-
Yu, Xiangyao, George Bezerra, Andrew Pavlo, Srinivas Devadas, and Michael Stonebraker. "Staring into the abyss: An evaluation of concurrency control with one thousand cores." Proceedings of the VLDB Endowment 8, no. 3 (2014): 209-220.
-
Fu, Yaosheng, and David Wentzlaff. "PriME: A parallel and distributed simulator for thousand-core chips." In Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on, pp. 116-125. IEEE, 2014.
-
Miller, Jason E., Harshad Kasture, George Kurian, Charles Gruenwald, Nathan Beckmann, Christopher Celio, Jonathan Eastep, and Anant Agarwal. "Graphite: A distributed parallel simulator for multicores." In HPCA-16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture, pp. 1-12. IEEE, 2010.
-
Carlson, Trevor E., Wim Heirman, and Lieven Eeckhout. "Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation." In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, p. 52. ACM, 2011.