Conference Papers

Conference
Paper Content
HPCA'26

[pdf] [code]
AUM: Unleashing the Efficiency Potential of Shared Processors with Accelerator Units for LLM Serving
Authors: Xinkai Wang, Chao Li, Yiming Zhuansun, Jinyang Guo, Xiaofeng Hou, Jing Wang, Luping Wang, Weigao Chen, Cheng Huang, Guodong Yang, Liping Zhang, Minyi Guo.
Conference: Proceedings of the 32nd IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2026.

ASPLOS'26

[pdf] [code]
MoE-APEX: An Efficient MoE Inference System with Adaptive Precision Expert Offloading
Authors: Peng Tang, Jiacheng Liu, Xiaofeng Hou, Yifei Pu, Jing Wang, Pheng-Ann Heng, Chao Li, Minyi Guo.
Conference: Proceedings of the 31st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2026.

NPC'25
(Best Student Paper Award)

[pdf] [code]
TriCooling-Sim: Efficient Thermal Simulation for High-Density Micro AI Data Centers
Authors: Jinyang Guo, Xinkai Wang, Jing Wang, Xiaofeng Hou, Chao Li, Minyi Guo.
Conference: Proceedings of the 21st IFIP International Conference on Network and Parallel Computing (NPC), Nov. 2025.

NPC'25

[pdf] [code]
CGO: Cloud Game Orchestration via Resource Perception and CODEC Optimization
Authors: Taolei Wang, Chao Li, Jing Wang, Xiaofeng Hou, Minyi Guo.
Conference: Proceedings of the 21st IFIP International Conference on Network and Parallel Computing (NPC), Nov. 2025.

APPT'25

[pdf] [code]
AsymServe: Demystifying and Optimizing LLM Serving Efficiency on CPU Acceleration Units
Authors: Xinkai Wang, Yiming Zhuansun, Chao Li, Jing Wang, Xiaofeng Hou, Lingyu Sun, Luping Wang, Minyi Guo.
Conference: Proceedings of the International Symposium on Advanced Parallel Processing Technology (APPT), July 2025.

APPT'25

[pdf] [code]
Accelerating Large-Scale Out-of-GPU-Core GNN Training with Two-Level Historical Caching
Authors: Jing Wang, Taolei Wang, Juntao Huang, Yibo Liu, Xinkai Wang, Marius Kreutzer, Chao Li, Minyi Guo.
Conference: Proceedings of the International Symposium on Advanced Parallel Processing Technology (APPT), July 2025.

SC'24

[ACM link] [slides] [poster] [talk] [code]
Boosting Data Center Performance via Intelligently Managed Multi-backend Disaggregated Memory
Authors: Jing Wang, Hanzhang Yang, Chao Li, Yiming Zhuansun, Wang Yuan, Cheng Xu, Xiaofeng Hou, Minyi Guo, Yang Hu, Yaqian Zhao.
Conference: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024.

VLDB'24

[slides] [code]
FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework
Authors: Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong.
Conference: Proceedings of the Very Large Data Bases Endowment (VLDB), 2024.

IPDPS'24

[slides] [code]
CoCG: Fine-grained Cloud Game Co-location on Heterogeneous Platform
Authors: Taolei Wang, Jing Wang, Chao Li, Cheng Xu, Xiaofeng Hou, Minyi Guo.
Conference: IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024.

CCGrid'24

[slides] [code]
Improving the Efficiency of Serverless Computing via Core-Level Power Management
Authors: Du Liu, Jing Wang, Xinkai Wang, Chao Li, Lu Zhang, Xiaofeng Hou, Xiaoxiang Shi, Minyi Guo.
Conference: International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2024.

ICME'24

[slides] [code]
$M^2$SN: Adaptive and Dynamic Multi-modal Shortcut Network Architecture for Latency-aware Applications
Authors: Yifei Pu, Chi Wang, Xiaofeng Hou, Cheng Xu, Jiacheng Liu, Jing Wang, Minyi Guo, Chao Li.
Conference: International Conference on Multimedia and Expo (ICME), 2024.

SoCC'23

[pdf] [code]
Not All Resources are Visible: Exploiting Fragmented Shadow Resources in Shared-State Scheduler Architecture
Authors: Xinkai Wang, Hao He, Yuancheng Li, Chao Li, Xiaofeng Hou, Jing Wang, Quan Chen, Jingwen Leng, Minyi Guo, Leibo Wang.
Conference: Proceedings of the 14th ACM Symposium on Cloud Computing (SoCC), Nov. 2023.

IPDPS'22

[slides] [code] [video]
Excavating the Potential of Graph Workload on RDMA-based Far Memory Architecture
Authors: Jing Wang, Chao Li, Taolei Wang, Lu Zhang, Pengyu Wang, Junyi Mei, Minyi Guo.
Conference: International Parallel and Distributed Processing Symposium (IPDPS), 2022.

ICCD'22

[slides] [code] [video]
HyFarM: Task Orchestration on Hybrid Far Memory for High Performance Per Bit
Authors: Jing Wang, Chao Li, Junyi Mei, Hao He, Taolei Wang, Pengyu Wang, Lu Zhang, Minyi Guo, Hanqing Wu, Dongbai Chen, Xiangwen Liu.
Conference: International Conference on Computer Design (ICCD), 2022.

ICPE'22
(Best Paper Award)

[pdf] [code]
Oversubscribing GPU Unified Virtual Memory: Implications and Suggestions
Authors: Chuanming Shao, Jinyang Guo, Pengyu Wang, Jing Wang, Chao Li, Minyi Guo.
Conference: International Conference on Performance Engineering (ICPE), 2022.

PACT'21

[pdf] [code]
Skywalker: Efficient Alias-method-based Graph Sampling and Random Walk on GPUs
Authors: Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, Minyi Guo.
Conference: International Conference on Parallel Architectures and Compilation Techniques (PACT), 2021.

Journal Articles

Journal
Paper Content
JPDC'25
(accepted)

[pdf]
MMBypass: Towards Efficient Multi-modal AI Computing with Adaptive Bypass Network
Authors: Yifei Pu, Xinfeng Xia, Chi Wang, Cheng Xu, Jiacheng Liu, Jing Wang, Minyi Guo, Jingling Yuan, Chao Li.
Journal: Journal of Parallel and Distributed Computing (JPDC), accepted, 2025.

TACO'25

[pdf] [code]
Enhancing High-Throughput GPU Random Walks Through Multi-Task Concurrency Orchestration
Authors: Cheng Xu, Chao Li, Xiaofeng Hou, Junyi Mei, Jing Wang, Pengyu Wang, Shixuan Sun, Minyi Guo, Baoping Hao.
Journal: ACM Transactions on Architecture and Code Optimization (TACO), Vol 22, Issue 1, 2025.

TACO'24

[code]
Enhancing High-Throughput GPU Random Walks Through Multi-Task Concurrency Orchestration
Authors: Cheng Xu, Chao Li, Xiaofeng Hou, Junyi Mei, Jing Wang, Pengyu Wang, Shixuan Sun, Minyi Guo, Baoping Hao.
Journal: Transactions on Architecture and Code Optimization (TACO), 2024.

JPDC'23

[pdf] [code]
Fargraph+: Excavating the Parallelism of Graph Computing Workload on RDMA-based Far Memory System
Authors: Jing Wang, Chao Li, Yibo Liu, Taolei Wang, Junyi Mei, Lu Zhang, Pengyu Wang, Minyi Guo.
Journal: Journal of Parallel and Distributed Computing (JPDC), 2023.

TODAES'23

[pdf] [code]
DRAGON: Dynamic Recurrent Accelerator for Graph Online Convolution
Authors: José Romero Hung, Chao Li, Taolei Wang, Jinyang Guo, Pengyu Wang, Chuanming Shao, Jing Wang, Guoyong Shi, Xiangwen Liu, Hanqing Wu.
Journal: Transactions on Design Automation of Electronic Systems (TODAES), 2023.

TACO'21

[pdf] [code]
Grus: Toward Unified-memory-efficient High-performance Graph Processing on GPU
Authors: Pengyu Wang, Jing Wang, Chao Li, Jianzong Wang, Haojin Zhu, Minyi Guo.
Journal: Transactions on Architecture and Code Optimization (TACO), 2021.

TC'21

[pdf] [code]
Tapping into NFV Environment for Opportunistic Serverless Edge Function Deployment
Authors: Lu Zhang, Weiqi Feng, Chao Li, Xiaofeng Hou, Pengyu Wang, Jing Wang, Minyi Guo.
Journal: Transactions on Computers (TC), 2021.

TRETS'21

[pdf] [code]
ACE-GCN: A Fast Data-driven FPGA Accelerator for GCN Embedding
Authors: José Romero Hung, Chao Li, Pengyu Wang, Chuanming Shao, Jinyang Guo, Jing Wang, Guoyong Shi.
Journal: Transactions on Reconfigurable Technology and Systems (TRETS), 2021.

JCRD'20

[pdf] [code]
Programming and Developing Environment for FPGA Graph Processing: Survey and Exploration
FPGA图计算的编程与开发环境:综述和探索
Authors: Jinyang Guo, Chuanming Shao, Jing Wang, Chao Li, Haojin Zhu, Minyi Guo.
Journal: 计算机研究与发展,Journal of Computer Research and Development (JCRD), 2020.

SCIS'19

[slides] [code]
Memory System Optimization for Graph Processing: A Survey
面向图计算的内存系统优化技术综述
Authors: Jing Wang, Lu Zhang, Pengyu Wang, Jiahong Xu, Chao Li, Haojin Zhu, Xuehai Qian, Minyi Guo.
Journal: 中国科学信息科学, Science China Information Sciences (SCIS), 2019.

Short Paper / Research Poster

Conference
Poster Content
PPoPP'23

[poster] [pdf] [code]
CoWalker: High-Throughput GPU Random Walk with Fine-tuned Concurrent Query Processing
Authors: Cheng Xu, Chao Li, Pengyu Wang, Xiaofeng Hou, Jing Wang, Shixuan Sun, Minyi Guo, Dongbai Chen, and Xiangwen Liu.
Conference: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), research poster, 2023.

SC'23

[poster] [content] [code]
ParLeiden: Boosting Parallelism of Distributed Leiden Algorithm on Large-scale Graphs
Authors: Yongmin Hu*, Jing Wang*, Cheng Zhao, Yibo Liu, Cheng Chen, Xiaoliang Cong, Chao Li.
Conference: International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), research poster, 2023.

SC'21

[poster] [content] [code]
Fargraph: Optimizing Graph Workload on RDMA-based Far Memory
Authors: Jing Wang, Chao Li, Taolei Wang, Lu Zhang, Pengyu Wang, Junyi Mei, and Minyi Guo.
Conference: International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), research poster, 2021.

BenchCouncil'19

[program] [pdf] [code]
Exploiting Parallelism, Sparsity and Locality to Accelerate Matrix Factorization on x86 Platform
Authors: Weixin Deng, Pengyu Wang, Jing Wang, and Chao Li.
Conference: International Symposium on Benchmarking, Measuring and Optimizing (BenchCouncil), Student Challenge, First Prize, 2019.

CCCF'23

[pdf] [code]
Translation of "Taming Memory With Disaggregation", 驾驭内存:分离式内存
Authors: Weixin Deng, Pengyu Wang, Jing Wang, and Chao Li.
Conference: 中国计算机学会通讯, Communications of the CCF (CCCF), 2023.

Ongoing Papers

Conference
Paper Content
Arxiv

[pdf]
Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters
Authors: Jing Wang, Chao Li, Taolei Wang, Jinyang Guo, Hanzhang Yang, Yiming Zhuansun, Minyi Guo.

Arxiv

[pdf]
On The Design of a Light-weight FPGA Programming Framework for Graph Applications
Authors: Jing Wang, Jinyang Guo, and Chao Li.