1. Processing-in-Memory for Data-Intensive Applications, From Device to Algorithm
Over the past decades, the amount of data that computing systems must process and analyze has grown dramatically, reaching the exascale. However, modern computing platforms cannot deliver solutions that are both energy-efficient and high-performance, which leaves a gap between what is needed and what is available.
Unfortunately, this gap will keep widening, mainly due to limitations in both devices and architectures. First, at the device level, the computing efficiency and performance of CMOS Boolean systems are beginning to stall as Moore's law approaches its end and the power wall is reached (i.e., large leakage power consumption limits performance growth as technology scales down). Second, at the architecture level, today's computers are based on the von Neumann architecture, with separate computing and memory units connected via buses. This leads to the memory wall (long memory access latency, limited memory bandwidth, and energy-hungry data transfer) and to large leakage power for holding data in volatile memory.

Motivated by these concerns, our group's research focuses on hardware and software co-design of energy-efficient, high-performance Processing-in-Memory (PIM) platforms based on volatile and non-volatile memories. We leverage innovations across device, circuit, architecture, and algorithm to integrate memory and logic, break the existing memory and power walls, and dramatically increase the computing efficiency of today’s non-von-Neumann computing systems.


Our 1.23-GHz 16-kb Processing-in-Memory Prototype [ESSCIRC'22]

2. Integrated Sensing and Normally-off Computing for Edge Imaging Systems
The market for Internet of Things (IoT) devices is projected to exceed $1000B by 2025, with a web of interconnection projected to comprise approximately 75+ billion devices. Many of these devices are sensory imaging systems that enable massive data collection from the environment and people. However, considerable portions of the captured sensory data are redundant and unstructured. Converting such large volumes of raw data, storing them in volatile memories, transmitting them, and computing on them in on-/off-chip processors impose high energy consumption, latency, and a memory bottleneck at the edge. Moreover, because replacing batteries in IoT devices is very costly and sometimes impracticable, low-maintenance energy-harvesting devices powered by ambient energy sources have impacted a wide range of IoT applications such as wearable devices, smart cities, and intelligent industry. This project explores and designs new high-speed, low-power, and normally-off computing architectures for resource-limited sensory nodes, exploiting cross-layer CMOS/post-CMOS approaches to overcome these issues.
Our group focuses on two main research directions: 
(1) design and analysis of a Processing-In-Sensor Unit (PISU) that co-integrates always-on sensing and processing capabilities, in conjunction with a Processing-Near-Sensor Unit (PNSU). The hybrid platform will feature real-time programmable, granularity-configurable arithmetic operations to balance the trade-offs among accuracy, speed, and power efficiency under both continuously powered and energy-harvesting-powered imaging scenarios. This platform will enable resource-limited edge devices to locally perform data- and compute-intensive applications such as machine learning tasks while consuming much less power than the present state of the art; and 
(2) realizing the concept of normally-off computing and rollback by exploiting the non-volatility and instant wake-up features of non-volatile memories. This allows the PISU first to keep data available and safe, even after a power-off, and then to return to a previously valid state in the case of an execution error or a power failure.
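The rollback idea in direction (2) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the group's hardware design: the class `NormallyOffNode`, the function `run`, and the `checkpoint_every` parameter are hypothetical names, and the non-volatile memory is simulated with an ordinary Python copy of the state.

```python
import copy


class NormallyOffNode:
    """Toy model: volatile working state plus a checkpoint in simulated NVM."""

    def __init__(self):
        self.state = {"acc": 0, "step": 0}      # volatile working state
        self._nvm = copy.deepcopy(self.state)   # simulated non-volatile copy

    def checkpoint(self):
        # Commit the current state to (simulated) non-volatile storage.
        self._nvm = copy.deepcopy(self.state)

    def power_fail(self):
        # Volatile contents are lost; the NVM copy survives.
        self.state = None

    def rollback(self):
        # Return to the last committed, known-valid state.
        self.state = copy.deepcopy(self._nvm)

    def process(self, sample):
        self.state["acc"] += sample
        self.state["step"] += 1


def run(samples, fail_at=None, checkpoint_every=2):
    """Accumulate samples, injecting one power failure before index fail_at."""
    node = NormallyOffNode()
    i = 0
    failed = False
    while i < len(samples):
        if i == fail_at and not failed:
            failed = True
            node.power_fail()
            node.rollback()
            i = node.state["step"]  # resume from the last committed step
            continue
        node.process(samples[i])
        i += 1
        if i % checkpoint_every == 0:
            node.checkpoint()
    return node.state["acc"]
```

In this model, a run interrupted by a power failure replays only the samples processed since the last checkpoint and ends with the same accumulated value as an uninterrupted run. The checkpoint interval trades the cost of writing to non-volatile memory against the amount of work that must be redone.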
3. Rethinking Hardware Security Solutions for Emerging Non-volatile Memories
Non-volatile PIM platforms act as a double-edged sword for hardware security. On the one hand, non-volatile memories (NVMs) introduce new security challenges that adversaries can leverage to launch new attacks. For instance, in spin-based devices, an adversary can, without any circuit modifications, run Trojan-like attacks by manipulating the magnetic field or thermal conditions, or mount a side-channel attack that picks up the power signature to reverse engineer the in-memory operation. On the other hand, such PIM designs can facilitate new hardware security solutions and countermeasures. In particular, we explore new PIM-enhanced circuit/architecture security solutions for circuit obfuscation and side-channel attack prevention. Our research entails two main directions: 
(1) investigating various attack models and how to leverage PIM circuit/architecture design approaches to mitigate security vulnerabilities; and
(2) exploring how to leverage various PIM designs based on volatile memories (VMs) and NVMs to efficiently develop an ultra-parallel in-memory technique for hardware security.


4. Adaptive Learning for Edge AI Computing IoT Systems
With the recent breakthroughs of AI across almost all its sub-fields, such as machine learning, computer vision, speech recognition, natural language processing, and robotics, powerful AI-based data processing methodologies are widely deployed in IoT systems. They can effectively analyze the big data collected by distributed IoT nodes to gain a better understanding of these data and provide intelligent decisions with little human intervention. However, because the processors in IoT nodes are resource-limited, the sensed data are inevitably sent to back-end cloud or edge servers for AI-based processing. This may cause big data to accumulate in back-end servers, long response times due to the distance between IoT devices and the back-end server, network congestion and unreliability when a large number of IoT devices are connected, and security concerns during data transmission. Furthermore, in typical IoT systems, the nodes are distributed at different locations, take inputs from different environments, deal with dynamic user requirements and patterns, and interact with the ever-changing physical world. The AI in the IoT system is therefore required to be domain-specific and to learn continuously from its working environment. Considering these challenges and requirements, our group explores a new IoT edge-server collaborative AI computing system in which DNN inference is deployed on the local IoT edge, while novel domain-adaptive and continuous learning methodologies are incorporated in the back-end server for training on individual IoT domains. Our research entails two main directions:
(1) energy- and cost-efficient deployment of DNN accelerators into IoT nodes for inference computing, using a new automated weight-compression technique; and
(2) novel domain-adaptive and continuous-learning AI algorithms in the back-end server that avoid catastrophic forgetting and work as long as representative input data of the target domain are available, whether labeled or not.
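To make the notion of weight compression in direction (1) concrete, the following is a minimal sketch of one common, generic technique: symmetric uniform quantization of floating-point weights to low-bit integers. It is not the group's automated method, which is not described here; the function names `quantize_weights` and `dequantize_weights` and the `num_bits` parameter are assumptions for illustration.

```python
def quantize_weights(weights, num_bits=8):
    """Symmetric uniform quantization: floats -> signed integers plus a scale."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_weights(q, scale):
    """Recover approximate float weights from integers and the shared scale."""
    return [v * scale for v in q]
```

With this scheme, each weight is stored as a `num_bits`-wide integer and a single shared scale, and the reconstruction error per weight is bounded by half of the scale. Lower bit widths shrink storage and arithmetic cost on the IoT node at the price of larger quantization error.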