1. Processing-in-Memory for Data-Intensive Applications: From Device to Algorithm
Over the past decades, the amount of data that computing systems must process and analyze has grown dramatically, reaching exascale. However, modern computing platforms cannot deliver solutions that are both energy-efficient and high-performance, leaving a gap between what applications need and what hardware can supply.
Unfortunately, this gap will keep widening, mainly because of limitations in both devices and architectures. First, at the device level, the efficiency and performance of CMOS Boolean systems are beginning to stall as Moore's law approaches its end and the power wall is reached (i.e., large leakage power limits performance growth as technology scales down). Second, at the architecture level, today's computers follow the von Neumann architecture, with separate computing and memory units connected by buses; this leads to the memory wall (long memory access latency, limited memory bandwidth, and energy-hungry data transfer) and to substantial leakage power spent holding data in volatile memory.

Motivated by these concerns, our group's research focuses on hardware/software co-design of energy-efficient and high-performance Processing-in-Memory (PIM) platforms, leveraging innovations across the device, circuit, architecture, and algorithm levels to integrate memory and logic, break the existing memory and power walls, and dramatically increase the computing efficiency of emerging non-von Neumann computing systems.
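As a toy, purely software illustration of the PIM idea (a sketch under our own assumptions, not a description of any specific hardware design), the example below emulates a bulk bitwise in-memory operation: by activating two rows of a memory subarray together, modified sensing circuitry can produce their bitwise AND/OR/XOR in place, so neither operand ever crosses the memory bus. The array size and row indices are hypothetical.

    import numpy as np

    # Emulate a small memory subarray: 8 rows of 32 bitcells each.
    rng = np.random.default_rng(0)
    subarray = rng.integers(0, 2, size=(8, 32), dtype=np.uint8)

    def pim_bulk_bitwise(array, row_a, row_b, op="and"):
        """Model a bulk bitwise operation computed inside the array:
        both wordlines are activated and the shared bitlines resolve the
        result, so no operand is moved to a separate processing unit."""
        a, b = array[row_a], array[row_b]
        if op == "and":
            return a & b
        if op == "or":
            return a | b
        if op == "xor":   # often realized with an extra sensing pass in hardware
            return a ^ b
        raise ValueError(op)

    result = pim_bulk_bitwise(subarray, 2, 5, op="and")
    print(result)

Even in this toy form, the key property is visible: the result spans the full row width, so the operation is as wide as the array, which is where the parallelism and the savings in data movement come from.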
2. Low-Power and Area-Efficient Processing-in-Sensor Unit for IoT
Nowadays, billions of interconnected smart IoT devices gather, analyze, and distribute data around the world. However, nearly 90% of the data generated by the IoT is never analyzed or processed, mainly because of the limited computing capability of area- and power-constrained IoT devices. In this domain, always-on smart visual sensing systems are widely deployed at the edge, for example in wearable devices. Although low-power CMOS image sensors and low-power processing circuits have both been explored, optimizing these two components separately still falls short of the system power budget, largely because an enormous amount of energy is consumed by the analog-to-digital conversion (ADC) of raw images.
Motivated by these challenges, our group develops novel processing-in-sensor platforms (ASIC/FPGA/accelerator) that reduce the power consumed by data conversion and transmission by co-integrating image sensors and processing circuits on a single chip, so that only preprocessed information leaves the sensor.
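To make the motivation concrete, the following sketch (plain Python with made-up sensor dimensions; the in-pixel processing is only modeled digitally here) contrasts the number of A/D conversions in a conventional read-every-pixel pipeline with a pipeline in which a coarse edge response is computed and pooled inside the pixel array, so that only the pooled outputs are digitized.

    import numpy as np

    H, W = 240, 320                                   # hypothetical sensor resolution
    frame = np.random.default_rng(1).integers(0, 256, size=(H, W)).astype(float)

    # Conventional pipeline: every raw pixel is digitized before any processing.
    adc_conventional = H * W

    # In-sensor preprocessing (modeled digitally): a 3x3 edge kernel followed by
    # 4x4 average pooling, so only the pooled responses are converted off-chip.
    kernel = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float)
    edges = np.abs(
        sum(kernel[i, j] * frame[i:H - 2 + i, j:W - 2 + j]
            for i in range(3) for j in range(3))
    )
    eh, ew = edges.shape
    pooled = edges[:eh - eh % 4, :ew - ew % 4].reshape(eh // 4, 4, ew // 4, 4).mean(axis=(1, 3))
    adc_in_sensor = pooled.size

    print(f"ADC conversions: {adc_conventional} vs {adc_in_sensor} "
          f"({adc_conventional / adc_in_sensor:.1f}x fewer)")

With these hypothetical numbers, the in-sensor path digitizes roughly 16x fewer values; the actual savings depend on the sensor and the chosen preprocessing.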

Our group focuses on two main research directions:
(1) exploring an ultra-low-power, area-efficient, multi-mode Processing-in-Sensor Unit (PSU) based on non-volatile memories for next-generation embedded IoT computing systems, supporting edge extraction, first-layer DCNN processing, and plain imaging,
(2) realizing the concepts of normally-off computing and rollback by exploiting the non-volatility and instant wake-up features of non-volatile memories. This allows the PSU to keep data available and safe even after power-off, and to return to a previously valid state after an execution error or a power failure; a behavioral sketch of this checkpoint-and-rollback idea follows.
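The sketch below (a hypothetical Python model, not our circuit) illustrates direction (2): intermediate state is saved into a stand-in non-volatile store after every valid step, so after a simulated power failure the computation resumes from the last checkpoint instead of restarting.

    import copy

    class NonVolatileStore:
        """Stand-in for an on-chip NVM region: contents survive 'power loss'."""
        def __init__(self):
            self.checkpoint = None
        def save(self, state):
            self.checkpoint = copy.deepcopy(state)
        def restore(self):
            return copy.deepcopy(self.checkpoint)

    def run_psu(frames, nvm, fail_at=None):
        state = {"frame_idx": 0, "accum": 0}
        if nvm.checkpoint is not None:            # instant wake-up: resume from NVM
            state = nvm.restore()
        while state["frame_idx"] < len(frames):
            if fail_at == state["frame_idx"]:
                raise RuntimeError("power failure")   # volatile progress would be lost
            state["accum"] += frames[state["frame_idx"]]
            state["frame_idx"] += 1
            nvm.save(state)                       # checkpoint after each valid step
        return state["accum"]

    nvm = NonVolatileStore()
    frames = [3, 1, 4, 1, 5]
    try:
        run_psu(frames, nvm, fail_at=3)           # power is cut mid-stream
    except RuntimeError:
        pass
    print(run_psu(frames, nvm))                   # rolls forward from the checkpoint -> 14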
3. Rethinking Hardware Security Solutions for Emerging Non-volatile Memories
Non-volatile PIM platforms are a double-edged sword for hardware security. On the one hand, NVMs introduce new security challenges that adversaries can exploit to launch new attacks. For instance, with spin-based devices, an adversary can mount Trojan-like attacks without any circuit modification by manipulating the magnetic field or thermal conditions, or can reverse-engineer in-memory operations by capturing the power signature through side-channel attacks. On the other hand, such PIM designs can enable new hardware security solutions and countermeasures. In particular, we explore PIM-enhanced circuit/architecture security solutions for circuit obfuscation and side-channel attack prevention. Our research entails two main directions:
(1) investigating various attack models and how PIM circuit/architecture approaches can mitigate the resulting security vulnerabilities, and
(2) exploring how PIM designs based on both volatile memories (VMs) and NVMs can be leveraged to develop ultra-parallel in-memory techniques for hardware security (a toy software illustration follows).
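As a simplified example of direction (2) of our own devising (key generation and management are out of scope, and all sizes are hypothetical), the sketch below models an ultra-parallel in-memory XOR in which every data row is masked with a key row stored in the same array, so raw data never has to appear on the memory bus.

    import numpy as np

    rng = np.random.default_rng(7)

    # Data rows and per-row key rows both reside in the (non-volatile) array.
    data_rows = rng.integers(0, 2, size=(4, 64), dtype=np.uint8)
    key_rows  = rng.integers(0, 2, size=(4, 64), dtype=np.uint8)

    def in_memory_mask(data, key):
        """Bulk in-array XOR: every row is masked with its key row in one
        parallel step, so plaintext never leaves the memory array."""
        return data ^ key

    cipher = in_memory_mask(data_rows, key_rows)
    # XOR is its own inverse, so applying the key again recovers the data.
    assert np.array_equal(in_memory_mask(cipher, key_rows), data_rows)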
4. Adaptive Learning for Edge-AI IoT Computing Systems
With recent breakthroughs in AI across almost all of its sub-fields, such as machine learning, computer vision, speech recognition, natural language processing, and robotics, powerful AI-based data processing methods are being widely deployed in IoT systems to analyze the big data collected by distributed IoT nodes, gain a better understanding of these data, and provide intelligent decisions with little human intervention. However, because the processors in IoT nodes are resource-limited, the sensed data usually have to be sent back to back-end cloud or edge servers for AI-based processing. This causes data to accumulate in back-end servers, long response times due to the distance between IoT devices and the back-end server, network congestion and unreliability when a large number of IoT devices are connected, and security concerns during data transmission. Furthermore, in typical IoT systems each node is deployed at a different location, takes inputs from a different environment, deals with dynamic user requirements and patterns, and interacts with an ever-changing physical world. The AI in an IoT system therefore needs to be domain-specific and to learn continuously from its working environment.

Considering these challenges and requirements, our group explores a new IoT edge-server collaborative AI computing system in which DNN inference is performed on the local IoT edge, while novel domain-adaptive and continual learning methods run on the back-end server to train models for individual IoT domains. Our research entails two main directions:
(1) energy- and cost-efficient deployment of DNN accelerators in IoT nodes for inference computing, using a new automated weight compression technique (a toy sketch of the compression idea follows this list), and
(2) novel domain-adaptive and continual learning AI algorithms on the back-end server that avoid catastrophic forgetting and work as long as representative input data from the target domain are available, whether labeled or not.
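The sketch below is a minimal, hypothetical illustration of the weight-compression idea referenced in direction (1): magnitude pruning followed by uniform 8-bit quantization of the surviving weights. The keep ratio, layer size, and thresholds are made up for illustration and do not reflect our actual compression flow.

    import numpy as np

    def compress_layer(weights, keep_ratio=0.3):
        """Toy weight compression: prune small-magnitude weights, then
        quantize the survivors to int8 with a single per-layer scale."""
        flat = np.abs(weights).ravel()
        k = max(1, int(keep_ratio * flat.size))
        threshold = np.partition(flat, -k)[-k]        # k-th largest magnitude
        mask = np.abs(weights) >= threshold
        pruned = weights * mask
        scale = np.abs(pruned).max() / 127.0 or 1.0
        quantized = np.clip(np.round(pruned / scale), -127, 127).astype(np.int8)
        return quantized, scale, mask

    rng = np.random.default_rng(3)
    w = rng.normal(size=(64, 128)).astype(np.float32)   # hypothetical dense layer
    q, scale, mask = compress_layer(w)
    recovered = q.astype(np.float32) * scale
    print(f"kept {mask.mean():.0%} of weights, "
          f"max reconstruction error {np.abs(recovered - w * mask).max():.4f}")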