GPU | SEMI

GPU

February 23, 2026

Meet D2S’ Aki Fujimura Whose Career Spans Manhattan to Curvilinear Design

By Bob Smith

Aki Fujimura has been at the forefront of chip design innovations from the beginning of his career, and his technology leadership continues today. He serves as Chairman and CEO of D2S, co-founder of the eBeam Initiative, President of BACUS, and a Governing Council member of the ESD Alliance, a SEMI Technology Community.

At Tangent (now Cadence), Fujimura and Steve Teig (a chip designer for the last 20 years and now Vice President and Distinguished Engineer at Amazon) built the first commercial over-the-cell routing system dedicated to fully synchronous designs with timing assurance and automated test-scan insertion. Fujimura and Tom Kronmiller developed LEF/DEF for efficient representation of Manhattan routing, both used as standards in the automated place and route (P&R) flow to this day. He again teamed with Teig and Kronmiller to develop the X Architecture, an interconnect architecture based on the pervasive use of 45° diagonal routing.

I was thinking about his background as I called him to chat about his evolution from chip design to chip manufacturing via eBeam technology at D2S.

Smith: Let’s talk about your journey from focusing on how to do physical design of chips to chip manufacturing. How did this happen?

Fujimura: GPUs weren’t a thing until the late 1990s. With CPUs, Manhattan design was the obvious choice for computational efficiency. Largely gridded metal n that went up and down, and metal n+1 that went left and right, with vias to connect the line segments, were how all automated layout worked. PCB routing and packaging (even back then) used diagonal routing and even curved routing. But chip P&R was all Manhattan. That was still true when we worked on the X Architecture at Simplex Solutions (now Cadence). ATi (now inside AMD), NVIDIA and several other GPU companies started in the late 1980s to 1990s, but they were targeting video and gaming more than scientific computing at the time.

When Teig came up with the idea for the X Architecture, he wanted to know if 60-degree routing was possible “because a hexagon tessellates a plane.” A good question. I set out to find out which manufacturing limits actually restrict layouts to Manhattan shapes. I got introduced to the late Bill Arnold of ASML, who then introduced me to a lot of people in manufacturing who helped me get the answer. Naoya Hayashi of DNP was instrumental in helping me understand that mask making is where the limit exists. Hayashi-san kindly explained to me about the two mask writers. I had to dig around a lot more to make sure that that was the only barrier, but that’s how I came to understand that before masks, everything is data, and after masks, everything is physical. Mask making is the key that enables 45 degrees, but not 60 degrees. The lessons I learned then are still very important to me today. That’s when I saw and appreciated the opportunity there is for software for semiconductor manufacturing.

Smith: But you still couldn’t use GPUs for the X Architecture work?

Fujimura: Right. Way too early. The idea that GPU-accelerated gaming machines can be connected together to do video editing, or that large scientific simulations can be done on a connected set of gaming machines, was being explored in the 1990s already. It was only 20 years ago (2006) when Jensen Huang announced his bet with the CUDA software stack for general-purpose GPUs (GPGPUs) for nodes in racks of CPUs, GPUs, memory and communication to create the modern scientific computer. Six years later, in 2012, AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with CUDA, and the rest is history. But no, we didn’t use GPUs at Simplex. But we did help design GPUs, including with the X Architecture.

Editor’s Note: ILSVRC evaluates algorithms for object detection and image classification at large scale.

Smith: Now, everything you do at D2S is with GPU acceleration. When and how did that change come about?

Fujimura: It was back in 2009, two years after D2S was founded. An extraordinary engineer, Harold Zable, noticed that simulation-based manipulation (rather than rules-based manipulation) of mask shapes, both for wafer manufacturing and for mask manufacturing, would be the ideal application for GPU acceleration. Fast Fourier transforms (needed for lithography simulation and optical proximity correction (OPC)/inverse lithography technology (ILT)) and Gaussian manipulations (needed for eBeam mask simulation and mask process correction (MPC)) are nearly “free” in terms of compute time on GPUs. You still have to get the data in and out efficiently, but you can do pretty sophisticated computing without much overhead.

At the same time, multi-beam based eBeam writing was gaining momentum, first in wafer direct write applications. In 2007, at the BACUS conference in Monterey, Calif., IMS, then a well-respected research organization in Vienna, published a paper saying that multi-beam for mask writing is what they’d like to do. The wafer market is much bigger, but this technology is more suited for mask writing, where write times are measured in hours per mask. “Wafers Per Hour” is the measure in wafer manufacturing, so mask writing gets to flip the division.

We were looking at a mask design and mask manufacturing world that should be doing simulation-based manipulation rather than rules-based. That’s better with GPUs. On top of that, maybe the world is going to go to multi-beam writing, going away from four decades of variable-shaped beam (VSB) writing. And I knew from the X Architecture experience that VSB was the only thing in the eco-structure that restricted mask shapes to be Manhattan or 45 degrees. In fact, with multi-beam, any curvilinear shape within the limits of resolution of a given pixel size can be freely written on the mask. The only barrier then to having curvilinear masks would be the software stack and trying to compute it with CPUs only. We knew GPU acceleration was the answer.

Smith: Was it just totally an accident that multi-beam and GPGPUs happened at the same time?

Fujimura: Yeah, it was. However, just as when multiple people simultaneously invent the same thing without knowing about each other, the environment and times in which we live have a lot to do with this. So, I guess, it’s not really just “luck.” But GPGPUs in 2006 and IMS Multibeam in 2007, I think that’s luck.

Anyway, D2S became the GPU-acceleration partner for the semiconductor manufacturing industry and in 2012 decided to work only on things that can be accelerated by GPUs.

Smith: What trends do you see going forward in the next three to five years?

Fujimura: A move toward curvilinear mask features, as well as an increased interest in curvilinear wafer targets as designers become aware that the manufacturing side has established a solid path for curvilinear mask shapes. We’re leaving a lot of margin on the table to accommodate gridded Manhattan assumptions, and that’s really no longer necessary from a manufacturing standpoint. I think electronic design automation (EDA) should be working on enabling curvilinear designs, because the door is open for the design world to explore curvilinear chip design and to reap compelling benefits in terms of power/performance and reliability.

Editor’s Note: While Manhattan geometries are rectilinear shapes aligned to vertical and horizontal axes, curvilinear design introduces smooth, continuous curves into layouts and masks, leveraging advanced computational lithography and mask-writing technologies. This improves pattern fidelity, electrical performance and manufacturability at advanced technology nodes.

About Aki Fujimura

Aki Fujimura is chairman and CEO of D2S, Inc., and managing company sponsor of the eBeam Initiative. Previously, Fujimura was CTO at Cadence Design Systems, President/COO and inside board member of Simplex Solutions, and VP and inside board member at Pure Software. He co-founded Tangent Systems (acquired by Cadence).

Fujimura, made a SPIE fellow in 2023, serves as President of the SPIE BACUS Technical Group. He serves on the governing council of the ESD Alliance, a SEMI Technology Community. Fujimura has served on the boards of HLDS, RTime, Bristol, S7, and Coverity, Inc.

Fujimura received his BSEE and MSEE degrees from MIT.

Robert (Bob) Smith is executive director of the ESD Alliance, a SEMI Technology Community.

August 13, 2019

Breaking the Memory Wall: The AI Bottleneck

By Michael Hall

In the long unfolding arc of technology innovation, artificial intelligence (AI) looms immense. In its quest to mimic human behavior, the technology touches energy, agriculture, manufacturing, logistics, healthcare, construction, transportation and nearly every other imaginable industry – a defining role that promises to fast-track the fourth Industrial Revolution. And if the industry oracles have it right, AI growth will be nothing shy of explosive.

“The gains these days are not incremental,” Ajit Manocha, SEMI president and CEO, said to a gathering in July of the Chinese American Semiconductor Professional Association (CASPA) for its Summer Symposium at SEMI’s headquarters in Milpitas. “They are hockey stick – exponential – with AI semiconductors growing in market size from $4 billion this year to $70 billion in 2025.”

Manocha left little doubt that AI is remaking the semiconductor industry and, in the process, the world at large. Internet of Things (IoT) and 4G/5G, both key AI enablers, will account for more than 75 percent of device connections by 2025.

“Today, 30 billion devices worldwide are connected,” Manocha said, citing an Applied Materials prediction that the number of connected devices globally will grow to between 500 billion and 1 trillion by 2030. Those devices will generate stunning amounts of data collected, interpreted and used to reason, solve problems, learn and plan, leading to the holy grail of autonomous machine behavior.

To process this colossal amount of data central to the promise of AI, the industry must break through the limits of a key technology: memory.

Memory a Critical AI Bottleneck

The challenge for memory starts with performance. Historically, gains in compute performance have outpaced improvements in memory speed by 100 times every decade, and over the past 20 years that gap has grown, said Steven Woo, a fellow and distinguished inventor at Rambus, presenting at the symposium. The upshot is that memory has bottlenecked compute and, in turn, AI performance.

The industry has responded with new ways to implement memory systems on AI chips. Each is suited to unique performance requirements and, of course, comes with trade-offs. Among the frontrunners: On-chip memory delivers the highest bandwidth and power efficiency but is limited in capacity. HBM (High Bandwidth Memory) offers both very high memory bandwidth and density. GDDR balances trade-offs among bandwidth, power efficiency, cost and reliability.

Since 2012, AI training capability has grown 300,000 times, doubling every 3.5 months, Woo said. That is a blistering pace compared to the 18-month doubling cycle of Moore’s Law, which it has bested by 25,000 times. The staggering improvements have been driven by parallel computing capacity and new application-specific silicon like Google’s Tensor Processing Unit (TPU).

These specialized silicon architectures and parallel engines are key to sustaining future gains in compute performance and combatting the slowing of Moore’s Law and the end of power scaling, Woo said. By rethinking the way processors are architected for certain markets, chipmakers can develop dedicated hardware capable of operating with 100 to 1,000 times greater energy efficiency than general purpose processors to overcome another big limiter to scaling compute performance – power.

For its part, the memory industry can improve performance by signaling at higher data rates and using stacked architectures like HBM for greater power efficiency and performance, and by bringing compute closer to the data.

Memory scaling for AI

A key challenge is scaling memory for AI. Demand for better voice, gesture and facial recognition experiences and more immersive virtual reality and augmented reality interactions is tremendous, said Bill En, senior director at AMD, speaking at the symposium. These capabilities require more processing power across both high-performance computing (HPC) for big data analytics and machine learning, which relies on AI and machine intelligence to generate meaningful insights.

Emerging machine learning applications include classification and security, medicine, advanced driver assistance, human-aided design, real-time analytics and industrial automation. And with 75 billion IoT-connected devices – all generating data – expected by 2025, there will be no shortage of data to analyze, En said. The wings alone of a new Airbus A380-1000 feature some 10,000 sensors.

Mountains of this data are stored in massive data centers on magnetic hard drives, then transferred to DRAM before moving to SRAM within the CPU for the handoff to the compute hardware for analysis.

With data growing at an exponential clip, the question is how to make sure all other memory systems can handle the flood of data. AMD’s answer is a chiplet architecture featuring eight smaller chips around the edge that drive the compute and a large chip in the center that doubles the IO interface and memory capability to in turn double chip bandwidth.

AMD has also moved from a legacy GDDR5 memory chip configuration to HBM to bring memory bandwidth closer to the GPU for more efficient processing of AI applications. The HBM provides much higher bandwidth while reducing power consumption. Compared to DRAM, AMD’s HBM delivers a much faster data rate and far greater memory density, En said.

Over the next decade, look for more performance improvements from multi-chip architectures, innovations in memory technology and integration, aggressive 3D stacking and streamlined system-level interconnects, he said. The industry will also continue to drive performance gains in devices, compute density and power through technology scaling.

Michael Hall is a global marketing communications manager at SEMI.
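The doubling figures Woo cites in the article can be cross-checked with a few lines of arithmetic. The sketch below is only a rough consistency check, assuming pure exponential growth at the quoted doubling periods; the variable names are illustrative and not from the talk.

```python
import math

# Figures quoted in the article (attributed to Steven Woo of Rambus).
growth_factor = 300_000        # growth in AI training capability since 2012
ai_doubling_months = 3.5       # quoted doubling period for AI training
moore_doubling_months = 18     # quoted doubling period for Moore's Law

# Number of doublings needed to reach the quoted growth factor.
doublings = math.log2(growth_factor)

# Time those doublings would take at the AI pace (assumes steady doubling).
elapsed_months = doublings * ai_doubling_months

# Growth Moore's Law would deliver over that same elapsed time.
moore_growth = 2 ** (elapsed_months / moore_doubling_months)

# Ratio of AI growth to Moore's Law growth over the same window.
advantage = growth_factor / moore_growth

print(f"doublings needed: {doublings:.1f}")
print(f"elapsed time:     {elapsed_months / 12:.1f} years")
print(f"Moore's Law gain: {moore_growth:.1f}x")
print(f"AI advantage:     {advantage:,.0f}x")
```

Under these assumptions the 300,000-fold growth corresponds to about 18 doublings over a little more than five years, and the resulting advantage over an 18-month doubling cycle comes out in the mid-20,000s, in line with the roughly 25,000 times quoted.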

September 26, 2018

Tracking the Changing OSAT Market

By E. Jan Vardaman and Clark Tseng

Outsourced Semiconductor Assembly and Test (OSAT) service providers experienced strong growth in 2017, but will this growth continue? In the last few years, OSAT growth has been driven by shipments for packages found in smartphones, but this market is slowing. What will replace it? Growth in power devices is strong and electronic content in vehicles is increasing. Will OSATs participate in this growth? Many OSATs have plants dedicated to automotive package assembly and will see continued growth. Growing demand for connectivity everywhere, called IoT, is generating large amounts of data, creating the need for more servers and datacenters. The adoption of Artificial Intelligence (AI) across a broad range of applications is driving demand for high-performance packages, but will this assembly take place at the OSATs or foundries?

In the third and fourth quarters of 2017, growth in cryptocurrency provided unanticipated revenue for a number of OSATs. Given that the most well-known crypto mining companies and the biggest mining pools are all based in China, several OSATs, including major Taiwanese and Chinese service providers, experienced revenue growth in 2017 directly attributed to the assembly of ASICs in flip chip scale packages (FC-CSPs) and GPUs in flip chip ball grid arrays (FC-BGAs) for the cryptocurrency market. However, the first and second quarters of this year have seen decreased demand for GPUs and ASICs for this application. The assembly of packages for cryptocurrency slowed considerably in the first half of the year and therefore can’t be counted on to add as much to the revenue base as in the previous year. Going into the latter half of the year, demand for crypto ASICs is expected to pick up as a new generation of 7nm chips drives new investment and a replacement cycle, while crypto-mining GPUs will see a further decline.

Three of the top 10 OSATs, Jiangsu Changjiang Electronics Technology (JCET), Tianshui Huatian Technology (Huatian), and Tongfu Microelectronics (TFME), are based in China. China’s share of the top 10 OSATs’ revenue increased from slightly less than 23 percent in 2016 to more than 25 percent in 2017, and this trend is expected to continue. Crypto-related packaging and test business has certainly contributed a big portion of the share gain. Major OSATs such as TFME and Huatian plan to expand their plants, and they expect to fill this added capacity with a broad range of packages. Huatian’s new Nanjing plant will include assembly for memory packages. TFME plans to set up a plant in Xiamen, Fujian Province to provide bumping, wafer level packaging, and system-in-package (SiP) services.

Tracking the capabilities of OSATs is increasingly important. SEMI and TechSearch International have introduced a new Worldwide OSAT Manufacturing Site Database that provides listings of OSAT facility locations and package and test options in each factory. This database indicates the specific packages offered at each location. Finding plants that offer automotive qualified assembly is also possible with the database. Companies that offer bumping and wafer level packaging are identified. Over 120 companies and 300 facilities are tracked in this database covering both OSAT packaging and test facilities.

For additional information about this database, please visit https://discover.semi.org/osat-database-registration.html

E. Jan Vardaman is president of TechSearch International, Inc., and Clark Tseng is director of Industry Research and Statistics at SEMI.

September 14, 2018

Global Semiconductor Revenue Forecast Revised Upward to 15 Percent from 7.5 Percent for 2018, SEMI Reports

By Eugenia Liu

SEMI releases latest update to World Fab Forecast with adjusted semiconductor revenue consensus for second-half 2018 and 2019

Global semiconductor revenue in 2018 is now expected to reach $473.8 billion and clock a growth rate of 15 percent, a significant upward revision from the 7.5 percent expansion (to $442.9 billion) forecast at the start of the year by six research and investment forecasts tracked by SEMI Industry Research and Statistics (SEMI IR&S).

Data center growth will remain robust in the coming quarters, fueling demand for memory devices. In addition, cloud computing will continue to spur strong CPU, GPU, networking, ASIC, and DRAM and NAND demand through 2019, driving consensus year-over-year growth of 3.63 percent to reach semiconductor revenue of $491 billion in 2019.

Fab equipment spending (new and used) for 2018 is expected to increase by 14 percent to a record high of $63 billion, according to the latest data from the SEMI World Fab Forecast, published by SEMI IR&S. For 2019, fab equipment spending (new and used) is expected to increase 8 percent to another record of just under $68 billion.

Memory continues to be the biggest swing factor in fab spending in 2018 and is expected to lead growth into 2020. 3D NAND will see the most capacity added in 2018 and 2019 with growth of 41 percent in 2018 and 27 percent in 2019, according to the SEMI World Fab Forecast. DRAM investment will see even stronger growth in 2018 and 2019, driven by new capacity addition as well as the continued technology shrink towards 1y/1z nm.

For the first half of 2018, global spending for semiconductor fab equipment continued its growth momentum from 2017. Though we expect some softness in the second half of 2018, the outlook for 2019 remains robust with a fourth consecutive year of growth – the first such run since the 1990s. This prolonged growth cycle has been propelled by memory and will be extended by significant investment in China in 2019. Although a potential slowdown in 2020 is a concern, the overall outlook for semiconductor demand remains solid due to broad-based growth trends in data center, artificial intelligence (AI)/machine learning (ML), automotive, and industrial segments.

Following are other SEMI forecasts for fab spending.

Installed Capacity

3D NAND will see the most capacity added in both 2018 and 2019 with growth of 41 percent in 2018 and 27 percent in 2019. Foundry capacity growth is steady at 3 percent in 2018 and 6 percent in 2019, driven by both leading-edge and trailing-edge capacity buildup. 200mm fab capacity will increase 4 percent in 2018 and 3 percent in 2019, fueled by demand for MCU, sensors, PMIC, MOSFET and Driver IC.

New Facilities / Construction Spending

In 2018, there are 72 construction projects with investments totaling $15 billion, a year-over-year increase of 23 percent. Construction spending will reach all-time highs with China continuing its lead at US$7 billion in 2018, shattering its own record of $6.3 billion investment in 2017. Most construction spending in 2018 will be for Memory (just under $9 billion), primarily for 3D NAND followed by DRAM. Foundry will log second place in construction spending at just under $5 billion.

Fab Equipment Spending

Fab equipment spending (new and used) for 2018 is expected to jump 14 percent to a record high of US$63 billion, flat from the forecast issued in June 2018. Equipment spending (new and used) for 2019 is expected to increase 8 percent to another record of just under US$68 billion, a downward adjustment from +9 percent published in June 2018. We believe equipment spending will remain healthy, driven by solid, broad-based demand and predictable technology investments on top of constructive SEMICAP equipment fundamentals.

Activity Report

The August report features 1,265 records including about 300 Opto- and LED-related facilities. We have made 223 changes related to 216 fabs/lines. The modifications include the addition of new records, changes to existing records, and the deletion of records since the February 2018 World Fab Forecast report. We are tracking 103 future facilities/lines with various probabilities that will start volume production in 2018 or later.

Not a subscriber? Please review the SEMI fab databases. Our databases deliver the latest forecast and a complete analysis of front-end fabs and foundries worldwide. They are ideal resources to empower your market research.

Eugenia Liu is a Senior Product Marketing Manager at SEMI.
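The revenue figures in the forecast can be checked against one another. The following is a minimal sketch using only the dollar amounts and percentages quoted in the article; the helper names `implied_base` and `growth_percent` are illustrative, and the 2017 base is inferred here rather than stated in the report.

```python
def implied_base(forecast, growth_pct):
    """Prior-year value implied by a forecast and its stated growth rate."""
    return forecast / (1 + growth_pct / 100)


def growth_percent(previous, current):
    """Year-over-year growth from previous to current, in percent."""
    return (current / previous - 1) * 100


# Figures quoted in the article, in billions of US dollars.
revenue_2018_initial = 442.9   # start-of-year forecast, stated as 7.5% growth
revenue_2018_revised = 473.8   # revised forecast, stated as 15% growth
revenue_2019 = 491.0           # 2019 consensus

# Both 2018 forecasts should imply the same 2017 revenue base.
base_from_initial = implied_base(revenue_2018_initial, 7.5)
base_from_revised = implied_base(revenue_2018_revised, 15.0)

# Growth implied by the 2018 and 2019 figures.
growth_2019 = growth_percent(revenue_2018_revised, revenue_2019)

print(f"2017 base implied by initial forecast: ${base_from_initial:.1f}B")
print(f"2017 base implied by revised forecast: ${base_from_revised:.1f}B")
print(f"2019 growth over revised 2018:         {growth_2019:.2f}%")
```

Both 2018 forecasts imply the same 2017 base of about $412 billion, and the step from $473.8 billion to $491 billion matches the quoted 3.63 percent, so the article's revenue numbers are internally consistent.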

SEMICON West Preview: Exponential Data Growth Drives Change in System Architecture

By Paula Doe

With artificial intelligence (AI) rapidly evolving, look for applications like voice recognition and image recognition to get more efficient, more affordable, and far more common in a variety of products over the next few years. This growth in applications will drive demand for new architectures that deliver the higher performance and lower power consumption required for widespread AI adoption.

“The challenge for AI at the edge is to optimize the whole system-on-a-chip architecture and its components, all the way to semiconductor technology IP blocks, to process complex AI workloads quickly and at low power,” says Qualcomm Technologies Senior Director of Engineering Evgeni Gousev, who will provide an update on the progress of AI at the edge in a Data and AI program at SEMICON West, July 10-12 in San Francisco.

[Figure: Qualcomm Snapdragon 845 uses heterogeneous computing across the CPU, GPU, and DSP for power-efficient processing for constantly evolving AI models. Source: Qualcomm]

A system approach that optimizes across hardware, software, and algorithms is necessary to deliver the ultra-low power – to a sub-1-milliwatt level, low enough to enable always-on machine vision processing – for the usually energy-intensive AI computing. From the chip architecture perspective, processing AI workloads with the most appropriate engine, such as the CPU, GPU, and DSP with dedicated hardware acceleration, provides the best power efficiency – and flexibility for dealing with rapidly changing AI models and growing diversity of applications.

“So far it’s been largely a brute force approach using conventional architectures and cloud-based infrastructure,” says Gousev. “But we’re going to run out of brute force options, so future opportunities lie in developing innovative architectures, dedicated hardware, new algorithms, and new software. Innovation will be especially important for AI at the edge and applications requiring always-on functionality. Training is mostly in the cloud now, but in the near future it will start migrating to the device as the algorithms and hardware improve. AI at the edge will also remove some privacy concerns, an increasingly important issue for data collection and management.”

Practical AI applications at the edge where resources are constrained run the gamut, spanning smartphones, drones, autonomous vehicles, virtual reality, augmented reality and smart home solutions such as connected cameras. “More AI on the edge will create a huge opportunity for the whole ecosystem – chip designers, semiconductor and device manufacturers, applications developers, and data and service providers. And it’s going to make a significant impact on the way we work, live, and interact with the world around us,” Gousev said.

Future generations of chips may need more disruptive systems-level change to handle high data volumes with low power

A next-generation solution for handling the massive proliferation of AI data could be a nanotechnology system, such as the collaborative N3XT (Nano-Engineered Computing Systems Technology) project, led by H.S. Philip Wong and Subhasish Mitra at Stanford. “Even with next-generation scaling of transistors and new memory chips, the bottlenecks in moving data in and out of memory for processing will remain,” says Mitra, another speaker in the SEMICON West program. “The true benefits of nanotechnology will only come from new architectures enabled by nanosystems. One thing we are certain of is that massively more capable and more energy-efficient systems will be necessary for almost any future application, so we will need to think about system-level improvements.”

[Figure: Major improvement in handling high volumes of data with low energy use will require system-level improvements, such as monolithic 3D integration of carbon nanotube transistors in the multi-campus N3XT chip research effort. Source: Stanford University]

That means carbon nanotube transistors for logic, high density non-volatile MRAM and ReRAM for memory, fine-grained monolithic 3D for integration, new architectures for computation immersed in memory, and new materials for heat removal. “The N3XT approach is key for the 1000X energy efficiency needed,” says Mitra.

Researchers have demonstrated improvements in all these areas, including multiple hardware nanosystem prototypes targeting AI applications. The researchers have transferred multiple layers of as-grown carbon nanotubes to the target wafer to significantly improve CNT density and have also developed a low-power TiN/HfOx/Pt ReRAM. The low-temperature CNT and ReRAM processes enable multiple vertical layers to be grown on top of one another for ultra-dense and fine-grained monolithic 3D integration.

Other speakers at the Data and AI TechXpot include Fram Akiki, VP Electronics, Siemens; Hariharan Ananthanarayanan, motion planning engineer, Osaro; and David Haynes, Sr. director, strategic marketing, Lam Research. See SEMICONWest.org.

Paula Doe, SEMI

June 26, 2018