The Decadal Plan for Semiconductors is a call for innovative solutions in semiconductor design and manufacturing to meet the growing needs of information and communication technology. It identifies five critical problems worth further research as we approach the end of Moore's law.

The Decadal Plan website can be found here: https://www.src.org/about/decadal-plan/.

Analog Hardware

Real-world sensory inputs are analog; modern AI and robotic systems absorb outside information and compute on it. In the past, computation (compute per dollar) grew exponentially alongside the exponential growth of data, so keeping up was not much trouble. With the end of Moore's law, however, the flood of information is almost certain to overwhelm the computing power we have.

The human brain does not work on all the data it receives. Instead, it focuses on what is important, a behavior psychology calls the attention mechanism, which machine learning algorithms have also exploited. Attention has made machine learning models perform much better, but such pruning happens only after all the data has been collected and stored, which consumes a lot of bandwidth, storage, and computation.
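
For reference, here is a minimal numpy sketch of the scaled dot-product attention used in machine learning; the shapes and random inputs are made up for illustration.

    # Minimal scaled dot-product attention; shapes and inputs are illustrative.
    import numpy as np

    def attention(Q, K, V):
        """Weigh the values V by how well the queries Q match the keys K."""
        scores = Q @ K.T / np.sqrt(K.shape[-1])          # query/key similarity
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax: focus on what matters
        return weights @ V

    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)  # (4, 8)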

The report proposes introducing attention pruning at the point where data is collected, using analog computation. Analog computation has the advantage of low power consumption. It can also deliver more computing power when operating on multi-bit analog signals, or massive parallelism when combined with techniques such as in-memory computing. More importantly, analog computation fits naturally when the sensory data is inherently analog.

Analog computation is nothing new; it began around the same time as digital computing. Simple adders and multipliers can be built out of operational amplifiers and transistors. Analog designs soon fell out of favor, however, because interference and noise make the results inaccurate, and the problem grows heavier at higher resolutions such as 32-bit and 64-bit integers and floating-point numbers: reaching 16-bit resolution takes hard work, and 32-bit is practically impossible today. With the growth of machine learning, accuracy has become less critical: some models work with 8-bit or even 4-bit numbers, and we have seen analog in-memory computing modules performing batched matrix-vector multiplications.
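
To see why noise caps analog precision yet leaves low-bit machine learning workloads viable, here is a small numpy sketch that models an analog matrix-vector multiply as exact math plus additive noise; the 1% noise level is an arbitrary assumption of mine.

    # An "analog" matrix-vector multiply modeled as the exact product plus
    # ~1% noise (an assumed figure). The error is tolerable for 4- to 8-bit
    # workloads but hopeless for 32-bit targets.
    import numpy as np

    rng = np.random.default_rng(42)
    W = rng.standard_normal((64, 64))
    x = rng.standard_normal(64)

    exact = W @ x
    noisy = exact * (1 + 0.01 * rng.standard_normal(exact.shape))

    err = np.max(np.abs(noisy - exact)) / np.max(np.abs(exact))
    print(f"relative error: {err:.3%}")                    # on the order of 1%
    print(f"usable precision: ~{-np.log2(err):.1f} bits")  # far from 32 bits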

The agenda therefore proposes more investment in analog computing to cope with the growing amount of data and the communication it demands.

New Memory and Storage Solutions

The idea in the previous section may reduce the amount of data we need to save, but until it is fully realized (and data scientists will still need the full data, for example to study how to prune it), the demand for more storage is not going away soon.

This problem seems less imminent, since storage technology is still scaling: DRAM is getting denser with more trench levels, SSDs are reaching higher bit densities through 3D stacking, and even hard drives are getting a boost from the long-advertised HAMR technology (which has yet to deliver after all these years...). However, the document calculates that if we wanted to store all the data we produce on NAND flash, the silicon required would outrun the total silicon supply. I do not fully agree with this point: not all data needs to be stored, such as Instagram stories or surveillance camera footage, and for cold data we have alternative solutions such as hard drives or even tape.
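
For intuition, here is a back-of-envelope version of that calculation; every number below is a rough assumption of mine, not a figure from the report.

    # Could NAND flash hold everything we produce? All numbers are rough
    # assumptions for the sketch, not figures from the Decadal Plan.
    ZB = 1e21                    # bytes in a zettabyte
    data_per_year = 100 * ZB     # assumed worldwide data creation per year
    bytes_per_wafer = 50e12      # assumed 3D NAND capacity of a 300 mm wafer
    nand_wafers_per_year = 2e7   # assumed industry-wide NAND wafer output

    wafers_needed = data_per_year / bytes_per_wafer
    print(f"wafers needed per year: {wafers_needed:.0e}")                    # ~2e9
    print(f"vs. supply: {wafers_needed / nand_wafers_per_year:.0f}x short")  # ~100x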

The document then calls for innovative memory that is fast, dense, and cost-effective, as well as memory that supports emerging architectures such as quantum computers. Unfortunately, none of the current candidates deliver all of these benefits. DNA storage, for example, offers high density and may become cost-effective in the future (I do not know much about that area), but it is very slow and prone to data loss.

One final note: this challenge has less to do with the end of Moore's law.

Communication Capacity

Communication comes right after storage: data must be moved to where it is stored and delivered to where it is wanted. More data means more communication, which demands not only higher bandwidth but also more energy-efficient ways to move the data.

The first way out is obvious: edge computing, so that data does not need to be moved as far or as often. This opens up further topics in network routing and discovery, distributed computation algorithms, and information security.

The second evident approach is to increase bandwidth, and total worldwide bandwidth continues to grow. Although it may become hard to increase the bandwidth of a single link, most network applications are scale-out systems designed to be ever more distributed, load-balanced, and redundant, so that is not really a problem.

One final point the report mentions is the advancement of wireless communication. Wireless performance can be improved by increasing the carrier frequency (all the way to the THz range) and by increasing the number of antennas. I am not a great fan of such ideas: wireless communication is costly compared to wired communication in both money and energy, it is not as reliable, it may have security problems, and it can cause electromagnetic interference. 5G is already overkill for most mobile phone users. Industrial wireless solutions exist, but most of them focus on reliability or reachability, not bandwidth or speed.
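
For context on why those two knobs matter, Shannon's formula says a link carries C = B log2(1 + SNR) bits per second, and MIMO multiplies that by the number of spatial streams; the quick calculation below uses made-up bandwidth and SNR figures.

    # Shannon capacity: why wider channels (higher carriers) and more
    # antennas (more streams) raise throughput. SNR and bandwidths are
    # made-up illustrative values.
    import math

    def capacity_bps(bandwidth_hz, snr_linear, streams=1):
        return streams * bandwidth_hz * math.log2(1 + snr_linear)

    snr = 100  # 20 dB, an assumed figure
    print(f"100 MHz, 1 stream:  {capacity_bps(100e6, snr) / 1e9:.2f} Gb/s")
    print(f"  1 GHz, 1 stream:  {capacity_bps(1e9, snr) / 1e9:.2f} Gb/s")
    print(f"  1 GHz, 4 streams: {capacity_bps(1e9, snr, 4) / 1e9:.2f} Gb/s")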

Security

The fourth challenge is security. Complex systems expose more weak points and exploitable subsystems, such as caches, prefetching, and speculative execution. More importantly, the prevalence of AI and ubiquitous computing has brought new problems.

AI is not trustworthy. Models have no guarantee of correctly identifying examples they have not seen before, and a couple of stickers can deceive one into seeing a traffic sign as something completely different, something the attacker chooses. More intelligent security solutions may be needed against such attacks.
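
As an illustration of such deception, here is a minimal PyTorch sketch of the fast gradient sign method (FGSM), one classic way to craft adversarial inputs; the model, label, and epsilon are placeholders, not a specific attack from the report.

    # FGSM sketch: a tiny perturbation in the direction that increases the
    # loss can flip a classifier's prediction. `model` stands in for any
    # differentiable image classifier; epsilon is an assumed budget.
    import torch
    import torch.nn.functional as F

    def fgsm(model, image, label, epsilon=0.03):
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)
        loss.backward()
        # One signed-gradient step: imperceptible to humans, not to the model.
        return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()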

The migration to cloud computing also poses security risks. A multitenant data center puts many unknown actors on the same network, and sometimes you do not even want to trust the cloud provider. Without cryptographic verification, there is no way to be sure the results computed by a cloud server are correct and untampered. Several secure-enclave techniques and trusted execution environments have been proposed, but few have stayed secure against side channels and other kinds of attacks.
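
As a small taste of what such cryptographic verification can look like, here is an HMAC integrity check in Python; it only detects tampering with stored bytes (verifying that a remote computation ran correctly needs heavier machinery such as verifiable computation or attestation), and the key and messages are illustrative.

    # Detecting tampering with an HMAC over data handed to an untrusted
    # party. This checks integrity of bytes, not correctness of a remote
    # computation; key and messages are illustrative.
    import hmac, hashlib

    key = b"key shared only with ourselves"
    data = b"results we stored in the cloud"
    tag = hmac.new(key, data, hashlib.sha256).digest()  # computed before upload

    returned = b"results the provider altered"
    ok = hmac.compare_digest(tag, hmac.new(key, returned, hashlib.sha256).digest())
    print("intact" if ok else "tampered")               # -> tampered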

Security is a big, big topic, and it is hardly a new problem. It does not seem to grow worse with the end of Moore's law, but we still have to keep it in mind.

Energy-Efficient Computing

Dennard scaling ended long before the coming end of Moore's law: more computation now costs proportionally more energy, and computing already consumes a significant share of the energy we generate. If computation keeps growing at the current level of energy consumption, we may soon hit the cap.

The report proposes altering the trajectory of computing and finding completely different solutions, yet it is still unknown what those solutions might be. Quantum computing is an entirely different architecture, and we do not really know how much energy it would consume once fully deployed in applications.

Conclusion

The Decadal Plan for Semiconductors is a document guiding where the silicon industry should head as it reaches the end of Moore's law. We should seek either to grow computation without relying on Moore's law, or to strategically compute less while still meeting our needs. Moreover, there are many challenges beyond Moore's law that also deserve our attention.

Last updated on 2023-05-14