In this episode, I will dive into the world of hardware product development and explore the crucial role of second source components. When creating a hardware product, it’s common to rely on various individual components that serve as the building blocks of the final device. However, factors such as component shortages or components reaching end of life can disrupt the supply chain and create the need for substitute components. I will discuss the challenges that arise in these situations and share insights on managing, choosing, and validating substitute components before integrating them into your product. I will also touch on the risks involved and on strategies for a smooth transition to second source components that minimize disruptions and keep the product stable.
I once worked on a product that used a small electromechanical relay to provide a user-configurable digital output. Users could operate external systems by connecting to a pair of its contacts. We chose a tiny 5 volt relay for this application so it would fit into our compact enclosure. As we found out later, this was a rather rare type of relay, and at some point our supplier ran short of it. We found another relay from a different manufacturer and used it as a substitute in the product. After studying the new relay’s datasheet, we realized that 5 volts was at the very low edge of its operating voltage range, so under some conditions there was probably not enough power to drive it properly.
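A quick margin calculation during datasheet review can flag this kind of relay. The Python sketch below compares the lowest voltage the coil can see against the relay’s must-operate (pick-up) voltage, assuming a transistor driver with a fixed voltage drop. All numbers are hypothetical, and `RelayCoil`, `worst_case_coil_voltage`, and `coil_margin` are illustrative names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class RelayCoil:
    """Coil parameters read from a relay datasheet (values used below are hypothetical)."""
    nominal_v: float       # rated coil voltage
    must_operate_v: float  # minimum coil voltage guaranteed to close the contacts


def worst_case_coil_voltage(supply_v: float, supply_tol: float, driver_drop_v: float) -> float:
    """Lowest coil voltage: supply at the bottom of its tolerance minus the (assumed) driver drop."""
    return supply_v * (1.0 - supply_tol) - driver_drop_v


def coil_margin(coil: RelayCoil, supply_v: float, supply_tol: float, driver_drop_v: float) -> float:
    """Volts above the must-operate threshold; a negative value means the relay may fail to close."""
    return worst_case_coil_voltage(supply_v, supply_tol, driver_drop_v) - coil.must_operate_v


# Hypothetical parts: both rated 5 V, but the substitute needs more of it to pull in.
original = RelayCoil(nominal_v=5.0, must_operate_v=3.75)
substitute = RelayCoil(nominal_v=5.0, must_operate_v=4.5)

for name, coil in (("original", original), ("substitute", substitute)):
    margin = coil_margin(coil, supply_v=5.0, supply_tol=0.05, driver_drop_v=0.3)
    print(f"{name}: {margin:+.2f} V margin")
```

With a 5% supply tolerance and a 0.3 V driver drop (again assumptions), the original part keeps about 0.7 V of margin while the substitute goes slightly negative. A real check would also derate for coil temperature, because coil resistance rises as the relay heats up and the pick-up voltage rises with it.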
The lesson I learned from this experience is that it’s important to properly manage, choose, and validate substitute components before introducing them into a product.
When creating a hardware product, you rely on many individual components that serve as the building blocks of your final device. Sometimes these components reach the end of their life cycle and need to be replaced with a substitute. Or you might face a shortage, when suppliers can no longer provide components in sufficient quantities or lead times increase dramatically. All this means that you need a plan B.
So what is your plan B, and when do you need to prepare it? For hardware products, plan B is to use and manage second source components.
Second source components are essentially different versions or equivalents of the original component, provided by alternative manufacturers. These components serve the same function or purpose as the original, but may have variations in specifications, features, or implementation details. They are intended to be compatible replacements for the original component, allowing the product to continue functioning without supply chain disruptions caused by component unavailability or shortages.
The procurement team is responsible for handling these situations. They purchase components, track their life cycles, work with suppliers, and forecast and mitigate potential component problems throughout the entire life cycle of a product. During the design phase, it is a good idea to avoid depending on a single-source component. During the mass production phase, you need to be able to purchase high volumes of components on a predictable schedule.
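One way to act on this during design is to keep, for every BOM position, a list of approved alternates and to flag positions that depend on a single part. The sketch below assumes a simple in-memory BOM; the `BomLine` structure, the reference designators, and the part numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BomLine:
    """One BOM position and the manufacturer part numbers (MPNs) approved for it."""
    ref: str                                               # reference designator, e.g. "K1"
    approved_mpns: list[str] = field(default_factory=list)
    eol_mpns: set[str] = field(default_factory=set)        # MPNs procurement has flagged as end-of-life

    def active_sources(self) -> list[str]:
        """Approved MPNs that are still orderable."""
        return [mpn for mpn in self.approved_mpns if mpn not in self.eol_mpns]


def single_source_risks(bom: list[BomLine]) -> list[str]:
    """Reference designators that have fewer than two active approved sources."""
    return [line.ref for line in bom if len(line.active_sources()) < 2]


# Hypothetical BOM: K1 has one source, D2 had two until one went end-of-life.
bom = [
    BomLine("K1", ["RELAY-X"]),
    BomLine("U3", ["LDO-A", "LDO-B"]),
    BomLine("D2", ["DIODE-A", "DIODE-B"], eol_mpns={"DIODE-A"}),
]
print(single_source_risks(bom))
```

Here K1 and D2 are flagged. Treating an end-of-life part as a lost source right away, instead of counting it until it actually disappears, gives procurement time to qualify a replacement.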
When the procurement team identifies a component that might become unavailable, they propose substitute options. However, since they lack detailed knowledge of how the device operates, they can only compare datasheets. That is not enough to decide whether a substitute is acceptable, so someone else needs to perform extensive validation before reaching a conclusion.
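A datasheet comparison like the one procurement does can be partly automated, which also makes its limits visible: it can flag parameters that got worse, but it cannot tell you whether your design depends on them. The Python sketch below uses hypothetical relay parameter names and values; the `Better` directions are assumptions you would set per parameter, and every parameter in `rules` is assumed to be present in the original datasheet.

```python
from enum import Enum


class Better(Enum):
    HIGHER = "higher"  # e.g. maximum contact current
    LOWER = "lower"    # e.g. must-operate voltage
    EQUAL = "equal"    # e.g. package or pinout


def compare_datasheets(original: dict, candidate: dict, rules: dict) -> list[str]:
    """List parameters where the candidate is missing or worse than the original."""
    findings = []
    for param, better in rules.items():
        if param not in candidate:
            findings.append(f"{param}: missing from candidate datasheet")
            continue
        a, b = original[param], candidate[param]
        if better is Better.EQUAL and a != b:
            findings.append(f"{param}: {a!r} -> {b!r} (must match)")
        elif better is Better.HIGHER and b < a:
            findings.append(f"{param}: {a} -> {b} (lower than original)")
        elif better is Better.LOWER and b > a:
            findings.append(f"{param}: {a} -> {b} (higher than original)")
    return findings


# Hypothetical relay parameters.
rules = {
    "package": Better.EQUAL,
    "coil_must_operate_v": Better.LOWER,
    "contact_max_a": Better.HIGHER,
}
original = {"package": "SMD-8", "coil_must_operate_v": 3.75, "contact_max_a": 2.0}
candidate = {"package": "SMD-8", "coil_must_operate_v": 4.5, "contact_max_a": 3.0}

for finding in compare_datasheets(original, candidate, rules):
    print(finding)
```

Here the only finding is the higher must-operate voltage, a detail that matters or not depending on how the product drives the part. An empty list would not prove the substitute acceptable either; it only means none of the listed parameters got worse.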
Usually the hardware team validates these proposals and assesses whether the proposed substitute fits your product. They run multiple specific low-level tests to ensure that the new component delivers the same functionality in the context of your application. For example, for power supply converters they check output voltage stability, performance under load, heat generation, and more. When replacing a DDR memory chip, they check timings and frequencies, signal integrity, eye mask parameters, and so on. If the new DDR chip will run at a different clock frequency, they also check whether it interferes with other components in your device. Additionally, they verify that the software supports the new component and that there is no regression in overall system stability.
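The checks above can be kept as a per-category checklist so that every proposal gets the same treatment. The Python sketch below encodes only the examples named in this paragraph; the category keys, the table layout, and the `validation_plan` helper are hypothetical.

```python
# Low-level checks per component category, as listed above; keys and layout are illustrative.
HW_CHECKS = {
    "power_converter": ["output voltage stability", "performance under load", "heat generation"],
    "ddr_memory": ["timings and frequencies", "signal integrity", "eye mask parameters"],
}

# Checks that apply to every substitute, regardless of category.
SYSTEM_CHECKS = ["software support for the new part", "no regression in system stability"]


def validation_plan(category: str, clock_changed: bool = False) -> list[str]:
    """Checklist for a proposed substitute; raises KeyError for a category with no defined checks."""
    plan = list(HW_CHECKS[category])  # copy so callers cannot mutate the shared table
    if clock_changed:
        plan.append("interference with other components at the new clock frequency")
    plan.extend(SYSTEM_CHECKS)
    return plan


for item in validation_plan("ddr_memory", clock_changed=True):
    print("-", item)
```

Raising on an unknown category is deliberate: a new kind of part should force someone to define its low-level tests instead of silently receiving only the generic system checks.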
Because we are talking about substitutes, such replacements bring risks. What kind of risks? A proposed substitute might operate almost like the original component and still differ in how some of its features are implemented. The hardware team has limited knowledge of the firmware implementation details and can perform only high-level end-to-end (E2E) validation of the firmware. So to properly test the device, you will want to involve all the other QA teams to validate the system’s behavior.
Here is an example.
In one device we had to replace a flash memory chip that stored a lot of configuration information for one of the software modules. The procurement team proposed a replacement, and the hardware team validated it and confirmed that the system worked properly with the new flash chip, so we started producing devices with this chip. Later we noticed a degradation in the field: this software module was not performing properly for some users. The investigation showed that the module was failing to read all of its parameters fast enough, so a timeout occurred. The software was configured to access the flash chip in Dual I/O mode. That was fast enough to read all the information from the original chip, but the new chip was a bit slower in this mode, and in some cases we started to hit timeouts. Luckily, we were able to update the software to use Quad I/O mode, which doubles the read throughput, and this fixed the problem for users in the field.
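The timing behind this failure can be estimated with back-of-the-envelope arithmetic. During the data phase of an SPI flash read, Dual I/O moves 2 bits per clock and Quad I/O moves 4, so transfer time is roughly bytes * 8 / (lines * clock). The Python sketch below models "the substitute is slower" as a lower usable clock in Dual I/O mode; that, the 1 MiB configuration size, and the 60 ms timeout are hypothetical assumptions, not the actual figures from our device.

```python
def read_time_ms(num_bytes: int, clock_hz: float, io_lines: int, overhead_cycles: int = 0) -> float:
    """Approximate time in ms to read num_bytes from SPI NOR flash in one continuous read.

    The data phase moves io_lines bits per clock; overhead_cycles lumps together the
    command, address and dummy cycles (a simplification that ignores per-read gaps).
    """
    data_cycles = num_bytes * 8 / io_lines
    return (data_cycles + overhead_cycles) / clock_hz * 1000.0


CONFIG_BYTES = 1024 * 1024  # hypothetical size of the module's configuration data
TIMEOUT_MS = 60.0           # hypothetical software read timeout

scenarios = [
    ("original chip, Dual I/O at 80 MHz", 80e6, 2),
    ("substitute, Dual I/O at 66 MHz", 66e6, 2),
    ("substitute, Quad I/O at 66 MHz", 66e6, 4),
]
for label, clock_hz, lines in scenarios:
    t = read_time_ms(CONFIG_BYTES, clock_hz, lines)
    print(f"{label}: {t:.1f} ms ({'OK' if t < TIMEOUT_MS else 'TIMEOUT'})")
```

Under these assumptions the original chip finishes in about 52 ms, the substitute in Dual I/O needs about 64 ms and exceeds the timeout, and switching it to Quad I/O brings the read down to about 32 ms.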
These differences in substitutes can have a significant impact on your product. You can make a rough estimate by analyzing the datasheets of individual components, but a final conclusion can be reached only after a full regression cycle of your device in all operating modes and environments. The problem is that these regression cycles can be time-consuming and quite expensive. A validation cycle can take weeks, and sometimes you can’t wait that long. When your QA team is located far away from the factory, shipping prototypes adds further delays to the overall validation process. Does it make sense to run a full regression cycle for every component you plan to change? That may be neither practical nor efficient, and there is a temptation to skip testing for some components, especially passive ones. You might simply not have enough resources to run all these tests, so you need to find a balance.
It’s crucial to adopt a strategic approach here. Instead of testing each component individually, a viable option is to run a full regression cycle for every batch of devices that includes any new components compared to previous batches. This helps optimize costs when you often have multiple substitutes to validate. These batch tests are important even if you have already validated each included component separately.
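The batch rule can be expressed as a set comparison: a batch needs a full regression cycle if its bill of materials contains any part number that no previously validated batch contained. The sketch below is a minimal Python version; the batch IDs and part numbers are hypothetical, and it assumes every flagged batch passes its regression so that its parts join the validated set.

```python
def batches_needing_regression(batches: list[tuple[str, set[str]]], validated: set[str]) -> list[str]:
    """IDs of batches, in production order, whose BOM contains a part not yet validated in a batch.

    Assumes each flagged batch passes regression, after which its parts count as validated.
    """
    known = set(validated)  # copy so the caller's set is not modified
    flagged = []
    for batch_id, bom in batches:
        if bom - known:  # any part number never seen in a validated batch
            flagged.append(batch_id)
            known |= bom
    return flagged


validated = {"LDO-A", "DIODE-A", "FLASH-A"}  # hypothetical parts from the last validated batch
batches = [
    ("B1", {"LDO-A", "DIODE-A", "FLASH-A"}),
    ("B2", {"LDO-B", "DIODE-B", "FLASH-A"}),
    ("B3", {"LDO-B", "DIODE-A", "FLASH-A"}),
]
print(batches_needing_regression(batches, validated))
```

Only B2 is flagged. Note that B3 is skipped even though it pairs LDO-B with DIODE-A, a combination no earlier batch used; if you want to catch that case too, compare whole BOM configurations (for example, as frozensets) instead of individual part numbers.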
Why? Because components might interfere with each other. Here is an example.
We were testing a new release candidate firmware on a new pool of beta trial devices. Metrics showed that about 50% of the devices went offline after the over-the-air (OTA) update. This issue had not occurred during our validation process, so we were uncertain about the root cause. We tested multiple hypotheses but failed to reproduce the issue on our end. After further investigation and trial and error, we found that the problem occurred only on devices of a new hardware revision that had been manufactured recently. We did not have any of them in our test lab yet, so we had no chance to find the issue earlier.
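The step that narrowed this down was slicing the fleet metrics by hardware revision instead of looking at one aggregate number. A minimal Python sketch of that grouping follows; the `(revision, online)` record shape and the sample fleet are hypothetical stand-ins for real telemetry.

```python
from collections import defaultdict


def offline_rate_by_revision(devices):
    """Fraction of devices offline after the update, per hardware revision.

    devices: iterable of (hw_revision, online_after_update) tuples.
    """
    totals = defaultdict(int)
    offline = defaultdict(int)
    for revision, online in devices:
        totals[revision] += 1
        if not online:
            offline[revision] += 1
    return {rev: offline[rev] / totals[rev] for rev in sorted(totals)}


# Hypothetical fleet: half the devices are a new revision, and only those fail.
fleet = [("rev-A", True)] * 6 + [("rev-B", False)] * 6
print(offline_rate_by_revision(fleet))
```

The aggregate offline rate in this sample is 50%, which says little on its own; split by revision, the entire problem sits in rev-B.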
Further investigation revealed that in the new hardware revision, two components, a diode and a low-dropout regulator, had been replaced with second source alternatives. The system worked fine with either replacement on its own, but using both replacements together caused boot failures at high temperatures. What we learned from this case is that we need to validate not only each component individually but also their combinations.
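The gap becomes obvious when you enumerate build variants. Testing each substitute individually usually means swapping one part at a time into the original baseline, which never produces the variant where both substitutes are present. The Python sketch below uses hypothetical reference designators and part numbers, and treats the first part listed for each position as the original.

```python
from itertools import product


def all_variants(alternates: dict[str, list[str]]) -> list[dict[str, str]]:
    """Every combination of approved parts across BOM positions (full factorial)."""
    refs = sorted(alternates)
    return [dict(zip(refs, combo)) for combo in product(*(alternates[r] for r in refs))]


def single_swap_variants(alternates: dict[str, list[str]]) -> list[dict[str, str]]:
    """Baseline (first part in every position) plus each alternate swapped in on its own."""
    baseline = {ref: parts[0] for ref, parts in sorted(alternates.items())}
    variants = [baseline]
    for ref, parts in sorted(alternates.items()):
        for part in parts[1:]:
            variants.append({**baseline, ref: part})
    return variants


alternates = {"D1": ["DIODE-A", "DIODE-B"], "U2": ["LDO-A", "LDO-B"]}
tested = single_swap_variants(alternates)
untested = [v for v in all_variants(alternates) if v not in tested]
print(untested)
```

The only untested variant is the one with both substitutes fitted. The full set grows multiplicatively: ten positions with two approved parts each already give 1,024 variants, so in practice you test the combinations you actually build, which is another argument for validating each hardware revision as a whole.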
Therefore, to mitigate risks, it’s important to perform a full regression cycle for each hardware revision before releasing it to customers. It is also a good strategy to roll out a new hardware revision gradually so you can monitor how it performs in real-world conditions and use cases. A slow rollout helps you react quickly to stop the bleeding and prevent failures at scale.
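A gradual rollout can be reduced to a simple gate: ship the new revision to a growing share of units only while its field failure rate stays below a threshold, and stop as soon as it does not. The Python sketch below assumes hypothetical stage sizes and a threshold you would choose for your own product; `next_rollout_share` is an illustrative name.

```python
STAGES = (0.01, 0.05, 0.25, 1.0)  # hypothetical share of shipments using the new revision


def next_rollout_share(current_share: float, failure_rate: float, max_failure_rate: float) -> float:
    """Next share of shipments for the new revision, or 0.0 to halt the rollout.

    failure_rate is measured on new-revision units already in the field.
    """
    if failure_rate > max_failure_rate:
        return 0.0  # stop the bleeding: pause the new revision and investigate
    for share in STAGES:
        if share > current_share:
            return share
    return current_share  # already fully rolled out


print(next_rollout_share(0.01, failure_rate=0.001, max_failure_rate=0.005))
print(next_rollout_share(0.05, failure_rate=0.02, max_failure_rate=0.005))
```

Halting outright rather than stepping back one stage keeps the rule simple and errs on the side of protecting customers while the root cause is investigated.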
I hope this was helpful. Please let me know about your experience with second source components.