CTS Interview Questions

Can an AND gate replace an ICG if glitches are handled?

  • If an AND gate is used as the clock gater, EN must switch only while the clock is low; otherwise it can create glitches. Let’s say this is taken care of somehow.

  • Even then, EN may switch only x time after the falling clock edge and must be stable y time before the rising clock edge. If this is not guaranteed, it can still cause problems. These are exactly the setup and hold checks of ICG cells.
  • ICG cells are specifically characterized in the library for clock gating checks (setup/hold on the enable relative to the clock). Standard AND/OR gates are not. STA tools can infer gating checks on a standard gate, but only with user-specified margins (e.g., set_clock_gating_check), not characterized values, so the critical library checks are lost.
  • PnR and CTS tools recognize ICG cells as clock gating elements and handle them correctly during clock tree building (e.g., balancing up to the clock input pin, recognizing the enable pin). They may not correctly interpret a standard AND/OR gate used for gating.
  • So even if glitches are handled and the EN signal is perfectly timed to avoid them, a standard gate lacks the characterized clock gating checks and the tool recognition that an ICG gets, so it should not be used.
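The glitch argument above can be illustrated with a small zero-delay simulation. This is only a sketch: the function names, the discrete time steps, and the EN waveform are assumptions made for illustration, and the ICG is modeled as the common low-transparent latch followed by an AND gate.

```python
def clock_wave(n_steps, half_period):
    """Square wave that starts low and toggles every half_period steps."""
    return [(t // half_period) % 2 for t in range(n_steps)]

def and_gate(clk, en):
    """Plain AND gater: the output follows EN whenever the clock is high."""
    return [c & e for c, e in zip(clk, en)]

def latch_icg(clk, en):
    """Latch-based ICG model: EN is sampled while CLK is low and held while CLK is high."""
    out, latched = [], 0
    for c, e in zip(clk, en):
        if c == 0:
            latched = e          # latch is transparent during the low phase
        out.append(c & latched)  # gated clock = CLK AND latched enable
    return out

def pulse_widths(wave):
    """Length of every run of consecutive 1s in the waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths

# EN rises at t=6 and falls at t=22, both while the clock is high (the bad case).
clk = clock_wave(32, 4)
en = [1 if 6 <= t < 22 else 0 for t in range(32)]

and_pulses = pulse_widths(and_gate(clk, en))   # runt pulses at both EN edges
icg_pulses = pulse_widths(latch_icg(clk, en))  # only full-width pulses
print(and_pulses, icg_pulses)
```

With these inputs the AND gate produces two clipped pulses (width 2 instead of 4), while the latch-based model passes only full-width clock pulses.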

Command to balance two clock trees? Command to check latency? Command to do useful skew?

  • Balance Two Clock Trees:

Innovus: create_skew_group -name <group_name> -clocks {<clock1> <clock2>}

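ICC2: set_clock_balance_points defines the pins the tool balances to. To balance clocks against each other, create_clock_balance_group followed by balance_clock_groups is the usual route.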

  • Check Latency:

Innovus/ICC2: The command report_clock_timing -type latency is used.

  • Do Useful Skew:

Innovus: Enabled by default during ccopt_design. Can be controlled with setUsefulSkewMode.

ICC2: clock_opt has options to enable useful skew.
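Why useful skew helps can be seen from the basic slack equations. The sketch below is plain arithmetic, not a tool flow; the function names and all numbers (in ps) are hypothetical values chosen for illustration.

```python
def setup_slack(period, launch_latency, capture_latency, data_delay, setup_time):
    """Setup slack = required time at capture flop - arrival time of data."""
    return (period + capture_latency) - (launch_latency + data_delay + setup_time)

def hold_slack(launch_latency, capture_latency, min_data_delay, hold_time):
    """Hold slack = earliest data arrival - (capture clock edge + hold time)."""
    return (launch_latency + min_data_delay) - (capture_latency + hold_time)

PERIOD = 1000  # ps, hypothetical

# Path FF1 -> FF2 with balanced clocks (300 ps latency on both flops).
before = setup_slack(PERIOD, 300, 300, data_delay=1020, setup_time=30)

# Useful skew: delay the FF2 clock by 60 ps so the failing path gets more time.
after = setup_slack(PERIOD, 300, 360, data_delay=1020, setup_time=30)

# The cost: hold on FF1 -> FF2 shrinks, and FF2 -> FF3 loses 60 ps of setup.
hold_after = hold_slack(300, 360, min_data_delay=150, hold_time=20)
next_stage = setup_slack(PERIOD, 360, 300, data_delay=800, setup_time=30)

print(before, after, hold_after, next_stage)
```

The violating path goes from -50 ps to +10 ps, paid for by margin borrowed from the next stage (170 ps down to 110 ps) and from hold (130 ps down to 70 ps), which is why the tool has to consider clock and data paths together.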

Different CTS types? What are the benefits of each?

1. Conventional / Single Point CTS

This is the standard approach used for lower-frequency designs with fewer “sinks” (flip-flops/registers).

  • Structure: It has a single clock source that distributes the signal to every corner of the design. The “point of divergence” (where the paths split) is right at the clock source.
  • Benefits:
    • High Power Efficiency: Because clock gating is typically done near the source, large sections of the tree can be shut off, saving significant dynamic power.
    • Simplicity: It is the easiest to implement using standard EDA tool flows.
  • Trade-offs:
    • OCV Sensitivity: Because the clock paths are largely “uncommon” (they don’t share much of the same wire/buffer path), manufacturing variations (OCV) affect each branch differently, leading to higher skew.
    • High Insertion Delay: The signal has to travel through many levels of buffers to reach the entire chip.
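The OCV point can be made concrete with a rough calculation. The sketch assumes flat late/early derates of 105% and 95% (hypothetical values) and assumes the shared part of the two clock paths cancels out, as it does after common path pessimism removal; only the non-common branch delays contribute to worst-case skew.

```python
def worst_case_skew(branch_a, branch_b, late_pct=105, early_pct=95):
    """Worst-case arrival difference between two sinks, in the same unit as the inputs.

    branch_a and branch_b are the non-common delays from the point of divergence
    to each sink. The common segment is left out because it is assumed to be
    derated identically on both paths.
    """
    return (late_pct * branch_a - early_pct * branch_b) / 100

# Both structures have 1000 ps total latency and zero nominal skew.
# Single-point CTS: paths diverge near the source (100 ps common, 900 ps non-common).
single_point = worst_case_skew(900, 900)

# A structure with a long shared trunk (800 ps common, 200 ps non-common).
shared_trunk = worst_case_skew(200, 200)

print(single_point, shared_trunk)
```

Under these assumptions the early point of divergence costs 90 ps of OCV-induced skew versus 20 ps when most of the path is shared.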

2. Clock Mesh Structure

This is the most robust structure, creating a dense grid of shorted wires driven by “mesh drivers.”

How can you manually tune the clock or force specific flops to have lower latency?

  • Create Specific Skew Groups: Place a few critical flops that need very low latency into their own skew group. The CTS tool will then build a dedicated, shorter path to balance just those flops.

  • Apply latency attributes: Apply set_clock_latency -source -max <delay> to a specific pin, tricking the tool into building a shorter path to that point to meet the tighter constraint.

How is clock gater cloning done?

  • A single ICG cell driving a large number of flip-flops (high fanout) is replicated into multiple identical ICG cells, each driving a smaller subset of the original flip-flops. All cloned ICG cells share the same input clock and enable signal.

Identify High Fanout ICGs: The synthesis or CTS tool identifies ICG cells whose fanout exceeds a certain threshold or which are causing timing/DRV issues due to high load.

Cluster Sinks: The flip-flops driven by the original ICG are spatially clustered based on their placement location.
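The identify and cluster steps can be sketched as follows. This is a minimal illustration, not how any particular tool does it: the recursive median split, the fanout limit, the clone naming, and placing each clone at the centroid of its sinks are all assumptions.

```python
def cluster_sinks(sinks, max_fanout):
    """Split (name, x, y) sinks at the median of the wider axis until each group fits."""
    if len(sinks) <= max_fanout:
        return [sinks]
    xs = [s[1] for s in sinks]
    ys = [s[2] for s in sinks]
    axis = 1 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 2
    ordered = sorted(sinks, key=lambda s: s[axis])
    mid = len(ordered) // 2
    return cluster_sinks(ordered[:mid], max_fanout) + cluster_sinks(ordered[mid:], max_fanout)

def clone_icg(icg_name, sinks, max_fanout):
    """Return one clone record per spatial cluster of the original ICG's sinks."""
    clones = []
    for i, group in enumerate(cluster_sinks(sinks, max_fanout)):
        cx = sum(s[1] for s in group) / len(group)
        cy = sum(s[2] for s in group) / len(group)
        clones.append({
            "name": f"{icg_name}_clone{i}",       # hypothetical naming scheme
            "location": (cx, cy),                 # assumed: centroid of the cluster
            "sinks": [s[0] for s in group],
        })
    return clones

# Eight flops in two spatial groups, driven by one ICG with an assumed fanout limit of 4.
flops = [("ff0", 0, 0), ("ff1", 1, 0), ("ff2", 0, 1), ("ff3", 1, 1),
         ("ff4", 10, 0), ("ff5", 11, 0), ("ff6", 10, 1), ("ff7", 11, 1)]
clones = clone_icg("icg_u1", flops, max_fanout=4)
for c in clones:
    print(c["name"], c["location"], c["sinks"])
```

Every clone would keep the same clock and enable connections as the original ICG; only the sink assignment and location differ.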

How to build/synthesize the clock tree? What types of cells are used?

  • Conventional CTS (Buffer/Inverter Tree): The most common approach. The tool either starts from the sinks and works backward or starts from the root and works forward. It clusters nearby sinks, inserts buffers/inverters to meet skew, latency, and DRC targets, and progressively builds a tree structure. The exact topology isn’t strictly predefined but emerges from sink locations and optimization goals. Modern tools use sophisticated algorithms (e.g., clock concurrent optimization, CCOpt) that optimize the clock tree and logic paths concurrently.
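A toy version of the sinks-backward flow is sketched below. It is purely geometric: it groups nodes by location, places one buffer per group, and repeats until a single root buffer remains. Real tools also balance delay and fix transition/capacitance violations, which this sketch ignores; the function names and the fanout limit are assumptions.

```python
def split(points, max_fanout):
    """Recursively bisect (x, y) points at the median of the wider axis."""
    if len(points) <= max_fanout:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return split(ordered[:mid], max_fanout) + split(ordered[mid:], max_fanout)

def centroid(points):
    """Average location of a group, used here as the buffer location."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def build_tree(sinks, max_fanout):
    """Return buffer locations per level, from the leaf level up to the root."""
    assert max_fanout >= 2, "a fanout limit of 1 would never converge"
    levels = []
    nodes = list(sinks)
    while True:
        buffers = [centroid(group) for group in split(nodes, max_fanout)]
        levels.append(buffers)
        if len(buffers) == 1:      # a single buffer left: this is the root
            return levels
        nodes = buffers            # buffers become the sinks of the next level

# Sixteen sinks on a 4x4 grid with an assumed fanout limit of 4.
sinks = [(x, y) for x in range(4) for y in range(4)]
levels = build_tree(sinks, max_fanout=4)
for depth, bufs in enumerate(levels):
    print(depth, bufs)
```

Here the topology is not given up front: four leaf buffers (one per quadrant) and one root buffer fall out of the sink locations and the fanout limit alone.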