Sources:

  • TPU v1: In-Datacenter Performance Analysis of a Tensor Processing Unit. 2017.
  • TPU v2, v3: A Domain-Specific Supercomputer for Training Deep Neural Networks.
  • AI Chips: Google TPU
  • HotChips 2019 tutorial: Cloud TPU: Codesigning Architecture and Infrastructure

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware.

Read more »

Sources:

  • Andrej Karpathy's video: Building makemore Part 4: Becoming a Backprop Ninja
  • Paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
  • PyTorch: BatchNorm1d
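
For context, batch normalization standardizes each feature using the batch mean and (biased) variance during training. Below is a minimal sketch, assuming PyTorch is installed, that compares a hand-written forward pass against the cited nn.BatchNorm1d layer; it is an illustration, not code from the post or the paper:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 10)  # batch of 32 examples, 10 features

# Hand-written batchnorm forward pass (training mode, no affine scale/shift)
eps = 1e-5
mean = x.mean(dim=0, keepdim=True)
var = x.var(dim=0, unbiased=False, keepdim=True)  # biased variance, as in training
x_hat = (x - mean) / torch.sqrt(var + eps)

# PyTorch's built-in layer (affine=False so gamma/beta don't enter the comparison)
bn = nn.BatchNorm1d(10, affine=False, eps=eps)
bn.train()
y = bn(x)

print(torch.allclose(x_hat, y, atol=1e-6))  # expected: True
```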
Read more »

Sources:

  • UWashington: CSE378, Lecture11
  • UWashington: CSE378, Lecture12

Note: the assembly code in this article can be MIPS or RISC-V. This shouldn't be confusing, since the only big difference between them is that MIPS adds a $ before the name of each register:

# MIPS:
add $t0, $t1, $t2 # add values in $t1 and $t2, the result is stored in $t0

# RISC-V
add t0, t1, t2 # add values in t1 and t2, the result is stored in t0
Read more »

Sources:

  • UWashington: CSE378, Lecture12

Note: the assembly code in this article can be MIPS or RISC-V. This shouldn't be confusing, since the only big difference between them is that MIPS adds a $ before the name of each register:

# MIPS:
add $t0, $t1, $t2 # add values in $t1 and $t2, the result is stored in $t0

# RISC-V
add t0, t1, t2 # add values in t1 and t2, the result is stored in t0
Read more »