feat: reword (#2549)
parent 4c984bba7e
commit 28183c7438

@@ -128,7 +128,7 @@ Training neural networks in tinygrad is super simple.
 All we need to do is define our neural network, define our loss function, and then call `.backward()` on the loss function to compute the gradients.
 They can then be used to update the parameters of our neural network using one of the many optimizers in [optim.py](/tinygrad/nn/optim.py) (an end-to-end step is sketched after the diff).
 
-For our loss function we will be using sparse categorical cross entropy loss.
+For our loss function we will be using sparse categorical cross entropy loss. The implementation is taken from [tensor.py](/tinygrad/tensor.py); it's copied below to highlight an important detail of tinygrad.
 
 ```python
 def sparse_categorical_crossentropy(self, Y, ignore_index=-1) -> Tensor:
@@ -138,9 +138,9 @@ def sparse_categorical_crossentropy(self, Y, ignore_index=-1) -> Tensor:
   return self.log_softmax().mul(y).sum() / loss_mask.sum()
 ```
 
-As we can see in this implementation of cross entropy loss, there are certain operations that tinygrad does not support.
+As we can see in this implementation of cross entropy loss, there are certain operations that tinygrad does not support natively.
 Namely, load/store operations, such as assigning a value to a tensor at a certain index.
-Load/store ops are not supported in tinygrad because they add complexity when trying to port to different backends and 90% of the models out there don't use/need them.
+Load/store ops are not natively supported in tinygrad because they add complexity when porting to different backends, because 90% of the models out there don't use/need them, and because they can be implemented, as done above, with an `arange` mask (a standalone sketch follows the diff).
 
 For our optimizer we will be using the traditional stochastic gradient descent optimizer with a learning rate of 3e-4.
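
The `arange` mask trick from the reworded line, sketched standalone: instead of indexing into the tensor (a load/store op), compare a broadcast `arange` against the labels to build a one-hot mask, then multiply and reduce. Shapes and labels here are hypothetical, and this is a simplification, not the exact [tensor.py](/tinygrad/tensor.py) code.

```python
from tinygrad.tensor import Tensor

# goal: picked[i] == logits[i, Y[i]], without the integer-indexing gather
logits = Tensor.randn(4, 10)    # hypothetical (batch, classes) scores
Y = Tensor([3, 1, 0, 7])        # hypothetical integer labels

counter = Tensor.arange(10).unsqueeze(0).expand(4, 10)  # row [0..9], repeated per sample
onehot = (counter == Y.unsqueeze(1)).float()            # 1.0 exactly where class == label
picked = (logits * onehot).sum(axis=1)                  # mask-and-reduce replaces the gather
```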
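Putting the surrounding quickstart text together, a minimal end-to-end step: define the network, compute the loss, call `.backward()`, and let an optimizer from [optim.py](/tinygrad/nn/optim.py) apply the update. `TinyNet`, the batch, and the labels are made-up stand-ins, and the import paths assume a recent tinygrad layout.

```python
import numpy as np
from tinygrad.tensor import Tensor
from tinygrad.nn.optim import SGD
from tinygrad.nn.state import get_parameters

class TinyNet:  # hypothetical stand-in for "our neural network"
  def __init__(self):
    self.l1 = Tensor.kaiming_uniform(784, 128)
    self.l2 = Tensor.kaiming_uniform(128, 10)
  def __call__(self, x: Tensor) -> Tensor:
    return x.dot(self.l1).relu().dot(self.l2)

net = TinyNet()
opt = SGD(get_parameters(net), lr=3e-4)  # the optimizer and learning rate from the text above

Tensor.training = True                             # tinygrad's optimizers expect training mode
x = Tensor.randn(64, 784)                          # hypothetical input batch
y = Tensor(np.random.randint(0, 10, size=(64,)))   # hypothetical labels
loss = net(x).sparse_categorical_crossentropy(y)
opt.zero_grad()
loss.backward()  # compute gradients for every parameter reachable from the loss
opt.step()       # apply the update
```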
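And what that stochastic gradient descent update amounts to on each `step()`, schematically (a toy parameter, not optim.py's exact code):

```python
from tinygrad.tensor import Tensor

lr = 3e-4                                    # the learning rate from the text above
p = Tensor.randn(3, 3, requires_grad=True)   # hypothetical parameter
p.sum().backward()                           # populate p.grad with a toy gradient
p.assign(p.detach() - lr * p.grad)           # vanilla SGD: p <- p - lr * grad
```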