feat: reword (#2549)
parent 4c984bba7e
commit 28183c7438

@@ -128,7 +128,7 @@ Training neural networks in tinygrad is super simple.
 All we need to do is define our neural network, define our loss function, and then call `.backward()` on the loss function to compute the gradients.
 They can then be used to update the parameters of our neural network using one of the many optimizers in [optim.py](/tinygrad/nn/optim.py) (an end-to-end step is sketched after the diff).
 
-For our loss function we will be using sparse categorical cross entropy loss.
+For our loss function we will be using sparse categorical cross entropy loss. The implementation is taken from [tensor.py](/tinygrad/tensor.py); it's copied below to highlight an important detail of tinygrad.
 
 ```python
 def sparse_categorical_crossentropy(self, Y, ignore_index=-1) -> Tensor:
@@ -138,9 +138,9 @@ def sparse_categorical_crossentropy(self, Y, ignore_index=-1) -> Tensor:
   return self.log_softmax().mul(y).sum() / loss_mask.sum()
 ```
 
-As we can see in this implementation of cross entropy loss, there are certain operations that tinygrad does not support.
+As we can see in this implementation of cross entropy loss, there are certain operations that tinygrad does not support natively.
 Namely, load/store operations, such as assigning a value to a tensor at a certain index.
-Load/store ops are not supported in tinygrad because they add complexity when trying to port to different backends and 90% of the models out there don't use/need them.
+Load/store ops are not natively supported in tinygrad because they add complexity when porting to different backends, because 90% of the models out there don't use/need them, and because they can be implemented, as done above, with an `arange` mask (a standalone sketch follows the diff).
 
 For our optimizer we will be using the traditional stochastic gradient descent optimizer with a learning rate of 3e-4.
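
The `arange` mask trick from the reworded line, sketched standalone: instead of indexing into the tensor (a load/store op), compare a broadcast `arange` against the labels to build a one-hot mask, then multiply and reduce. Shapes and labels here are hypothetical, and this is a simplification, not the exact [tensor.py](/tinygrad/tensor.py) code.

```python
from tinygrad.tensor import Tensor

# goal: picked[i] == logits[i, Y[i]], without the integer-indexing gather
logits = Tensor.randn(4, 10)    # hypothetical (batch, classes) scores
Y = Tensor([3, 1, 0, 7])        # hypothetical integer labels

counter = Tensor.arange(10).unsqueeze(0).expand(4, 10)  # row [0..9], repeated per sample
onehot = (counter == Y.unsqueeze(1)).float()            # 1.0 exactly where class == label
picked = (logits * onehot).sum(axis=1)                  # mask-and-reduce replaces the gather
```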
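Putting the surrounding quickstart text together, a minimal end-to-end step: define the network, compute the loss, call `.backward()`, and let an optimizer from [optim.py](/tinygrad/nn/optim.py) apply the update. `TinyNet`, the batch, and the labels are made-up stand-ins, and the import paths assume a recent tinygrad layout.

```python
import numpy as np
from tinygrad.tensor import Tensor
from tinygrad.nn.optim import SGD
from tinygrad.nn.state import get_parameters

class TinyNet:  # hypothetical stand-in for "our neural network"
  def __init__(self):
    self.l1 = Tensor.kaiming_uniform(784, 128)
    self.l2 = Tensor.kaiming_uniform(128, 10)
  def __call__(self, x: Tensor) -> Tensor:
    return x.dot(self.l1).relu().dot(self.l2)

net = TinyNet()
opt = SGD(get_parameters(net), lr=3e-4)  # the optimizer and learning rate from the text above

Tensor.training = True                             # tinygrad's optimizers expect training mode
x = Tensor.randn(64, 784)                          # hypothetical input batch
y = Tensor(np.random.randint(0, 10, size=(64,)))   # hypothetical labels
loss = net(x).sparse_categorical_crossentropy(y)
opt.zero_grad()
loss.backward()  # compute gradients for every parameter reachable from the loss
opt.step()       # apply the update
```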
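And what that stochastic gradient descent update amounts to on each `step()`, schematically (a toy parameter, not optim.py's exact code):

```python
from tinygrad.tensor import Tensor

lr = 3e-4                                    # the learning rate from the text above
p = Tensor.randn(3, 3, requires_grad=True)   # hypothetical parameter
p.sum().backward()                           # populate p.grad with a toy gradient
p.assign(p.detach() - lr * p.grad)           # vanilla SGD: p <- p - lr * grad
```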