This repository was archived by the owner on Jul 4, 2023. It is now read-only.

Commit 41fe6cc: Update README (1 parent: 86a44fd)

1 file changed: +30 -17 lines

README.md: 30 additions & 17 deletions

````diff
@@ -1,9 +1,9 @@
 <p align="center"><img width="55%" src="docs/_static/img/logo.svg" /></p>

-<h3 align="center">Basic Utilities for PyTorch NLP Software</h3>
+<h3 align="center">Basic Utilities for PyTorch Natural Language Processing (NLP)</h3>

 PyTorch-NLP, or `torchnlp` for short, is a library of basic utilities for PyTorch
-Natural Language Processing (NLP). `torchnlp` extends PyTorch to provide you with
+NLP. `torchnlp` extends PyTorch to provide you with
 basic text data processing functions.

 ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pytorch-nlp.svg?style=flat-square)
````
````diff
@@ -39,7 +39,7 @@ via [our ReadTheDocs website](https://pytorchnlp.readthedocs.io).

 Within an NLP data pipeline, you'll want to implement these basic steps:

-### Load Your Data 🐿
+### 1. Load your Data 🐿

 Load the IMDB dataset, for example:

````
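
The loading snippet itself falls outside this hunk (the next hunk's header shows only `open(directory_path / train_file_path)`). For orientation, a minimal sketch of the step using the library's documented `imdb_dataset` loader; the printed row is illustrative, not from the diff:

```python
from torchnlp.datasets import imdb_dataset

# Downloads and caches the IMDB corpus on first use, then loads the train split.
train = imdb_dataset(train=True)
print(train[0])  # e.g. {'text': 'For a movie that gets...', 'sentiment': 'pos'}
```
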

````diff
@@ -71,9 +71,11 @@ open(directory_path / train_file_path)

 Don't worry we'll handle caching for you!

-### Text To Tensor
+### 2. Text to Tensor

-Tokenize and encode your text as a tensor. For example, a `WhitespaceEncoder` breaks
+Tokenize and encode your text as a tensor.
+
+For example, a `WhitespaceEncoder` breaks
 text into terms whenever it encounters a whitespace character.

 ```python
````
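
The encoder snippet is split between this hunk and the next; assembled, it looks roughly like the following. The sample strings are illustrative, and the import path is the one documented for the 0.5-era releases (earlier versions used `torchnlp.text_encoders`):

```python
from torchnlp.encoders.text import WhitespaceEncoder

# Build a vocabulary over the corpus, then encode each example
# as a torch.Tensor of token indices.
loaded_data = ["now this ain't funny", "so don't you dare laugh"]
encoder = WhitespaceEncoder(loaded_data)
encoded_data = [encoder.encode(example) for example in loaded_data]
```
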
````diff
@@ -84,7 +86,7 @@ encoder = WhitespaceEncoder(loaded_data)
 encoded_data = [encoder.encode(example) for example in loaded_data]
 ```

-### Tensor To Batch
+### 3. Tensor to Batch

 With your loaded and encoded data in hand, you'll want to batch your dataset.

````
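
Only the final `collate_tensors` line of the batching snippet is visible, in the next hunk's header. A sketch of the whole step, assuming the library's `BucketBatchSampler` and `stack_and_pad_tensors` helpers; the toy tensors and exact sampler arguments are assumptions, not from the diff:

```python
import torch
from torchnlp.encoders.text import stack_and_pad_tensors
from torchnlp.samplers import BucketBatchSampler
from torchnlp.utils import collate_tensors

# Toy stand-in for encoded_data: variable-length 1-D tensors.
encoded_data = [torch.randn(2), torch.randn(3), torch.randn(4), torch.randn(5)]

# Bucket examples of similar length together to minimize padding.
sampler = torch.utils.data.sampler.SequentialSampler(encoded_data)
batch_sampler = BucketBatchSampler(
    sampler, batch_size=2, drop_last=False,
    sort_key=lambda i: encoded_data[i].shape[0])

# Pad each batch to a common length and stack it into a single tensor.
batches = [[encoded_data[i] for i in batch] for batch in batch_sampler]
batches = [collate_tensors(batch, stack_tensors=stack_and_pad_tensors) for batch in batches]
```
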

````diff
@@ -107,15 +109,17 @@ batches = [collate_tensors(batch, stack_tensors=stack_and_pad_tensors) for batch
 PyTorch-NLP builds on top of PyTorch's existing `torch.utils.data.sampler`, `torch.stack`
 and `default_collate` to support sequential inputs of varying lengths!

-### Your Good To Go!
+### 4. Training and Inference

 With your batch in hand, you can use PyTorch to develop and train your model using gradient descent.
+For example, check out [this example code](examples/snli/train.py) for training on the Stanford
+Natural Language Inference (SNLI) Corpus.

-### Last But Not Least
+## Last But Not Least

 PyTorch-NLP has a couple more NLP focused utility packages to support you! 🤗

-#### Deterministic Functions
+### Deterministic Functions

 Now you've setup your pipeline, you may want to ensure that some functions run deterministically.
 Wrap any code that's random, with `fork_rng` and you'll be good to go, like so:
````
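
The output lines in the next hunk (`Numpy: 843828735`, `Torch: 843828736`) come from a snippet roughly like this, per the library's documented `torchnlp.random.fork_rng` context manager; the seed value is illustrative:

```python
import random

import numpy
import torch
from torchnlp.random import fork_rng

# Fork the state of python's, numpy's and torch's RNGs, seed them, and
# restore the outer state on exit, so only this block is deterministic.
with fork_rng(seed=123):
    print('Random:', random.randint(1, 2**31))
    print('Numpy:', numpy.random.randint(1, 2**31))
    print('Torch:', int(torch.randint(1, 2**31, (1,))))
```
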
````diff
@@ -141,10 +145,10 @@ Numpy: 843828735
 Torch: 843828736
 ```

-#### Pre-Trained Word Vectors
+### Pre-Trained Word Vectors

 Now that you've computed your vocabulary, you may want to make use of
-pre-trained word vectors, like so:
+pre-trained word vectors to set your embeddings, like so:

 ```python
 import torch
````
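
Only the opening and closing lines of the embedding snippet appear in the hunks; assembled, it looks roughly like this. The `GloVe` arguments and `is_include` filter follow the library's documented `word_to_vector` API, and the encoder is reused from the earlier step:

```python
import torch
from torchnlp.encoders.text import WhitespaceEncoder
from torchnlp.word_to_vector import GloVe

encoder = WhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])

# Only load vectors for words that are actually in the vocabulary.
vocab_set = set(encoder.vocab)
pretrained_embedding = GloVe(name='6B', dim=100, is_include=lambda w: w in vocab_set)

# Copy each token's pre-trained vector into an embedding weight matrix
# of shape (vocab size, embedding dim).
embedding_weights = torch.Tensor(encoder.vocab_size, 100)
for i, token in enumerate(encoder.vocab):
    embedding_weights[i] = pretrained_embedding[token]
```
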
````diff
@@ -160,7 +164,7 @@ for i, token in enumerate(encoder.vocab):
 embedding_weights[i] = pretrained_embedding[token]
 ```

-#### Neural Networks Layers
+### Neural Networks Layers

 For example, from the neural network package, apply the state-of-the-art `LockedDropout`:

````
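
The `LockedDropout` call itself sits in the next hunk's header; a sketch of the full snippet, with the input shape taken from the RETURNS comment visible in the diff:

```python
import torch
from torchnlp.nn import LockedDropout

input_ = torch.randn(6, 3, 10)  # (sequence length, batch size, features)

# LockedDropout samples one dropout mask per sequence and reuses it at
# every time step, instead of resampling per element.
dropout = LockedDropout(0.5)
dropout(input_)  # RETURNS: torch.FloatTensor (6x3x10)
```
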

````diff
@@ -175,7 +179,7 @@ dropout = LockedDropout(0.5)
 dropout(input_) # RETURNS: torch.FloatTensor (6x3x10)
 ```

-#### Metrics
+### Metrics

 Compute common NLP metrics such as the BLEU score.

````
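
The metric snippet falls outside the visible hunks; a minimal sketch, assuming the library's documented `get_moses_multi_bleu` helper (the hypothesis and reference strings are illustrative):

```python
from torchnlp.metrics import get_moses_multi_bleu

hypotheses = ["The brown fox jumps over the dog"]
references = ["The quick brown fox jumps over the lazy dog"]

# Computes corpus-level BLEU via the Moses multi-bleu.perl script.
get_moses_multi_bleu(hypotheses, references, lowercase=True)
```
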

````diff
@@ -197,17 +201,25 @@ Need more help? We are happy to answer your questions via [Gitter Chat](https://

 ## Contributing

-We've released PyTorch-NLP because we found a lack of basic toolkits for NLP in PyTorch. We hope that other organizations can benefit from the project. We are thankful for any contributions from the community.
+We've released PyTorch-NLP because we found a lack of basic toolkits for NLP in PyTorch. We hope
+that other organizations can benefit from the project. We are thankful for any contributions from
+the community.

 ### Contributing Guide

-Read our [contributing guide](https://github.com/PetrochukM/PyTorch-NLP/blob/master/CONTRIBUTING.md) to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to PyTorch-NLP.
+Read our [contributing guide](https://github.com/PetrochukM/PyTorch-NLP/blob/master/CONTRIBUTING.md)
+to learn about our development process, how to propose bugfixes and improvements, and how to build
+and test your changes to PyTorch-NLP.

 ## Related Work

 ### [torchtext](https://github.com/pytorch/text)

-torchtext and PyTorch-NLP differ in the architecture and feature set; otherwise, they are similar. torchtext and PyTorch-NLP provide pre-trained word vectors, datasets, iterators and text encoders. PyTorch-NLP also provides neural network modules and metrics. From an architecture standpoint, torchtext is object orientated with external coupling while PyTorch-NLP is object orientated with low coupling.
+torchtext and PyTorch-NLP differ in the architecture and feature set; otherwise, they are similar.
+torchtext and PyTorch-NLP provide pre-trained word vectors, datasets, iterators and text encoders.
+PyTorch-NLP also provides neural network modules and metrics. From an architecture standpoint,
+torchtext is object orientated with external coupling while PyTorch-NLP is object orientated with
+low coupling.

 ### [AllenNLP](https://github.com/allenai/allennlp)

````

````diff
@@ -220,7 +232,8 @@ AllenNLP is designed to be a platform for research. PyTorch-NLP is designed to b

 ## Citing

-If you find PyTorch-NLP useful for an academic publication, then please use the following BibTeX to cite it:
+If you find PyTorch-NLP useful for an academic publication, then please use the following BibTeX to
+cite it:

 ```
 @misc{pytorch-nlp,
````
