
Where to download transformer dataset from? #1058


Closed
sfraczek opened this issue Jul 19, 2018 · 3 comments


sfraczek commented Jul 19, 2018

Hi,

I need to work on optimizing the Transformer model. Could you please tell me where I can download the dataset from? Does it have to be exactly WMT-16? Which archives do I need?

Thank you.

Regards,
Sylwester

sfraczek added the intel label Jul 19, 2018
@sfraczek

OK, I found instructions for the dataset in the Chinese README: https://github.com/PaddlePaddle/models/blob/develop/fluid/neural_machine_translation/transformer/README_cn.md

@sfraczek

But there are no instructions on how to obtain newstest2013.tok.bpe.32000.en-de. Can you please tell me how?

sfraczek reopened this Jul 19, 2018
@sfraczek

OK, I figured out that I probably have to run:

paste -d '\t' newstest2013.tok.bpe.32000.en newstest2013.tok.bpe.32000.de > newstest2013.tok.bpe.32000.en-de

Moved this issue to a bigger one: #1059
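
For anyone landing here later, this is a minimal sketch of what that paste step does, run on two tiny stand-in files instead of the real newstest2013 data. The sample.en and sample.de names and their contents are made up for illustration. It assumes the model expects one tab-separated source/target pair per line, which is my guess from the command above and not something confirmed in this thread.

```shell
# Hypothetical stand-in files; the real inputs would be the
# newstest2013.tok.bpe.32000.en / .de files mentioned above.
workdir=$(mktemp -d)
printf 'hello world\ngood morning\n' > "$workdir/sample.en"
printf 'hallo Welt\nguten Morgen\n' > "$workdir/sample.de"

# The two sides must be line-aligned, so compare line counts first.
en_lines=$(wc -l < "$workdir/sample.en")
de_lines=$(wc -l < "$workdir/sample.de")
if [ "$en_lines" -ne "$de_lines" ]; then
    echo "line counts differ: $en_lines vs $de_lines" >&2
    exit 1
fi

# Join line N of the .en file with line N of the .de file, separated by a tab.
paste -d '\t' "$workdir/sample.en" "$workdir/sample.de" > "$workdir/sample.en-de"

# Show the merged file; each line holds "source<TAB>target".
cat "$workdir/sample.en-de"
```

Tab is already the default delimiter for paste, so -d '\t' only makes the intent explicit. The line-count check is there because paste does not complain when one file is shorter; it pads the missing side with empty fields.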
