Epubs

Grabbing data from the Simple OpenNMT-py REST Server

5 minute read Published:

Alright, so you’ve trained a model or two and are ready to translate, but when you start using OpenNMT-py’s translation script, you run into some unforeseen issues. For example, you’ll find it’s not a huge fan of whitespace, and it’s not really meant to translate an entire document. And for my use case, I actually want to print bilingual content to a single file in the format:

language 1 string
language 2 string
language 1 string
language 2 string

What to do?
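To make the target format concrete, here is a minimal Python sketch of the interleaving step only. It is not the method from the post itself: `interleave_bilingual` and the `translate` callable are hypothetical names, and the callable stands in for whatever actually talks to the model (the REST server, for instance). The whitespace handling (collapsing runs and skipping blank lines) is also an assumption about what "not a huge fan of whitespace" calls for.

```python
from typing import Callable, Iterable, List


def interleave_bilingual(
    source_lines: Iterable[str],
    translate: Callable[[str], str],
) -> List[str]:
    """Return source and translated strings in alternating order.

    `translate` is a hypothetical callable that maps one source string
    to one translated string; swap in a real client for the model.
    """
    output: List[str] = []
    for line in source_lines:
        # Collapse runs of whitespace and trim the ends (assumed cleanup).
        text = " ".join(line.split())
        if not text:
            # Skip blank lines rather than sending them to the model.
            continue
        output.append(text)
        output.append(translate(text))
    return output


if __name__ == "__main__":
    # Stand-in translator for demonstration only: it just upper-cases.
    for row in interleave_bilingual(["  hello   world ", "", "good night"], str.upper):
        print(row)
```

Writing the returned list to a file, one item per line, gives the alternating layout described above.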

Using Python to clean up corpus files for OpenNMT Training

8 minute read Published:

So I’m working on a little epub project tentatively called epub-ocr-and-translate (EOAT) that started out as me sharing a bunch of little scripts I was using to OCR, translate, and single-source the creation of PDFs and epubs from old public domain works in other languages. It’s kind of ballooned into a much bigger project than I originally envisioned, somehow leading me down the path of (don’t laugh…okay, fine, you can laugh, but make it quick) DIY machine translation…