[personal profile] vashu11
I like to copy and paste quotes from the books I read. Many times Acrobat Reader has frustrated me by putting something like this on the clipboard:


Evenif ouroldideasaboutthemindarewrong,wecanlearnalotbytryingtounderstand w h y w e b e l i e v e t h e m . I n s t e a d o f a s k i n g , " W h a t a r e S e l v e s ? w" e c a n a s k , i n s t e a d, " W h a t d r e o u r ideasaboutSelyes?"

If the book was scanned and converted to .pdf, Acrobat can only give you text after running its OCR, which is rather bad. So you get text with either no spaces at all or too many spaces.

So I wrote a script to deal with this problem.

You download it, then download the dictionary file (run 'curl http://www-01.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt > wordsEn.txt' or download it from here manually).

Then you run 'python acrobat_clipboard_corrector.py copypaste.txt' and the text above turns into


Even if our old ideas about them in dare wrong , we can learn al otby trying to understand why web el ie vet hem . Instead of asking , "What areS elves ? w"ecan ask , instead , "What dre our ideas about Se lyes ? "

It removes all spaces from the text, then tries to split the text at word boundaries. It is not perfect, but it is better than nothing.
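To give a rough idea of the approach, here is a minimal sketch of "remove the spaces, then split at word boundaries". It is not the script itself: the function names (load_words, segment, fix_spacing), the cost values and the "fewest pieces" rule are assumptions made for illustration. It splits each run of letters into as few dictionary words as possible and leaves unknown runs and punctuation as they are.

```python
import re

def load_words(path):
    """Read a one-word-per-line file (e.g. wordsEn.txt) into a lowercase set."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh if line.strip()}

def segment(text, words, max_len=20):
    """Split a space-free run of letters into the cheapest sequence of pieces.

    Assumed cost model: a dictionary word costs 1, an unknown single
    character costs 10, so known words are strongly preferred.
    """
    n = len(text)
    # best[i] = (cost, start of last piece, last piece is a known word)
    best = [(0, 0, True)] + [(float("inf"), 0, False)] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if text[j:i].lower() in words:
                cost, known = 1, True
            elif i - j == 1:
                cost, known = 10, False
            else:
                continue
            total = best[j][0] + cost
            if total < best[i][0]:
                best[i] = (total, j, known)
    # Walk back from the end to recover the pieces.
    pieces = []
    i = n
    while i > 0:
        _, j, known = best[i]
        pieces.append((text[j:i], known))
        i = j
    pieces.reverse()
    # Glue neighbouring unknown characters back into one chunk.
    merged = []
    prev_known = True
    for piece, known in pieces:
        if merged and not known and not prev_known:
            merged[-1] += piece
        else:
            merged.append(piece)
        prev_known = known
    return merged

def fix_spacing(text, words):
    """Drop all whitespace, then re-insert spaces at guessed word boundaries."""
    squeezed = re.sub(r"\s+", "", text)
    out = []
    # Runs of ASCII letters get segmented; anything else is kept as a token.
    for token in re.findall(r"[A-Za-z]+|[^A-Za-z]", squeezed):
        if token.isalpha():
            out.extend(segment(token, words))
        else:
            out.append(token)
    return " ".join(out)
```

With a toy dictionary, fix_spacing("w e c a n , learnalot", {"we", "can", "learn", "a", "lot"}) returns "we can , learn a lot". With a full word list the split is often ambiguous (many short strings are valid words), which is why the result is not perfect.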
