Current Status:

Loading Surrender.html…

Paragraphs found (2315)

  • Formatting 19 chapters.
  • Formatting 12 section breaks
  • Formatting 43 centered paragraphs
  • Formatting 4 unindented paragraphs
  • Setting remaining 2237 paragraphs to indented
  • Removing 2248 span tags
  • Removing 1 div tags
  • Removing embedded styles
  • Linking ebook.css

Saving to output.html…
Thank you for flying WordBleach

That pretty much wraps things the hell up. Now on to the hard stuff.

Well, all good things must come to an end.

Time to drive to the office. Fashionably late.

@kdfrawg Nah. This is my twisted idea of fun.

real 0m0.575s
user 0m1.007s
sys 0m0.060s

Beats the hell out of when I used to do the edits manually. Took six to eight hours then.

A little more work and I will have it identifying and inserting true em dashes and the like as well.
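A rough idea of what the dash pass could look like, as a minimal sketch rather than WordBleach's actual code. The class name `DashFixer`, the method `toEmDashes`, and the rule "a double hyphen between two word characters becomes U+2014" are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

public class DashFixer {
    // Assumed convention: "--" between two word characters stands in for an
    // em dash. The lookarounds keep HTML comment markers (<!-- and -->) intact.
    private static final Pattern DOUBLE_HYPHEN =
            Pattern.compile("(?<=\\w)\\s*--\\s*(?=\\w)");

    // Replaces each matching double hyphen (and surrounding spaces) with U+2014.
    public static String toEmDashes(String text) {
        return DOUBLE_HYPHEN.matcher(text).replaceAll("\u2014");
    }

    public static void main(String[] args) {
        System.out.println(toEmDashes("Wait -- what?"));
    }
}
```

Single hyphens, as in "well-known", are left alone. Dashes that follow punctuation or a closing quote would need extra cases.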

Believe it or not, that's the container Amazon's kindlegen creates (with a buttload of added record types and more than a few fields that had to be reverse engineered).

Okay. Time to move this code into git now that I have a project structure set in stone (i.e., I know WTF I am doing now... sort of).

Switch over to the Mac because doing it on RHEL vbox via SSH was a dumb idea.

Add some cases to handle braindeath from Sidekick's HTML dumps from OpenOffice.

Locate page-break-before and page-break-after properties so the code can split the source into separate files and I don't have to.
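One way the splitting step might work, sketched under assumptions: the breaks appear as inline `style="... page-break-before: always ..."` on an opening tag, and a regex scan is good enough. The class `PageBreakSplitter` and its `split` method are hypothetical names, and page-break-after is not handled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageBreakSplitter {
    // Matches an opening tag whose double-quoted style attribute contains
    // "page-break-before: always" (assumed form of the markup).
    private static final Pattern BREAK_BEFORE = Pattern.compile(
            "<[a-zA-Z][^>]*style\\s*=\\s*\"[^\"]*page-break-before\\s*:\\s*always[^\"]*\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Cuts the body text in front of every matching tag and returns the pieces
    // in document order. Each piece after the first starts with its break tag.
    public static List<String> split(String body) {
        List<String> parts = new ArrayList<>();
        Matcher m = BREAK_BEFORE.matcher(body);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) {
                parts.add(body.substring(start, m.start()));
            }
            start = m.start();
        }
        if (start < body.length()) {
            parts.add(body.substring(start));
        }
        return parts;
    }
}
```

Each returned piece would still need to be wrapped in its own html/head/body shell before being written out as a separate file.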

Another main class to make adjustments to content.opf and toc.ncx for the various ePubs.

Look into porting my C++ PalmDB classes over so I can do surgery on MOBIs.

This is how you have fun when you have no life to speak of outside of work.
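For a sense of what a Java port would start with, here is a minimal sketch of reading the fixed 78-byte Palm Database header that a MOBI file begins with. It is not the C++ classes mentioned above; the class `PalmDbHeader` and its field names are made up for illustration. The offsets used (name at 0, type at 60, creator at 64, record count at 76, all big-endian) follow the commonly documented PDB layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PalmDbHeader {
    public static final int HEADER_SIZE = 78;

    public final String name;      // database name, NUL-padded to 32 bytes
    public final String type;      // 4-byte type code, "BOOK" for MOBI files
    public final String creator;   // 4-byte creator code, "MOBI" for MOBI files
    public final int recordCount;  // number of entries in the record list

    private PalmDbHeader(String name, String type, String creator, int recordCount) {
        this.name = name;
        this.type = type;
        this.creator = creator;
        this.recordCount = recordCount;
    }

    // Parses the header from the first 78 bytes of the file contents.
    public static PalmDbHeader parse(byte[] data) {
        if (data.length < HEADER_SIZE) {
            throw new IllegalArgumentException("Need at least " + HEADER_SIZE + " bytes");
        }
        // The name ends at the first NUL byte within the 32-byte field.
        int nameLen = 0;
        while (nameLen < 32 && data[nameLen] != 0) {
            nameLen++;
        }
        String name = new String(data, 0, nameLen, StandardCharsets.ISO_8859_1);
        String type = new String(data, 60, 4, StandardCharsets.US_ASCII);
        String creator = new String(data, 64, 4, StandardCharsets.US_ASCII);
        // ByteBuffer is big-endian by default, matching the PDB format.
        int recordCount = ByteBuffer.wrap(data).getShort(76) & 0xFFFF;
        return new PalmDbHeader(name, type, creator, recordCount);
    }
}
```

The record list (offset and attributes for each record) follows immediately after these 78 bytes. The MOBI-specific headers live inside record 0.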

FWIW, I'd do worse than Windows 7 if you had to teach me Japanese.

[twilson@largo bancomedia]$ java -jar WordBleach.jar Test.html
Loading Test.html…

Paragraphs found: (2315)
Deleting paragraph attributes
TODO: Apply styles via stylesheet

Span tags found: (2248)
Deleting…

Font tags found: (0)

Removing STYLES..

TODO: Insert linkref to stylesheet

Saving to output.html…

---
80 lines of code. This is so easy that I'm getting bored.
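The cleanup steps in that log (count the spans, delete them, strip paragraph attributes) are simple enough to sketch with regular expressions. This is an illustration under assumptions, not the actual 80 lines: the class `SpanStripper` and its method names are hypothetical, and a real HTML parser would be more robust on messy input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpanStripper {
    private static final Pattern SPAN_OPEN =
            Pattern.compile("<span\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern SPAN_ANY =
            Pattern.compile("</?span\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern P_OPEN =
            Pattern.compile("<p\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Counts opening <span ...> tags, for the "Span tags found" report line.
    public static int countSpans(String html) {
        Matcher m = SPAN_OPEN.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Removes opening and closing span tags but keeps the text between them.
    public static String stripSpans(String html) {
        return SPAN_ANY.matcher(html).replaceAll("");
    }

    // Reduces every opening paragraph tag to a bare <p>, dropping attributes.
    public static String stripParagraphAttributes(String html) {
        return P_OPEN.matcher(html).replaceAll("<p>");
    }
}
```

The `\b` after the tag name keeps `<pre>` and `<param>` from being mistaken for paragraphs.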

I'm in the "too wired to sleep but too tired to think" stage.

In the past, I've played with some of those tools. Powerful but more bug-prone and complicated than the code itself sometimes. Of course, a lot of that is me not being a Java whiz. I'm definitely a neophyte with this stuff.