Fix Broken Text Encoding

Do you have a text file with broken encoding and want to strip all invalid characters from it?

Here is how to do it:

iconv -c -t ASCII input.txt

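The result is printed to stdout. The -c switch does the stripping: characters that cannot be converted are silently dropped. With -t you can select any target encoding you like. The source encoding is taken from the current locale unless you specify it with -f.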
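As a minimal sketch of the command above, the following writes a small UTF-8 sample and strips it down to ASCII. The file names sample.txt and clean.txt are made up for illustration, and the -f option is an addition to the original command that names the source encoding explicitly.

```shell
# sample.txt and clean.txt are hypothetical file names used for illustration.
# Write a UTF-8 sample: \303\261 is the two-byte UTF-8 sequence for "ñ".
printf 'Espa\303\261a\n' > sample.txt

# -f states the source encoding explicitly instead of relying on the locale.
# -c drops every character that has no equivalent in the target encoding.
# Some iconv builds exit non-zero when -c discards something, hence "|| true".
iconv -c -f UTF-8 -t ASCII sample.txt > clean.txt || true

# Show the stripped result.
cat clean.txt
```

With GNU iconv the ñ should simply disappear from the output. Other implementations may behave differently, so check the result before overwriting the original file.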

Comments

Thanks

I had a text from a webpage with weird encoding. It showed the typical characters you see when viewing Latin-1 text as UTF-8, but trying to save the text didn't work.

Finally, I copied and pasted the text into a file and ran:

iconv -c -t latin1 something.txt

That worked except for two characters, which I think were the only ones actually in UTF-8.

I think English speakers can't imagine how big this problem can be: even a small text in Spanish usually contains plenty of non-ASCII characters.

The thing is that this post saved my day. Thank you very much.
