Kyle Bragger

Hi. I'm Kyle Bragger. I make Forrst. You should follow me on Twitter here.

March 18, 2009 at 10:37pm
reblogged from marco

I love character encoding!

I’m interested!

marco:

Tonight’s goal: Make a simple PHP class.

  • Input: a URL pointing to an HTML document.
  • Output: a UTF-8 version, regardless of what encoding it’s really in.

Sounds easy, right?

Nope. Because some pages specify encoding via HTTP header, some specify via meta tag, some specify both but they disagree, and some don’t specify at all. Sometimes, the encoding is specified with an unusual variant of its name (e.g. X-GBK, MS939). And often, the specified encoding is wrong.

But I think I got it, finally.

This is so useful, albeit to a relatively narrow range of programmers, that I feel bad not releasing it to the world, except that I assume that someone else has already done this and I just didn’t bother looking for it. (My experiences with PHP-community code are not good, so I almost always roll my own.) Any interest?
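For anyone curious what the problem looks like in code, here is a minimal sketch of the decision order described above: HTTP header first, then meta tag, then a fallback. This is not marco's class. Every function name and the tiny alias table are made up for illustration, and fetching the URL is left out so the logic runs on a string plus an optional Content-Type value. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical alias table: unusual labels mapped to names mbstring knows.
// Deliberately tiny; a real one would need many more entries.
const CHARSET_ALIASES = [
    'X-GBK' => 'CP936',
    'UTF8'  => 'UTF-8',
];

// Pull the charset label out of a Content-Type header value, if present.
function charset_from_header(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([^\s;"\']+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

// Pull the charset label out of a <meta> tag near the top of the document.
// Matches both <meta charset=...> and the http-equiv Content-Type form.
function charset_from_meta(string $html): ?string
{
    $head = substr($html, 0, 4096);
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([^\s"\'\/>;]+)/i', $head, $m)) {
        return $m[1];
    }
    return null;
}

// Turn a declared label into a canonical mbstring name, or null if unknown.
function normalize_charset(?string $label): ?string
{
    $label = strtoupper(trim((string) $label, " \t\r\n\"'"));
    if ($label === '') {
        return null;
    }
    $label = CHARSET_ALIASES[$label] ?? $label;
    foreach (mb_list_encodings() as $name) {
        $names = array_merge([$name], mb_encoding_aliases($name));
        foreach ($names as $candidate) {
            if (strcasecmp($candidate, $label) === 0) {
                return $name;
            }
        }
    }
    return null;
}

// Convert $body to UTF-8, trusting a declared charset only if the bytes
// are actually valid in it.
function to_utf8(string $body, ?string $contentType = null): string
{
    $candidates = [
        normalize_charset(charset_from_header($contentType)),
        normalize_charset(charset_from_meta($body)),
        'UTF-8',
    ];
    foreach ($candidates as $charset) {
        if ($charset !== null && mb_check_encoding($body, $charset)) {
            return mb_convert_encoding($body, 'UTF-8', $charset);
        }
    }
    // Last resort (an assumption): ISO-8859-1 maps every byte to something.
    return mb_convert_encoding($body, 'UTF-8', 'ISO-8859-1');
}
```

One known weakness of this sketch: single-byte encodings such as ISO-8859-1 accept any byte sequence, so a wrong single-byte declaration passes the validity check and the output comes out garbled. Catching that case takes extra heuristics that are not shown here.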

Notes

  1. mavryx reblogged this from marco and added:
    Good Lord! Very Interested~~!!
  2. answers reblogged this from 200 and added:
    Shouldn’t be, but it is. mb_detect_encoding doesn’t always detect properly. It works statistically, and it’s imperfect....
  3. 200 reblogged this from marco and added:
    mb_detect_encoding...mb_convert_encoding plus...bit string...
  4. bjornstar reblogged this from marco and added:
    Yes, please release it. You don’t have...support it, but helping people convert into UTF-8...
  5. spiteshow reblogged this from marco and added:
    check it out. I haven’t had to mess...converting character
  6. kylewritescode reblogged this from marco and added:
    I’m interested!
  7. marco posted this