
You may have noticed, if you use Google's services, that they lie to us. Don't get me wrong, the services are great and all, but they missed an elementary step.
Let's say I want to get the correctly formatted address for the Red Hat office in Switzerland. I would send a request such as: http://maps.google.com/maps/geo?q=neuchatel&output=xml&key=MYKEY
The first line of the response says:
<?xml version="1.0" encoding="UTF-8"?>
It means that what Google is sending us is encoded according to the UTF-8 standard. So far so good.
Well, unluckily the address I'm looking for contains one of those weird letters: â. If you use any parser that trusts the first line of this XML, it will very probably fail. (Note that Firefox doesn't seem to trust it and automagically finds the correct encoding.)
So let's have a look at the hexadecimal string that Google sends us for the word "Neuchâtel":
4E 65 75 63 68 E2 74 65 6C
Since UTF-8 is compatible with ASCII, all the letters but our weird letter are correct. Unfortunately for our so-loved letter, it's encoded as "E2", which is the ISO-8859-1 (Latin-1) byte for â but is not valid UTF-8 on its own. The correct UTF-8 encoding for â is "C3 A2". Yes, two bytes for a single letter: that's the trick of UTF-8 and some other encodings, which encode a character on up to 4 bytes. Sending one byte when two are expected definitely breaks many parsers; I tested this on JBoss XB and the PHP XML Simple Parser.
This does not only affect this service but also the Google Widget discovery service.
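If you want to check the byte-level claim yourself, here is a minimal Python sketch. It only uses the hex dump quoted above; the helper name `is_valid_utf8` is made up for this example.

```python
# Bytes as quoted in the hex dump above (a lone E2 stands in for the accented letter).
received = bytes.fromhex("4E 65 75 63 68 E2 74 65 6C")

# The same word encoded as real UTF-8 ("\u00e2" is the letter a with circumflex).
expected = "Neuch\u00e2tel".encode("utf-8")
print(expected.hex(" ").upper())  # 4E 65 75 63 68 C3 A2 74 65 6C


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(received))  # False
print(is_valid_utf8(expected))  # True
```

The lone E2 fails because in UTF-8 it announces a multi-byte sequence, and the "t" (74) that follows is not a continuation byte.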
One solution is to re-encode the response as UTF-8 yourself before you hand it to your parser. It sucks if they decide to change the encoding to something else, though.
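A rough sketch of that re-encoding step in Python, assuming the bytes really are ISO-8859-1 (my guess from the E2 byte, not something Google documents). The sample document and the `address` tag are invented for the example; the real response looks different.

```python
import xml.etree.ElementTree as ET

# Made-up sample: declares UTF-8 but carries a Latin-1 byte (E2), like the real reply.
raw = b'<?xml version="1.0" encoding="UTF-8"?><address>Neuch\xe2tel</address>'


def parse_mislabeled_xml(data: bytes, actual_encoding: str = "iso-8859-1") -> ET.Element:
    """Decode with the encoding we believe was really used, then parse as UTF-8."""
    text = data.decode(actual_encoding)
    # Re-encode so the bytes finally match the encoding="UTF-8" declaration.
    return ET.fromstring(text.encode("utf-8"))


root = parse_mislabeled_xml(raw)
print(root.text == "Neuch\u00e2tel")  # True
```

The weak spot is the hard-coded `actual_encoding`: if the service ever starts sending real UTF-8, this would mangle every accented letter.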
Another solution is to use a parser that doesn't care about the declared encoding and tries to do its best (à la Firefox). It's not the cleanest way, though.
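I don't know how Firefox actually guesses, but a crude "do your best" approach could look like this sketch: try strict UTF-8 first and fall back to ISO-8859-1 only when that fails. The function name and the order of candidates are my own assumptions.

```python
def decode_leniently(raw: bytes, candidates=("utf-8", "iso-8859-1")) -> str:
    """Try each candidate encoding in order and return the first clean decode."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if every candidate fails (ISO-8859-1 itself never does):
    # keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


print(decode_leniently(b"Neuch\xe2tel") == "Neuch\u00e2tel")          # True
print(decode_leniently(b"Neuch\xc3\xa2tel") == "Neuch\u00e2tel")      # True
```

This keeps working if the service switches to real UTF-8, at the price of guessing: Latin-1 text that happens to form valid UTF-8 sequences would be misread.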
The last one is to let the Google team know, which we have already tried. I hope they will read this blog and act accordingly.
2 comments:
Yahoo! gets it right:
http://local.yahooapis.com/MapsService/V1/geocode?appid=YahooDemo&location=Neuch%E2tel
The E2 is converted to UTF-8 in the reply.
I wasn't aware of the equivalent Yahoo service. Yep, the encoding is correct in this case.
What's "funny" is that Yahoo doesn't specify any encoding (<?xml version="1.0"?>), but from what I read, XML's default encoding is UTF-8, good :)
Yahoo:1 Google:0 ;)