Closed Bug 631751 Opened 13 years ago Closed 13 years ago

A saved HTML page is rendered improperly

Categories

(Core :: DOM: HTML Parser, defect)

Tracking

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+

People

(Reporter: bobbyalexphilip, Assigned: hsivonen)


Details

(Keywords: regression, Whiteboard: [softblocker])

Attachments

(4 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b10) Gecko/20100101 Firefox/4.0b10
Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b10) Gecko/20100101 Firefox/4.0b10

I get a statement from my brokers as an HTML page. In Firefox 4 it is rendered improperly: the alignment is completely broken. I checked the same page in Firefox 3.6 as well as IE8, and it is shown properly there.

Reproducible: Always

Steps to Reproduce:
Just use the attached HTML page
Actual Results:  
See screenshot 1.

Expected Results:  
See screenshot 2.
Attached image improper rendering
Attached image Desired rendering
This is very odd. I'll look into this.
Assignee: nobody → hsivonen
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 7 → All
Hardware: x86 → All
The file is UTF-16 encoded, with no BOM.  It does have a <meta> saying so, but of course the <meta> prescan won't find that.

Henri, it looks like the HTML5 parser treats the page as UTF-8 whereas the old parser treated it as UTF-16LE...
blocking2.0: --- → ?
Keywords: regression
Looks like the HTML5 spec's charset detection at http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding is just broken for BOM-less UTF-16 without a declared charset.  We should use the weasel-room from step 7 to do something sane here.

The old parser had a bunch of code dealing with null bytes at file start, including looking for UCS-4 BOMs, looking for some combination of '<' and '\0' in the first 4 bytes, looking for UTF-16-encoded "<!" and "<?" and "<H" and "<h" in the first 4 bytes, etc.

I think we need something like that here too; certainly to handle this page correctly.
Shouldn't be too hard to craft wording for the prescan algorithm that looks for sequences like 00 3C 00 21 00 2D 00 2D, or the same in opposite endianness... http://tools.ietf.org/html/draft-abarth-mime-sniff would need updating too.
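For illustration, the 00 3C 00 21 sequence above is "<!" encoded as UTF-16BE, and the byte-swapped form is UTF-16LE. A minimal standalone sketch of such a first-bytes check (hypothetical code, not the actual Gecko prescan implementation; the function name and enum are made up here) might look like:

```cpp
#include <cstddef>
#include <cstdint>

enum class Utf16Guess { None, LittleEndian, BigEndian };

// Check whether the first four bytes look like a BOM-less UTF-16 "<!"
// comment opener. 00 3C 00 21 is "<!" as UTF-16BE (zero high byte first);
// 3C 00 21 00 is the same two characters as UTF-16LE.
Utf16Guess SniffCommentOpener(const uint8_t* buf, size_t len) {
  if (len < 4) {
    return Utf16Guess::None;
  }
  if (buf[0] == 0x00 && buf[1] == 0x3C && buf[2] == 0x00 && buf[3] == 0x21) {
    return Utf16Guess::BigEndian;
  }
  if (buf[0] == 0x3C && buf[1] == 0x00 && buf[2] == 0x21 && buf[3] == 0x00) {
    return Utf16Guess::LittleEndian;
  }
  return Utf16Guess::None;
}
```

A real prescan would also have to cover the other patterns the old parser handled ("<?", "<H", "<h", and so on), so this only sketches the shape of the check.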
(In reply to comment #5)
> Henri, it looks like the HTML5 parser treats the page as UTF-8 whereas the old
> parser treated it as UTF-16LE...

IE9 PP7 treats this as UTF-8, too, but it seems to drop zero bytes before tokenizing, which isn't exactly safe, so let's not go there.

It's worth testing what happens with BOM-less UTF-16 that contains characters outside the Basic Latin range: if the test case here "works" only because zero bytes are dropped, such content should visibly break.

It turns out that neither IE6 nor IE9 PP7 (didn't test 7 and 8) sniffs UTF-16 in these cases (spot-checked loading from local disk in IE6, too):
http://hsivonen.iki.fi/test/moz/utf-16-russian-meta-quirks.htm
http://hsivonen.iki.fi/test/moz/utf-16-russian-meta.htm
http://hsivonen.iki.fi/test/moz/utf-16-russian-no-meta-quirks.htm
http://hsivonen.iki.fi/test/moz/utf-16-russian-no-meta.htm

Given that these pages don't "work" in IE, Chrome or Safari, I think this bug should be WONTFIXed. bz, is that OK given this evidence?
I think that gives a really sucky user experience, though.  And the pages do work with Opera and Firefox 3.x.  They also "work" in IE for western users....

I really think that UAs should make more of an effort to have this work.  There are some basic heuristics we could use (e.g. the one you proposed on irc) that should have a very low false-positive rate and work well on at least the corpus of documents that IE works on.
Per IRC discussion with bz, I'll fix this for the case that seems to "work" in IE by checking if every second byte is zero in the first 1024 bytes.
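The heuristic described here — sniff as UTF-16 when every second byte in the first 1024 bytes is zero — can be sketched as standalone code. This is a hypothetical illustration, not the actual patch; it folds in the 30-byte minimum ("<title></title>" as UTF-16) that the final patch added to avoid false positives on tiny files, and uses the even/odd-tracking array idiom suggested in review:

```cpp
#include <cstddef>
#include <cstdint>

enum class Utf16Guess { None, LittleEndian, BigEndian };

// Hypothetical sketch of the sniffing heuristic discussed in this bug.
// Requires at least 30 bytes so that very short files cannot trigger a
// false positive.
Utf16Guess GuessUtf16(const uint8_t* buf, size_t len) {
  if (len < 30) {
    return Utf16Guess::None;
  }
  const size_t limit = len < 1024 ? len : 1024;
  // byteNonZero[0] tracks even offsets, byteNonZero[1] tracks odd offsets.
  bool byteNonZero[2] = { false, false };
  for (size_t i = 0; i < limit; ++i) {
    if (buf[i] != 0) {
      // Nonzero bytes at both parities: not BOM-less UTF-16 text that
      // stays within the Basic Latin range.
      if (byteNonZero[1 - (i % 2)]) {
        return Utf16Guess::None;
      }
      byteNonZero[i % 2] = true;
    }
  }
  if (byteNonZero[0]) {
    return Utf16Guess::LittleEndian;  // ASCII in the low (even) bytes
  }
  if (byteNonZero[1]) {
    return Utf16Guess::BigEndian;     // high byte first, ASCII at odd offsets
  }
  return Utf16Guess::None;            // all zeros: no evidence either way
}
```

On the attached testcase (BOM-less UTF-16LE with an ASCII-only prefix), every odd-offset byte in the first kilobyte is zero, so this would guess UTF-16LE.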
Status: NEW → ASSIGNED
Attached patch Patch with broken test (obsolete) — Splinter Review
The test doesn't work. Framing probably interfering or something.
Almost certainly, yes.  :(
Blocking, let's get this test fixed and the patch landed.
blocking2.0: ? → final+
Whiteboard: [softblocker]
Fwiw, a chrome mochitest might work for the test...
Attached patch Patch using the chardet harness (obsolete) — Splinter Review
Attachment #511429 - Attachment is obsolete: true
Attachment #512139 - Flags: review?(bzbarsky)
Comment on attachment 512139 [details] [diff] [review]
Patch using the chardet harness

>+        if (i & 1) {
>+          if (evenByteNonZero) {
>+            return;
>+          }
>+          oddByteNonZero = PR_TRUE;
>+        } else {
>+          if (oddByteNonZero) {
>+            return;
>+          }
>+          evenByteNonZero = PR_TRUE;
>+        }

This (duplicated) code might be simpler if you declared the booleans as:

  // even-numbered bytes tracked at 0, odd-numbered bytes tracked at 1
  PRBool byteNonZero[2] = { PR_FALSE, PR_FALSE };

and then you can do

  if (byteNonZero[1 - (i % 2)])
    return;
  byteNonZero[i % 2] = PR_TRUE;

in the loop, and similar for the i+j loop.

I don't see why you need the last check for |oddByteNonZero| before deciding you're LE.  How can that be nonzero?

Also, wouldn't it make sense to just do the mCharset set conditionally at the end there, and share the other three lines of code?

> +     * Sniff for if every other byte in the sniffing buffer is zero.

"Check whether every other byte...."

r=me with those changes.
Attachment #512139 - Flags: review?(bzbarsky) → review+
Oops. Can't land this. It misbehaves on files that are only one byte long.
The patch requires at least 30 bytes of data (the length of "<title></title>" as UTF-16) in order to sniff as UTF-16. This avoids false positives with very short files.
Attachment #512139 - Attachment is obsolete: true
Attachment #512532 - Flags: review?(bzbarsky)
Comment on attachment 512532 [details] [diff] [review]
Address review comments, add magic limit for minimum bytes

r=me
Attachment #512532 - Flags: review?(bzbarsky) → review+
Comment on attachment 512532 [details] [diff] [review]
Address review comments, add magic limit for minimum bytes

Sounds like softblockers need explicit approval now.
Attachment #512532 - Flags: approval2.0?
Comment on attachment 512532 [details] [diff] [review]
Address review comments, add magic limit for minimum bytes

a=me
Attachment #512532 - Flags: approval2.0? → approval2.0+
Thanks.
http://hg.mozilla.org/mozilla-central/rev/8eb1b3531dd9
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 638318
See Also: → 1625258
See Also: → 1727491