Skip to main content
summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorRoberto Tyley2011-08-25 21:25:10 +0000
committerRoberto Tyley2011-08-25 21:25:10 +0000
commit602b2f0d450236a7e4d1b5bc1532a092b0a3129f (patch)
treed3369ea18399ec232d8338b0b3f9b10f4d050ee0
parent28a49925245df61f24b072727265a9eb530ab184 (diff)
downloadjgit-602b2f0d450236a7e4d1b5bc1532a092b0a3129f.tar.gz
jgit-602b2f0d450236a7e4d1b5bc1532a092b0a3129f.tar.xz
jgit-602b2f0d450236a7e4d1b5bc1532a092b0a3129f.zip
Tolerate zlib deflation with window size < 32Kb
JGit currently identifies loose objects as 'corrupt' if they've been deflated using a window size less than 32Kb, because the isStandardFormat() function doesn't recognise the header byte as a zlib header. This patch makes the method tolerant of all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice it's accuracy in distingushing the standard loose-object format from the experimental (now abandoned) format. It's based on a patch which has been merged into C-Git master branch: https://github.com/git/git/commit/7f684a2aff636f44a506 On memory constrained systems zlib may use a much smaller window size - working on Agit, I found that Android uses a 4KB window; giving a header byte of 0x48, not 0x78. Consequently all loose objects generated by the Android platform appear 'corrupt' :( It might appear that this patch changes isStandardFormat() to the point where it could incorrectly identify the experimental format as the standard one, but the two criteria (bitmask & checksum) can only give a false result for an experimental object where both of the following are true: 1) object size is exactly 8 bytes when uncompressed (bitmask) 2) [single-byte in-pack git type&size header] * 256 + [1st byte of the following zlib header] % 31 = 0 (checksum) As it happens, for all possible combinations of valid object type (1-4) and window bits (0-7), the only time when the checksum will be divisible by 31 is for 0x1838 - ie object type *1*, a Commit - which, due the fields all Commit objects must contain, could never be as small as 8 bytes in size. Given this, the combination of the two criteria (bitmask & checksum) always correctly determines the buffer format, and is more tolerant than the previous version. References: Android uses a 4KB window for deflation: http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53 Code snippet searching for false positives with the zlib checksum: https://gist.github.com/1118177 Change-Id: Ifd84cd2bd6b46f087c9984fb4cbd8309f483dec0
-rw-r--r--org.eclipse.jgit/src/org/eclipse/jgit/storage/file/UnpackedObject.java35
1 files changed, 28 insertions, 7 deletions
diff --git a/org.eclipse.jgit/src/org/eclipse/jgit/storage/file/UnpackedObject.java b/org.eclipse.jgit/src/org/eclipse/jgit/storage/file/UnpackedObject.java
index b04a294b63..de38caa231 100644
--- a/org.eclipse.jgit/src/org/eclipse/jgit/storage/file/UnpackedObject.java
+++ b/org.eclipse.jgit/src/org/eclipse/jgit/storage/file/UnpackedObject.java
@@ -270,14 +270,35 @@ public class UnpackedObject {
}
private static boolean isStandardFormat(final byte[] hdr) {
- // Try to determine if this is a standard format loose object or
- // a pack style loose object. The standard format is completely
- // compressed with zlib so the first byte must be 0x78 (15-bit
- // window size, deflated) and the first 16 bit word must be
- // evenly divisible by 31. Otherwise its a pack style object.
- //
+ /*
+ * We must determine if the buffer contains the standard
+ * zlib-deflated stream or the experimental format based
+ * on the in-pack object format. Compare the header byte
+ * for each format:
+ *
+ * RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
+ * Experimental pack-based : Stttssss : ttt = 1,2,3,4
+ *
+ * If bit 7 is clear and bits 0-3 equal 8, the buffer MUST be
+ * in standard loose-object format, UNLESS it is a Git-pack
+ * format object *exactly* 8 bytes in size when inflated.
+ *
+ * However, RFC1950 also specifies that the 1st 16-bit word
+ * must be divisible by 31 - this checksum tells us our buffer
+ * is in the standard format, giving a false positive only if
+ * the 1st word of the Git-pack format object happens to be
+ * divisible by 31, ie:
+ * ((byte0 * 256) + byte1) % 31 = 0
+ * => 0ttt10000www1000 % 31 = 0
+ *
+ * As it happens, this case can only arise for www=3 & ttt=1
+ * - ie, a Commit object, which would have to be 8 bytes in
+ * size. As no Commit can be that small, we find that the
+ * combination of these two criteria (bitmask & checksum)
+ * can always correctly determine the buffer format.
+ */
final int fb = hdr[0] & 0xff;
- return fb == 0x78 && (((fb << 8) | hdr[1] & 0xff) % 31) == 0;
+ return (fb & 0x8f) == 0x08 && (((fb << 8) | hdr[1] & 0xff) % 31) == 0;
}
private static InputStream inflate(final InputStream in, final long size,

Back to the top