Find the MIME type of a file based on the file signature

Sometimes we need to store images in a database rather than as physical files. For this purpose, SQL Server provides the image data type (deprecated in current versions in favor of varbinary(max)). For simplicity, the file extension is stored alongside the image content; when the content is loaded back from the database, the extension is used to identify its MIME type.
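Identifying the MIME type from a stored extension amounts to a lookup table. The sketch below is a minimal illustration under stated assumptions: the class name `ExtensionMimeLookup`, the method `FromExtension` and the short list of entries are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class ExtensionMimeLookup
{
    private const string Fallback = "application/octet-stream";

    // Hypothetical, deliberately short table; extend it with the types you store.
    // OrdinalIgnoreCase makes ".PNG" and ".png" equivalent keys.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            [".bmp"] = "image/bmp",
            [".gif"] = "image/gif",
            [".jpg"] = "image/jpeg",
            [".jpeg"] = "image/jpeg",
            [".png"] = "image/png",
            [".tif"] = "image/tiff",
            [".tiff"] = "image/tiff",
            [".pdf"] = "application/pdf",
            [".doc"] = "application/msword",
            [".xls"] = "application/vnd.ms-excel",
        };

    // Maps a stored extension such as ".png" or "png" to a MIME type,
    // falling back to application/octet-stream when it is missing or unknown.
    public static string FromExtension(string extension)
    {
        if (string.IsNullOrWhiteSpace(extension))
            return Fallback;
        string key = extension[0] == '.' ? extension : "." + extension;
        return Map.TryGetValue(key, out string mime) ? mime : Fallback;
    }
}
```

The weakness of this approach is that the lookup trusts whatever extension was saved: a PNG stored with a ".jpg" extension is still reported as image/jpeg, and a record with no extension gets only the generic fallback.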

If the file extension is incorrect or missing, the document cannot be downloaded as a known type. One solution is MIME type detection from the binary data itself: Urlmon.dll exports a function called FindMimeFromData that determines an appropriate MIME type from a buffer of content. It can be called from C# through P/Invoke:

// Requires: using System; using System.Runtime.InteropServices;
public static string GetMimeType(byte[] content)
{
    if (content == null) throw new ArgumentNullException(nameof(content));

    // Pass at most the first 4096 bytes of the content to the sniffer.
    int size = Math.Min(content.Length, 4096);
    byte[] buffer = new byte[size];
    Array.Copy(content, buffer, size);

    // No URL and no proposed MIME type: the result is based on the data alone.
    IntPtr mimeOut;
    int result = NativeMethods.FindMimeFromData(IntPtr.Zero, null,
        buffer, size, null, 0, out mimeOut, 0);
    Marshal.ThrowExceptionForHR(result); // throws only for failure HRESULTs

    try
    {
        return Marshal.PtrToStringUni(mimeOut);
    }
    finally
    {
        // Free the string returned by FindMimeFromData.
        Marshal.FreeCoTaskMem(mimeOut);
    }
}

internal static class NativeMethods
{
    [DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
    public static extern int FindMimeFromData(
        IntPtr pBC,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
        [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1, SizeParamIndex = 3)] byte[] pBuffer,
        int cbSize,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
        int dwMimeFlags,
        out IntPtr ppwzMimeOut,
        int dwReserved);
}

FindMimeFromData tests for the following MIME types:

text/plain, text/html, text/xml, text/richtext, text/scriptlet, audio/x-aiff, audio/basic, audio/mid, audio/wav, image/gif, image/jpeg, image/pjpeg, image/png, image/x-png, image/tiff, image/bmp, image/x-xbitmap, image/x-jg, image/x-emf, image/x-wmf, video/avi, video/mpeg, application/octet-stream, application/postscript, application/base64, application/macbinhex40, application/pdf, application/xml, application/atom+xml, application/rss+xml, application/x-compressed, application/x-zip-compressed, application/x-gzip-compressed, application/java, application/x-msdownload

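FindMimeFromData does not detect Word or Excel files; for them it simply returns “application/octet-stream”. To determine the MIME type of a Word or Excel file, we compare the file content against a set of known byte sequences (file signatures). Word and Excel 97-2003 files share the same 8-byte header, so an Excel file is distinguished by searching its content for the string “Microsoft Excel” followed by a NUL byte.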

// Requires: using System.Linq;
// File signatures ("magic numbers") as decimal byte values.
private static readonly byte[] BMP = { 66, 77 };                                // "BM"
private static readonly byte[] MSO = { 208, 207, 17, 224, 161, 177, 26, 225 }; // OLE2 compound file header; MSO includes doc, xls
private static readonly byte[] XLS = { 77, 105, 99, 114, 111, 115, 111, 102, 116, 32, 69, 120, 99, 101, 108, 0 }; // "Microsoft Excel\0"
private static readonly byte[] GIF = { 71, 73, 70, 56 };                       // "GIF8"
private static readonly byte[] JPG = { 255, 216, 255 };                        // FF D8 FF
private static readonly byte[] PDF = { 37, 80, 68, 70, 45, 49, 46 };           // "%PDF-1."
private static readonly byte[] PNG = { 137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82 }; // PNG signature + IHDR chunk header
private static readonly byte[] TIFF = { 73, 73, 42, 0 };                       // "II*\0" (little-endian TIFF)

public static string GetMimeType(byte[] content)
{
    string mime = "application/octet-stream";
    if (content.Take(2).SequenceEqual(BMP)) mime = "image/bmp";
    else if (content.Take(8).SequenceEqual(MSO))
        // Word and Excel share the OLE2 header; Excel files contain the "Microsoft Excel\0" marker.
        mime = IsOfType(content, XLS) ? "application/vnd.ms-excel" : "application/msword";
    else if (content.Take(4).SequenceEqual(GIF)) mime = "image/gif";
    else if (content.Take(3).SequenceEqual(JPG)) mime = "image/jpeg";
    else if (content.Take(7).SequenceEqual(PDF)) mime = "application/pdf";
    else if (content.Take(16).SequenceEqual(PNG)) mime = "image/png";
    else if (content.Take(4).SequenceEqual(TIFF)) mime = "image/tiff";
    return mime;
}

// Returns true if pattern occurs anywhere in contents.
private static bool IsOfType(byte[] contents, byte[] pattern)
{
    // Check every start position so a partial match cannot hide a real one.
    for (int start = 0; start <= contents.Length - pattern.Length; start++)
    {
        int i = 0;
        while (i < pattern.Length && contents[start + i] == pattern[i])
            i++;
        if (i == pattern.Length)
            return true;
    }
    return false;
}
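The same checks can also be written as a table of signatures matched against a `ReadOnlySpan<byte>`. The sketch below assumes .NET Core 2.1 or later, where `MemoryExtensions.StartsWith` and `MemoryExtensions.IndexOf` accept spans; the class name `SignatureSniffer` and method `Sniff` are hypothetical. The signatures are the ones listed above, written in hex so they are easier to compare with published file-signature tables, and the "Microsoft Excel" marker is found with `IndexOf` instead of a hand-written loop.

```csharp
using System;

public static class SignatureSniffer
{
    // (signature, MIME type) pairs, each checked against the start of the data.
    private static readonly (byte[] Magic, string Mime)[] Signatures =
    {
        (new byte[] { 0x42, 0x4D }, "image/bmp"),                                    // "BM"
        (new byte[] { 0x47, 0x49, 0x46, 0x38 }, "image/gif"),                        // "GIF8"
        (new byte[] { 0xFF, 0xD8, 0xFF }, "image/jpeg"),
        (new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E }, "application/pdf"), // "%PDF-1."
        (new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
                      0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52 }, "image/png"), // signature + IHDR
        (new byte[] { 0x49, 0x49, 0x2A, 0x00 }, "image/tiff"),                       // "II*\0"
    };

    // OLE2 compound file header shared by Office 97-2003 documents (doc, xls, ppt, ...).
    private static readonly byte[] Ole2 = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // "Microsoft Excel" followed by a NUL byte, searched for anywhere in the data.
    private static readonly byte[] ExcelMarker =
        { 0x4D, 0x69, 0x63, 0x72, 0x6F, 0x73, 0x6F, 0x66, 0x74, 0x20, 0x45, 0x78, 0x63, 0x65, 0x6C, 0x00 };

    public static string Sniff(ReadOnlySpan<byte> data)
    {
        if (data.StartsWith(Ole2))
            return data.IndexOf(ExcelMarker) >= 0 ? "application/vnd.ms-excel" : "application/msword";

        foreach (var (magic, mime) in Signatures)
        {
            if (data.StartsWith(magic))
                return mime;
        }
        return "application/octet-stream";
    }
}
```

Because `byte[]` converts implicitly to `ReadOnlySpan<byte>`, `SignatureSniffer.Sniff(content)` can be called directly with the bytes read from the database. Like the original, it only knows these formats; for example, .docx and .xlsx files are ZIP archives and fall through to application/octet-stream.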

Have questions? Contact the technology experts at InApp to learn more.