Thứ Hai, 6 tháng 1, 2014

The “right way” to handle file downloads in PHP

’ve seen many download scripts written in PHP, from simple one-liners to dedicated classes. Yet, at least half of them share common errors; in many cases programmers simply copy the code from something that works, without even attempting to understand what it really does.
What follows is not a complete working download script, but rather a set of issues you should be aware about and that will allow you to write better code.

1. Never accept paths as input

It’s very tempting to write something like
readfile($_GET['file']);
but before you do, think about it: anyone could request any file on the server, even if it’s outside the public html area. Guessing is not too difficult and in a few tries, an attacker could obtain configuration or password files.
You might think you’re being extra clever by doing something like
$mypath = '/mysecretpath/' .  $_GET['file'];
but an attacker can use relative paths to evade that.
What you must do – always – is sanitize the input. Accept only file names, like this:
$path_parts = pathinfo($_GET['file']);
$file_name  = $path_parts['basename'];
$file_path  = '/mysecretpath/' . $file_name;
And work only with the file name and add the path to it youserlf.
Even better would be to accept only numeric IDs and get the file path and name from a database (or even a text file or key=>value array if it’s something that doesn’t change often). Anything is better than blindly accept requests.
If you need to restrict access to a file, you should generate encrypted, one-time IDs, so you can be sure a generated path can be used only once.

2. Use headers correctly

This is a very widespread problem and unfortunately even the PHP manual is plagued with errors. Developers usually say “this works for me” and they copy stuff they don’t fully understand.
First of all, I notice the use of headers like Content-Description and Content-Transfer-Encoding. There is no such thing in HTTP. Don’t believe me? Have a look at RFC2616, they specifically state “HTTP, unlike MIME, does not use Content-Transfer-Encoding, and does use Transfer-Encoding and Content-Encoding“. You may add those headers if you want, but they do absolutely nothing. Sadly, this wrong example is present even in the PHP manual.
Second, regarding the MIME-type, I often see things like Content-Type: application/force-download. There’s no such thing and Content-Type: application/octet-stream (RFC1521) would work just as fine (or maybe application/x-msdownload if it’s an exe/dll). If you’re thinking about Internet Explorer, it’s even better to specify it clearly rather than force it to “sniff” the content. See MIME Type Detection in Internet Explorer for details.
Even worse, I see these kinds of statements:
header("Content-Type: application/force-download");
header("Content-Type: application/octet-stream");
header("Content-Type: application/download");
The author must have been really frustrated and added three Content-Type headers. The only problem is, as specified in the header() manual entry, “The optional replace parameter indicates whether the header should replace a previous similar header, or add a second header of the same type. By default it will replace“. So unless you specify header("Content-Type: some-value", FALSE), the new Content-Type header will replace the old one.

3. Forcing download and Internet Explorer bugs

What would it be like to not having to worry about old versions of Internet Explorer? A better world, that’s for sure.
To force a file to download, the correct way is:
header("Content-Disposition: attachment; filename=\"$file_name\"");
Note: the quotes in the filename are required in case the file may contain spaces.
The code above will fail in IE6 unless the following are added:
header("Pragma: public");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
Now, the use of Cache-Control is wrong in this case, especially to both values set to zero, according to Microsoft, but it works in IE6 and IE7 and later ignores it so no harm done.
If you still get strange results when downloading (especially in IE), make sure that the PHP output compression is disabled, as well as any server compression (sometimes the server inadvertently applies compression on the output produced by the PHP script).

4. Handling large file sizes

readfile() is a simple way to ouput files files. Historically it had some performance issues and while the documentation claims there are no memory problems, real-life scenarios beg to differ - output buffering and other subtle things. Regardless, if you need byte ranges support, you still have to output the old-fashioned way.
The simplest way to handle this is to output the file in “chunks”:
set_time_limit(0);
$file = @fopen($file_path,"rb");
while(!feof($file))
{
 print(@fread($file, 1024*8));
 ob_flush();
 flush();
}
If you’re on Apache, there’s a very cool module called mod_xsendfile that makes the download simpler and faster. You just output a header and the module takes care of the rest. Of course, you must be able to install it and it also makes the code less portable so you probably won’t want to use this for redistributable code.

5. Disable Gzip / output compression / output buffering

This is the source of many seemingly obscure errors. If you have output buffering, the file will not be sent to the user in chunks but only at the end of the script. Secondly, you’re most likely to be outputting a binary file that does not need compression anyway. Thirdly, some older browser+server combinations might become confused that you’re requesting a text file (PHP) but you’re sending compressed data with a different content type.
To avoid this, assuming you’re using Apache, create a .htaccess file in the folder containing your download script with this directive:
SetEnv no-gzip dont-vary
This will disable compression in that folder.

6. Resumable downloads

For large files, it’s useful to allow downloads to be resumed. Doing so is more involved, but it’s really worth doing, especially if you serve large files or video/audio.
I’m not going to write a complete example, but to point you in the right direction.
First, you need to signal the browser that you support ranges:
header("Accept-Ranges: bytes");
Again, I’ve seen examples in which the actual byte range is given (e.g. 0-1000), which is wrong, according to the specs.
At the start of your script, after checking the file (if it exists, etc.), you have to check if a range is requested:
if (isset($_SERVER['HTTP_RANGE'])) 
 $range = $_SERVER['HTTP_RANGE'];
Ranges can be expressed like ‘bytes=-99‘ or ‘bytes=0-99‘ for the first 100 bytes, ‘bytes=100-‘ to skip the first 100 bytes, or ‘bytes=1720-8392‘ for something in the middle. Be aware that multiple ranges can be specified (e.g. ’100-200,400-’) but processing and especially delivering those ranges is more complicated so no one bothers.
So, now that you have the range, you have to make sure that’s expressed in bytes, that it does not contain multiple ranges and that the range itself is valid (end is greater that the start, start is not negative, and end is not larger than the file itself. Note that ‘bytes:-‘ is not a valid request. If the range is not valid, you must output
header('HTTP/1.1 416 Requested Range Not Satisfiable');
(yet again, many scripts get this wrong by sending 400 errors or other codes). Do not try to guess or fix the range(s) as it may result in corrupted downloads, which are more dangerous than failed ones.
Then, you must send a bunch of headers:
header('HTTP/1.1 206 Partial Content');
header('Accept-Ranges: bytes');
header("Content-Range: bytes $start-$end/$filesize");
$content_length = $end - $start + 1;
header("Content-Length: $length");
Every line contains a ‘gotcha’. Many developers forget to send the 206 code or the Accept-Ranges. Don’t forget that given a file size of 1000 bytes, a full range would be 0-999 so the Content-Range would be expressed as Content-Range: bytes 0-999/1000. Yet others forget that when you send a range, the Content-Length must match the length of the range rather than the size of the whole file.
You can output the file using the method described above, skipping until the start of the range and delivering the length of the range.

Closing thoughts

I did my best to provide only accurate information. It would be truly sad for me if an article about avoiding common PHP errors contained errors itself.
Regardless, my point stands: PHP makes it easy to hack together code that appears to be working, but developers should read and adhere to the official specifications.
UPDATE: I released a free script that adheres to the above guidelines.

Nguồn: http://www.richnetapps.com/the-right-way-to-handle-file-downloads-in-php/

Không có nhận xét nào:

Đăng nhận xét