
PHP – What is the process from uploading a client-side file to the point PHP saves it in the tmp file

What is the process that happens from when a client-side file(s) is uploaded to the server, up to the point when PHP saves the uploaded file(s) in tmp file(s)?

That is from this point (file upload form):

<!doctype html>
<html>
    <h1>Upload new File</h1>
    <form method="post" enctype="multipart/form-data" action="example.php">
        <input type="file" name="file">
        <input type="submit" value="Upload">
    </form>
</html>

To this point:

<?php 

$filePath = $_FILES['file']['tmp_name'];

Is any of the file content read into memory at any point before the file is saved to the tmp file?

Submitting a form with enctype="multipart/form-data" will send something like this (shown here with one text field and two files):

POST / HTTP/1.1
[[ Less interesting headers ... ]]
Content-Type: multipart/form-data; boundary=---------------------------735323031399963166993862150
Content-Length: 834

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="text1"

text default...

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file1"; filename="a.txt"
Content-Type: text/plain

Content of a.txt...

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file2"; filename="a.html"
Content-Type: text/html

<!DOCTYPE html><title>Content of a.html.</title>

If the file content is not read into memory, how does the request get parsed? Is hard drive space used instead, with another temporary file acting as the memory needed to parse it?

If the file content is read into memory, then how are large files handled, for example, if the uploaded file size is 10GB but the memory is only 2GB?

Answer

You don’t need to read a complete data stream into memory in order to parse it. And in this case PHP doesn’t even need to understand the attachment’s content: all it has to do is write it to a temporary file in the directory set by upload_tmp_dir. You can design this process to use almost as little memory as you want, trading it for I/O and CPU cycles.
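To make the memory trade-off concrete, here is a minimal sketch of copying a stream to a file in fixed-size chunks. The function name copyInChunks and the 8 KB default are my own choices for illustration, not anything taken from PHP's internals; the point is only that memory use is bounded by the chunk size, not by the size of the data.

```php
<?php
/**
 * Illustrative helper (hypothetical name): copy one stream to another
 * while holding at most $chunkSize bytes in memory at a time.
 * Assumes blocking streams, where an empty read means end of input.
 */
function copyInChunks($source, $dest, int $chunkSize = 8192): int
{
    $total = 0;
    while (!feof($source)) {
        $chunk = fread($source, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left to read
        }
        $written = fwrite($dest, $chunk);
        if ($written === false) {
            throw new RuntimeException('Write to destination failed');
        }
        $total += $written;
    }
    return $total; // number of bytes copied
}
```

A smaller chunk size means less memory but more read/write calls, which is the I/O and CPU cost mentioned above.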

I have no idea of the actual implementation in C code, but I remember writing a simple parser at school with the help of a finite-state machine, which in plain English is a set of boolean variables that tell you where you are (states) and get updated as you keep reading more input. I can conceive of a file upload parser that reads the input socket in chunks:

  • It can determine there’s a new POST field when it finds the -----------------------------735323031399963166993862150 boundary.
  • It can determine the field is a file when there’s a filename attribute in the Content-Disposition header.
  • It can determine the file starts when there’s an empty line right after the headers, and create a temporary file at that point.
  • It can keep appending input to the temp file until the boundary is found again.

Etc.
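The steps above can be sketched in PHP itself. This is only an illustration of the state-machine idea, not how PHP's C code does it: the function name parseMultipartStream, the three state names and the returned array layout are all made up for this example. It reads the input at most 8 KB at a time and holds back each line ending until it knows whether the next line is a boundary, since the line break right before a boundary belongs to the delimiter and not to the file.

```php
<?php
/**
 * Hypothetical streaming parser for multipart/form-data.
 * Simplifications: any line starting with "--boundary" is treated as a
 * delimiter, and header lines are assumed to fit in one read.
 */
function parseMultipartStream($input, string $boundary, string $tmpDir): array
{
    $delimiter = '--' . $boundary;
    $state = 'preamble';            // preamble | headers | body
    $parts = [];
    $name = '';
    $filename = $tmpName = $out = null;
    $value = '';                    // content of a non-file field
    $pending = '';                  // line ending held back from the output
    $atLineStart = true;

    while (($line = fgets($input, 8192)) !== false) {
        $isDelimiter = $atLineStart
            && strncmp($line, $delimiter, strlen($delimiter)) === 0;
        $atLineStart = substr($line, -1) === "\n";

        if ($state === 'headers') {
            if (rtrim($line, "\r\n") === '') {
                // Empty line: headers are over, content starts.
                if ($filename !== null) {
                    $tmpName = tempnam($tmpDir, 'up');
                    $out = fopen($tmpName, 'wb');
                }
                $state = 'body';
            } elseif (stripos($line, 'Content-Disposition:') === 0) {
                if (preg_match('/\bname="([^"]*)"/', $line, $m)) {
                    $name = $m[1];
                }
                if (preg_match('/\bfilename="([^"]*)"/', $line, $m)) {
                    $filename = $m[1]; // a filename means this is a file
                }
            }
            continue;
        }

        if ($isDelimiter) {
            if ($state === 'body') {
                // Part finished; the held-back line ending is dropped.
                if ($out !== null) {
                    fclose($out);
                    $parts[$name] = ['filename' => $filename, 'tmp_name' => $tmpName];
                } else {
                    $parts[$name] = $value;
                }
            }
            if (strncmp($line, $delimiter . '--', strlen($delimiter) + 2) === 0) {
                break; // closing delimiter: end of the request body
            }
            $name = '';
            $filename = $tmpName = $out = null;
            $value = $pending = '';
            $state = 'headers';
            continue;
        }

        if ($state === 'body') {
            $data = $pending . $line;
            $eolLen = 0;
            if (substr($data, -2) === "\r\n") {
                $eolLen = 2;
            } elseif (substr($data, -1) === "\n" || substr($data, -1) === "\r") {
                $eolLen = 1;
            }
            $chunk = substr($data, 0, strlen($data) - $eolLen);
            $pending = $eolLen > 0 ? substr($data, -$eolLen) : '';
            if ($out !== null) {
                fwrite($out, $chunk);   // file content goes straight to disk
            } else {
                $value .= $chunk;       // small text fields stay in memory
            }
        }
    }
    if (is_resource($out)) {
        fclose($out); // input ended without a closing delimiter
    }
    return $parts;
}
```

Called with an open stream, the boundary string from the Content-Type header and a writable directory, it returns text fields as strings and files as arrays holding the original filename and the temporary path. At no point does it hold more than one read's worth of file content in memory.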

User contributions licensed under: CC BY-SA