Handling File Uploads via CGI
Credit: Mauro Cicio
Problem
You want to let a visitor to your web site upload a file to the web server, either for storage or processing.
Solution
The CGI class provides a simple interface for accessing data sent through HTTP file upload. You can access an uploaded file through CGI#params as though it were any other CGI form variable.
If the uploaded file size is smaller than 10 kilobytes, its contents are made available as a StringIO object. Otherwise, the file is put into a Tempfile on disk: you can read the file from disk and process it, or move it to a permanent location.
Here's a CGI that accepts file uploads and saves the files to a special directory on disk:
#!/usr/bin/ruby
# upload.rb

# Save uploaded files to this directory
UPLOAD_DIR = "/usr/local/www/uploads"

require 'cgi'
require 'stringio'
The CGI has two main parts: a method that prints a file upload form and a method that processes the results of the form. The method that prints the form is very simple:
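def display_form(cgi)
  # The URL of this script, so the form posts back to itself
  action = cgi.script_name
  return <<EOF
<form action="#{action}" method="post" enctype="multipart/form-data">
 File to upload: <input type="file" name="file_name"><br>
 Your email address: <input type="text" name="email_address"><br>
 <input type="submit" name="Submit" value="Submit Form">
</form>
EOF
end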
The method that processes the form is a little more complex:
def process_form(cgi)
  email = cgi.params['email_address'][0]
  fileObj = cgi.params['file_name'][0]

  str = '<h1>Upload report</h1>' +
    "<p>Thanks for your upload, #{email.read}</p>"
  if fileObj
    path = fileObj.original_filename
    str += "Original Filename : #{path}" + cgi.br
    dest = File.join(UPLOAD_DIR, sanitize_filename(path))

    str += "Destination : #{dest}"
    File.open(dest.untaint, 'wb') { |f| f << fileObj.read }

    # Delete the temporary file if one was created
    local_temp_file = fileObj.local_path()
    File.unlink(local_temp_file) if local_temp_file
  end
  return str
end
The process_form method calls a method sanitize_filename to pick a new filename based on the original. The new filename is stripped of any characters in the uploaded file's name that aren't valid on the server's filesystem. This is important for security reasons. It's also important to pick a new name because Internet Explorer on Windows submits filenames like "c:\hot\fondue.txt" where other browsers would submit "fondue.txt". We'll define that method now:
def sanitize_filename(path)
  if RUBY_PLATFORM =~ %r{unix|linux|solaris|freebsd}
    # Not required for unix platforms since all characters
    # are allowed (except for /, which is stripped out below).
  elsif RUBY_PLATFORM =~ %r{win32}
    # Replace illegal characters for NTFS with _
    path.gsub!(/[\x00-\x1f\/|?*]/, '_')
  else
    # Assume a very restrictive OS such as MSDOS
    path.gsub!(/[\/|\?*+\]\[ \x00-\x1fa-z]/, '_')
  end

  # For files uploaded by Windows users, strip off the beginning path.
  return path.gsub(/^.*[\\\/]/, '')
end
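The last step of sanitize_filename is the one that matters most for security, so it may help to see it in isolation. The sketch below uses a hypothetical helper named strip_client_path (not part of the recipe or of the CGI library) that applies only the final substitution: everything up to and including the last "/" or "\" is discarded.

```ruby
# Hypothetical helper isolating the last step of sanitize_filename:
# drop everything up to and including the final "/" or "\".
def strip_client_path(name)
  name.gsub(/^.*[\\\/]/, '')
end

puts strip_client_path('c:\hot\fondue.txt')     # Windows-style client path
puts strip_client_path('/home/user/fondue.txt') # Unix-style client path
puts strip_client_path('fondue.txt')            # already a bare filename
```

All three calls print fondue.txt. Because the directory part never survives, a name such as "../../etc/passwd" is reduced to "passwd" before it is joined to the upload directory.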
Finally we have the CGI code itself, which calls the appropriate method and prints out the results in an HTML page:
cgi = CGI.new('html3')
if cgi.request_method !~ %r{POST}
  buf = display_form(cgi)
else
  buf = process_form(cgi)
end
cgi.out() do
  cgi.html() do
    cgi.head { cgi.title { 'Upload Form' } } + cgi.body() { buf }
  end
end
exit 0
Discussion
This CGI script presents the user with a form that lets them choose a file from their local system to upload. When the form is POSTed, CGI accepts the uploaded file data and stores it as a CGI parameter. As with any other CGI parameter (like email_address), the uploaded file is keyed off of the name of the HTML form element: in this case, file_name.
If the file is larger than 10 kilobytes, it will be written to a temporary file and the value of cgi.params['file_name'][0] will be a Tempfile object. If the file is small, it will be kept directly in memory as a StringIO object. Either way, the object will have a few methods not found in normal Tempfile or StringIO objects. The most useful of these are original_filename, content_type, and read.
The original_filename method returns the name of the file, as seen on the computer of the user who uploaded it. The content_type method returns the MIME type of the uploaded file, again as estimated by the computer that did the upload. You can use this to restrict the types of file you'll accept as uploads (note, however, that a custom client can lie about the content type):
# Limit uploads to BMP files.
raise 'Wrong type!' unless fileObj.content_type =~ %r{image/bmp}
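Since the declared content type comes from the client, a stricter check can compare it against an explicit list and then look at the data itself. The following is a minimal sketch under stated assumptions: the helper names declared_type_allowed? and looks_like_bmp? and the ALLOWED_TYPES constant are hypothetical, and the only fact relied on is that BMP files begin with the two bytes "BM".

```ruby
require 'stringio'

# Hypothetical allowlist of declared MIME types.
ALLOWED_TYPES = %w[image/bmp].freeze

# True if the client-declared type is on the allowlist. Parameters such
# as "; charset=..." are ignored and the comparison is case-insensitive.
def declared_type_allowed?(content_type)
  media_type = content_type.to_s.split(';').first.to_s.strip.downcase
  ALLOWED_TYPES.include?(media_type)
end

# True if the data starts with the BMP signature "BM". Works on any
# object with rewind and read(length), such as a StringIO or Tempfile.
def looks_like_bmp?(io)
  io.rewind
  header = io.read(2)
  io.rewind # leave the upload positioned at the start for later reads
  header == 'BM'
end
```

In process_form, both checks could be applied to fileObj before anything is written to disk. The signature test only confirms the first two bytes, so it narrows the risk rather than removing it.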
Every StringIO object supports a read method that simply returns the contents of the underlying string. For the sake of a uniform interface, a Tempfile object created by file upload also has a read method that returns the contents of the file. For most applications, you don't need to check whether you've got a StringIO or a Tempfile: you can just call read and get the data. However, a Tempfile can be quite large (there's a reason it was written to disk in the first place), so don't do this unless you trust your users or have a lot of memory. Otherwise, check the size of a Tempfile with File.size and read it a block at a time.
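Reading a block at a time might look like the sketch below. The helper name copy_in_blocks, the 64 KB block size, and the 5 MB limit are all hypothetical choices for illustration. The sketch asks the upload object for its size directly (both StringIO and Tempfile provide a size method) instead of calling File.size on a path, so the same code handles either kind of object.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical limits chosen for illustration only.
BLOCK_SIZE = 64 * 1024
MAX_UPLOAD_BYTES = 5 * 1024 * 1024

# Copy an upload (StringIO or Tempfile) to dest_path one block at a
# time, so a large file is never held in memory all at once.
# Raises ArgumentError if the upload exceeds max_bytes.
# Returns the number of bytes written.
def copy_in_blocks(source, dest_path, max_bytes = MAX_UPLOAD_BYTES)
  raise ArgumentError, 'upload too large' if source.size > max_bytes

  source.rewind
  written = 0
  File.open(dest_path, 'wb') do |out|
    # read(length) returns nil at end of data, which ends the loop.
    while (block = source.read(BLOCK_SIZE))
      written += out.write(block)
    end
  end
  written
end
```

In the recipe's process_form, a call such as copy_in_blocks(fileObj, dest) could stand in for the line that reads the whole upload with fileObj.read.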
To see where a Tempfile is located on disk, call its local_path method. If you plan to write the uploaded file to disk, it's more efficient to move a Tempfile with FileUtils.mv than to read it into memory and immediately write it back to another location.
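A minimal sketch of that move-or-write decision follows. The helper name store_upload is hypothetical, and to keep the example runnable outside a CGI request it uses the standard Tempfile#path method rather than the local_path method that CGI adds to uploaded files.

```ruby
require 'fileutils'
require 'stringio'
require 'tempfile'

# Hypothetical helper: persist an upload at dest.
# A Tempfile-backed upload is moved; an in-memory one is written out.
def store_upload(upload, dest)
  temp_path = upload.respond_to?(:path) ? upload.path : nil
  if temp_path
    upload.close                  # flush buffered data before moving
    FileUtils.mv(temp_path, dest) # no trip through Ruby's memory
  else
    upload.rewind
    File.open(dest, 'wb') { |f| f.write(upload.read) }
  end
  dest
end
```

When the temporary directory and the destination are on the same filesystem, the move is a rename and costs almost nothing regardless of file size.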
Temporary files are deleted when the Ruby interpreter exits, but some web frameworks keep a single Ruby interpreter around indefinitely. If you're not careful, a long-running application can fill up your disk or partition with old temporary files. Within a CGI script, you should explicitly delete temporary files when you're done with them (except, of course, the ones you move to permanent positions elsewhere on the filesystem).
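One way to make the cleanup hard to forget is to wrap the processing in a method with an ensure clause. The sketch below assumes a hypothetical helper named process_and_discard; it relies on the standard Tempfile#close! method, which closes the file and deletes it from disk. A StringIO has no close! method and nothing on disk, so it is left alone.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: yield the upload to a block, then remove any
# temporary file behind it, even if the block raises an exception.
# Returns whatever the block returns.
def process_and_discard(upload)
  yield upload
ensure
  # Only Tempfile-backed uploads have anything on disk to remove.
  upload.close! if upload.respond_to?(:close!)
end
```

A caller might write process_and_discard(fileObj) { |f| f.read.length } to measure an upload and be sure no temporary file is left behind afterward.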
See Also
- RFC1867 describes HTTP file upload
- For more on the StringIO and Tempfile classes used to store uploaded files, see Recipe 6.8, "Writing to a Temporary File," and Recipe 6.15, "Pretending a String Is a File"
- http://wiki.rubyonrails.com/rails/pages/HowtoUploadFiles