The Risk of File Uploads


File upload functionality is one of the most dangerous features an application can expose. An unrestricted file upload can lead to remote code execution, malware distribution, data breaches, and server compromise. Every file upload endpoint must be treated as a critical attack surface.


Threat Model


| Attack | Description |
|--------|-------------|
| Malicious file upload | Attacker uploads a PHP shell or other executable that the server later runs |
| File size DoS | Huge files exhaust disk space or memory |
| Path traversal | Filename contains sequences such as `../` that escape the upload directory |
| MIME type spoofing | Declared type or extension does not match the actual content |
| Malware distribution | Legitimate-looking files carry malware to other users |
| Zip bombs | Small compressed archive expands to an enormous size |
| SSRF via file processing | Server-side parsing of an attacker-controlled file triggers requests to internal systems |


Validation Strategy


1. Validate File Extension


Allowlist-based validation is essential. Blocklisting (e.g., rejecting `.exe` files) will always miss edge cases such as `.phtml`, `.php5`, or mixed-case variants.



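import os

ALLOWED_EXTENSIONS = {
    # Images (SVG can embed scripts; serve it only as a download or after sanitizing)
    '.jpg', '.jpeg', '.png', '.gif', '.webp', '.svg',
    # Documents
    '.pdf', '.doc', '.docx', '.xls', '.xlsx',
    # Other
    '.txt', '.csv'
}

def validate_extension(filename):
    # splitext() returns only the final extension, so "shell.php.jpg" yields ".jpg"
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Extension {ext} not allowed")
    return ext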


2. Validate MIME Type


Never trust the `Content-Type` header from the client. Inspect the actual file content:



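import magic  # python-magic, a wrapper around libmagic

# Narrower than the extension allowlist: Office and SVG types are not accepted here
ALLOWED_MIMES = {
    'image/jpeg', 'image/png', 'image/gif',
    'image/webp', 'application/pdf',
    'text/plain', 'text/csv'
}

def validate_mime(file_stream):
    # Detect the type from the leading bytes, not from client-supplied metadata
    mime = magic.from_buffer(file_stream.read(2048), mime=True)
    file_stream.seek(0)  # rewind so later steps read the whole file
    if mime not in ALLOWED_MIMES:
        raise ValueError(f"MIME type {mime} not allowed")
    return mime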
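MIME type spoofing is caught only when the extension check and the content check are compared with each other. The following is a minimal sketch of that comparison. It assumes a small hand-written table of file signatures in place of libmagic so that it runs with the standard library alone; `SIGNATURES`, `EXTENSION_TO_MIME`, `sniff_mime`, and `check_consistency` are illustrative names, not part of any library.

```python
import os

# Leading "magic" bytes for a few common formats (illustrative subset).
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"%PDF-": "application/pdf",
}

# The MIME type each allowed extension is expected to contain.
EXTENSION_TO_MIME = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".pdf": "application/pdf",
}


def sniff_mime(head: bytes):
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for signature, mime in SIGNATURES.items():
        if head.startswith(signature):
            return mime
    return None


def check_consistency(filename: str, head: bytes) -> str:
    """Raise ValueError unless the extension and the sniffed content agree."""
    ext = os.path.splitext(filename)[1].lower()
    expected = EXTENSION_TO_MIME.get(ext)
    if expected is None:
        raise ValueError(f"Extension {ext!r} not allowed")
    detected = sniff_mime(head)
    if detected != expected:
        raise ValueError(f"{ext} file contains {detected or 'unknown'} data")
    return detected
```

A file named `photo.jpg` whose first bytes are `<?php` is rejected here even though its extension is on the allowlist. A matching signature only shows that the file starts like the claimed format, so this check complements the other validation steps rather than replacing them.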


3. Validate File Size


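Enforce strict limits at multiple layers, such as the reverse proxy, the framework's upload parser, and application-level quotas. With Express and multer: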



// Express middleware
const multer = require('multer');

const upload = multer({
    limits: {
        fileSize: 10 * 1024 * 1024, // 10 MB per file
        files: 1                    // one file per request
    },
    fileFilter: (req, file, cb) => {
        // file.mimetype is the client-declared Content-Type: a cheap first
        // filter only. Verify the real content after the upload completes.
        const allowed = ['image/jpeg', 'image/png', 'application/pdf'];
        if (!allowed.includes(file.mimetype)) {
            return cb(new Error('Invalid file type'), false);
        }
        cb(null, true);
    }
});
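The same cap can also be enforced in application code, independent of the framework and of any `Content-Length` header the client sends. The sketch below reads a stream in chunks and aborts as soon as the running total passes the limit. `read_limited`, `MAX_UPLOAD_BYTES`, and `CHUNK_SIZE` are hypothetical names chosen for illustration.

```python
import io

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MB, matching the limit above
CHUNK_SIZE = 64 * 1024               # assumed read size


def read_limited(stream, max_bytes=MAX_UPLOAD_BYTES, chunk_size=CHUNK_SIZE):
    """Read a stream fully, raising ValueError once it exceeds max_bytes."""
    buffer = io.BytesIO()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            # Stop before buffering the chunk that crosses the limit.
            raise ValueError(f"Upload exceeds {max_bytes} bytes")
        buffer.write(chunk)
    return buffer.getvalue()
```

Counting bytes as they arrive means an oversized upload costs at most `max_bytes` of memory before it is rejected, whatever size the client claimed.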


4. Validate File Content


For images, decode and re-encode them on the server. Re-encoding discards most embedded metadata and breaks payloads hidden in or appended to the original bytes:



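from PIL import Image
import io

def sanitize_image(file_bytes):
    """Re-encode image to strip metadata and break embedded payloads."""
    # Image.open raises if the bytes are not a recognizable image
    img = Image.open(io.BytesIO(file_bytes))
    # Keep transparency where present; otherwise normalize to RGB
    has_alpha = img.mode in ('RGBA', 'LA') or 'transparency' in img.info
    img = img.convert('RGBA' if has_alpha else 'RGB')
    output = io.BytesIO()
    img.save(output, format='PNG')
    return output.getvalue()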
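Content validation is not limited to images. The `.docx` and `.xlsx` formats in the extension allowlist are ZIP containers, so the zip bomb from the threat model applies to them as well. The sketch below inspects an archive's directory before anything is extracted. The three limits are assumed values for illustration, and `check_zip` is a hypothetical helper.

```python
import io
import zipfile

MAX_TOTAL_UNCOMPRESSED = 100 * 1024 * 1024  # assumed cap: 100 MB
MAX_MEMBERS = 1000                          # assumed cap on entries
MAX_RATIO = 100                             # assumed cap on expansion ratio


def check_zip(data: bytes) -> int:
    """Return the declared uncompressed size, or raise ValueError if suspicious."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        members = archive.infolist()
        if len(members) > MAX_MEMBERS:
            raise ValueError("Archive has too many entries")
        # Sizes come from the archive's own directory records.
        total = sum(m.file_size for m in members)
        compressed = sum(m.compress_size for m in members)
    if total > MAX_TOTAL_UNCOMPRESSED:
        raise ValueError("Archive expands beyond the size limit")
    if compressed and total / compressed > MAX_RATIO:
        raise ValueError("Archive compression ratio is suspicious")
    return total
```

The sizes checked here are the ones the archive declares about itself, and a crafted file can misstate them. Any code that actually extracts members should therefore also cap the number of bytes it reads. Data that is not a ZIP file at all raises `zipfile.BadZipFile` rather than `ValueError`.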


Secure Storage


Never Store in Webroot


Storing uploaded files inside the web server's document root is dangerous. Anyone who can guess the path can request a file directly, and if the server is configured to execute scripts in that directory, an uploaded script runs as code.



# UNSAFE: Stored in webroot

upload_dir = '/var/www/html/uploads/'



# SAFE: Outside webroot

upload_dir = '/data/uploads/'  # Served through app logic
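Serving files through application logic means the application maps a stored name to a path on disk, and that mapping is where path traversal has to be stopped. The following is a sketch of a containment check using `pathlib`. `resolve_upload` and `UPLOAD_ROOT` are illustrative names, and the root path is taken from the example above.

```python
from pathlib import Path

UPLOAD_ROOT = Path("/data/uploads")


def resolve_upload(stored_name: str, root: Path = UPLOAD_ROOT) -> Path:
    """Map a stored filename to a path under root, rejecting traversal."""
    root = root.resolve()
    candidate = (root / stored_name).resolve()
    # resolve() collapses ".." segments and symlinks, so checking the
    # parents of the result is enough to detect an escape.
    if candidate == root or root not in candidate.parents:
        raise ValueError("Path escapes the upload directory")
    return candidate
```

Names such as `../etc/passwd` or an absolute `/etc/passwd` resolve to a location outside the root and are rejected. The check does not confirm that the file exists; opening it remains the caller's job.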


Generate Safe Filenames


Never use user-provided filenames for storage. Generate a random name, and keep the original name, if it is needed at all, only as display metadata:



import uuid

import os



def safe_filename(original_name):

    ext = os.path.splitext(original_name)[1].lower()

    return f"{uuid.uuid4().hex}{ext}"


Object Storage


For production systems, use object storage with pre-signed URLs:



import uuid
import boto3

s3 = boto3.client('s3')

def upload_file(file_bytes, content_type, ext):
    # ext is the validated extension (e.g. '.pdf'); the key itself is random
    key = f"uploads/{uuid.uuid4().hex}{ext}"
    s3.put_object(
        Bucket='my-app-uploads',
        Key=key,
        Body=file_bytes,
        ContentType=content_type
    )
    return key

def get_download_url(key, expires=3600):
    # The URL grants read access to this one object until it expires
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': 'my-app-uploads', 'Key': key},
        ExpiresIn=expires
    )
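Pre-signed requests can also cover the upload direction, letting clients send a file straight to the bucket while the server still dictates the key, type, and size. The sketch below uses boto3's `generate_presigned_post` with a `content-length-range` condition. It takes the client as a parameter (for example the `s3` object above), and `create_upload_form` with its arguments is a hypothetical helper.

```python
import uuid

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MB


def create_upload_form(s3_client, bucket, content_type, ext, expires=300):
    """Return (key, form) letting a client upload one object directly to S3."""
    key = f"uploads/{uuid.uuid4().hex}{ext}"
    form = s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        Fields={"Content-Type": content_type},
        Conditions=[
            {"Content-Type": content_type},                 # type fixed by server
            ["content-length-range", 1, MAX_UPLOAD_BYTES],  # size enforced by S3
        ],
        ExpiresIn=expires,
    )
    return key, form
```

The returned `form` contains the URL and fields the client must submit. A file uploaded this way never passes through the application, so content validation and malware scanning still have to run on the stored object before it is made available for download.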


Antivirus Scanning


Integrate virus scanning into the upload pipeline:



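import os
import subprocess

class SecurityError(Exception):
    """Raised when an upload fails a security check."""

def scan_file(file_path):
    result = subprocess.run(
        ['clamscan', '--stdout', file_path],
        capture_output=True,
        text=True
    )
    # clamscan exit codes: 0 = clean, 1 = virus found, 2 = error
    if result.returncode == 1:
        os.remove(file_path)
        raise SecurityError("Malware detected in uploaded file")
    if result.returncode != 0:
        # Fail closed: a scanner error must not let the file through
        raise SecurityError("Virus scan could not be completed")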


Storage Limits and Quotas


| Level | Limit | Mitigation |
|-------|-------|------------|
| Per file | 10 MB | Reject on client and server |
| Per user | 500 MB | Track in database, reject when exceeded |
| Per day (total) | 5 GB | Aggregate monitoring, rate limit |
| Disk usage | 80% capacity | Alert, stop accepting uploads |
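The per-file and per-user rows can be sketched as a single check that runs before an upload is accepted. This version keeps usage in memory purely for illustration; a real system would store the totals in the database, as the table says. `QuotaTracker` and `reserve` are hypothetical names.

```python
from dataclasses import dataclass, field

PER_FILE_LIMIT = 10 * 1024 * 1024    # 10 MB
PER_USER_LIMIT = 500 * 1024 * 1024   # 500 MB


@dataclass
class QuotaTracker:
    """In-memory stand-in for the per-user usage table."""
    used: dict = field(default_factory=dict)

    def reserve(self, user_id: str, size: int) -> int:
        """Record an upload and return the user's new total, or raise ValueError."""
        if size > PER_FILE_LIMIT:
            raise ValueError("File exceeds the per-file limit")
        new_total = self.used.get(user_id, 0) + size
        if new_total > PER_USER_LIMIT:
            raise ValueError("Upload would exceed the user's quota")
        # Only successful reservations change the stored total.
        self.used[user_id] = new_total
        return new_total
```

With a database behind it, the read and the update would need to happen in one transaction so that concurrent uploads cannot both pass the check.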


Server Configuration


Prevent uploaded files from being executed by the web server:



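# Nginx configuration
location /uploads/ {
    # Only serve specific file types, always as downloads
    location ~* \.(jpg|jpeg|png|gif|pdf)$ {
        add_header Content-Disposition 'attachment';
        add_header X-Content-Type-Options 'nosniff';
        expires 30d;
    }
    # Reject every other name that contains a dot.
    # return runs before access checks, so a separate deny is unnecessary.
    location ~* \. {
        return 404;
    }
}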
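When files are served through application code rather than directly by Nginx, the same protections can be expressed as response headers. The following sketch forces a download, as the configuration above does, and adds `X-Content-Type-Options: nosniff` so browsers do not reinterpret the content. `download_headers` is a hypothetical helper, and the filename character set it keeps is an assumption.

```python
import re


def download_headers(mime: str, display_name: str) -> dict:
    """Build response headers that make a browser download, not render, a file."""
    # Keep only conservative characters so the name cannot break the header.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", display_name)[:100] or "download"
    return {
        "Content-Type": mime,
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        "X-Content-Type-Options": "nosniff",
    }
```

Replacing quotes and control characters in the display name prevents a user-supplied filename from injecting extra header content.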


Summary


Secure file upload requires defense in depth. Validate extensions and MIME types server-side, sanitize images by re-encoding them, generate random filenames, store files outside the webroot or in object storage, enforce strict size limits at multiple layers, and scan for malware. Never trust client-provided metadata, and process uploaded files with minimal privileges in isolated environments.