How to fix MissingSchema error in Python requests

Oct 22, 2023 · 6 min read

Introduction

The requests module is a popular Python library for sending HTTP requests and interacting with web APIs. Like any other library, it can raise errors, and a common one is MissingSchema. In this guide, we'll look at what causes this error and at several ways to fix and handle it properly.

What is MissingSchema Error?

The MissingSchema error occurs when you try to make a request to a URL without specifying the protocol - http:// or https://.

For example:

import requests

response = requests.get("www.example.com")

This will raise the error:

requests.exceptions.MissingSchema: Invalid URL 'www.example.com': No schema supplied.
Perhaps you meant http://www.example.com?

Requests needs the scheme to know which protocol to use when connecting to the URL. The exact wording of the message may differ between versions of requests.

When Does it Occur?

Some common cases when you may see this error:

  • Forgetting to include http:// or https:// in the URL string
  • Using a relative URL like /path/to/page instead of an absolute URL
  • Passing a malformed URL string
  • Reading a URL from user input or an external source without validating it

    In short, the URL string passed to requests is invalid or incomplete.
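To see why inputs like these fail, it can help to look at how a URL parser splits them. The sketch below uses the standard library's urlparse as a stand-in; requests does its own parsing internally, so treat this as an approximation. The sample strings and the has_http_scheme helper are illustrative assumptions, not part of requests.

```python
from urllib.parse import urlparse

# Sample inputs mirroring the cases above (illustrative values only).
candidates = ["www.example.com", "/path/to/page", "example.com/page", ""]

for candidate in candidates:
    parts = urlparse(candidate)
    # Every one of these has an empty scheme, which is the condition
    # that MissingSchema reports.
    print(f"{candidate!r}: scheme={parts.scheme!r} netloc={parts.netloc!r} path={parts.path!r}")


def has_http_scheme(url):
    """Hypothetical helper: True only for URLs with an http or https scheme."""
    return urlparse(url).scheme in ("http", "https")
```

In each case the scheme comes back empty and the text that looks like a hostname ends up in the path, which is why the URL cannot be used for a request as it stands.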

    Impact of the Error

    requests raises MissingSchema while preparing the request, before anything is sent over the network. Unless the exception is handled, it stops further execution of the code.

    So it's important to fix and handle this error to make sure your program doesn't crash when encountering such URLs.

    Causes of MissingSchema

    To understand how to fix this error, let's first see some of the common causes in detail:

    Forgetting the Protocol

    The most common reason for this error is simply missing the http:// or https:// in the URL string:

    # Missing http://
    url = "www.example.com"
    

    This can happen accidentally while coding, especially when constructing URLs dynamically.

    Using a Relative URL

    Another cause is passing a relative URL such as /page/2 instead of an absolute URL.

    For example:

    response = requests.get("/products")
    

    Relative URLs are shortcuts for linking internal pages but won't work directly with requests.

    Invalid URL String

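    Sometimes the URL string is constructed incorrectly or contains typos such as a missing dot:

    url = "http://examplecom"  # missing dot after example
    

    This can happen when building URLs dynamically or from user input. Note that a URL like this one still has a scheme, so it typically fails later with a connection error rather than MissingSchema; MissingSchema itself is raised only when no scheme can be found in the string.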

    Fixing MissingSchema

    Now let's see various ways to fix the MissingSchema error:

    Check for Protocol in URL

    The simplest fix is to ensure the URL contains http:// or https:// before sending the request:

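    # Match the full prefixes so strings like "httpbin.org" are not mistaken for a scheme
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url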
    

    Or better, use urllib.parse.urlparse() to validate the URL components:

    from urllib.parse import urlparse
    
    if not urlparse(url).scheme:
        url = "http://" + url
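The two checks above can be wrapped in a small helper. The following is a minimal sketch, assuming the input is a host-based URL (not a relative path) and that https is an acceptable default; the name ensure_scheme and the default_scheme parameter are hypothetical. It uses a prefix check instead of urlparse().scheme because inputs such as "example.com:8080" may be parsed differently across Python versions.

```python
def ensure_scheme(url, default_scheme="https"):
    """Return url with a scheme, adding default_scheme when none is present.

    Assumes url names a host (for example "www.example.com/page"),
    not a relative path such as "/page".
    """
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url  # already absolute, leave untouched
    if url.startswith("//"):
        # Protocol-relative form such as "//example.com/a"
        return f"{default_scheme}:{url}"
    return f"{default_scheme}://{url}"


print(ensure_scheme("www.example.com"))
print(ensure_scheme("HTTP://example.com"))
```

Whether to default to http or https depends on the sites being requested, so the default here is only a placeholder choice.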
    

    Handle Relative URLs

    To handle relative URLs, we need to combine them with the base URL of the site:

    from urllib.parse import urljoin
    
    base_url = "http://example.com"
    relative_url = "/page/2"
    
    full_url = urljoin(base_url, relative_url)
    # http://example.com/page/2
    

    The urljoin() function will merge relative paths with the base URL correctly.
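How urljoin() resolves a link depends on the form of the link and on whether the base ends with a slash. The sketch below walks through a few cases; BASE_URL and the sample links are made-up values for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "http://example.com/catalog/"  # hypothetical page the links came from

# Path-relative, root-relative, parent-relative, and already-absolute links.
links = ["page/2", "/page/2", "../about", "https://other.example/x"]

resolved = {link: urljoin(BASE_URL, link) for link in links}
for link, absolute in resolved.items():
    print(f"{link!r} -> {absolute}")

# Without the trailing slash, the last path segment of the base is replaced.
no_slash = urljoin("http://example.com/catalog", "page/2")
print(no_slash)
```

A link that is already absolute is returned unchanged, so it is safe to pass every extracted link through urljoin() without checking its form first.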

    Validate URL String

    URLs that come from user input should be validated before sending the request:

    from urllib.parse import urlparse
    
    if not urlparse(url).netloc:
        # Invalid URL, raise error or log warning
        raise ValueError("Invalid URL")
    

    This will catch invalid URLs and prevent the MissingSchema error.

    Handling MissingSchema Gracefully

    In addition to fixing the error, it's also important to handle it gracefully using exceptions and logging.

    Try-Except Blocks

    We can use try-except blocks to handle MissingSchema:

    import requests

    url = "www.example.com"  # example input with no scheme

    try:
        response = requests.get(url)
    except requests.exceptions.MissingSchema:
        print("Invalid URL")

    # or handle specific cases:

    try:
        response = requests.get(url)
    except requests.exceptions.InvalidURL:
        raise ValueError("Invalid URL")
    except requests.exceptions.MissingSchema:
        # Add a default scheme and retry once
        url = "http://" + url
        response = requests.get(url)
    

    This prevents the program from crashing and we can take appropriate actions.

    Raise Custom Exceptions

    We can define custom exceptions to raise errors on specific conditions:

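    from urllib.parse import urlparse

    class InvalidURLError(Exception):
        """Raised when a URL lacks an http(s) scheme or a hostname."""

    def valid_url(url):
        # Example check: accept only http(s) URLs that include a hostname
        parts = urlparse(url)
        return parts.scheme in ("http", "https") and bool(parts.netloc)

    if not valid_url(url):
        raise InvalidURLError("Invalid URL: " + url)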
    

    This allows us to notify callers of our API about invalid URLs.
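One design choice worth considering is which base class the custom exception extends. requests.exceptions.MissingSchema is itself a subclass of ValueError, so a custom exception that also extends ValueError lets callers catch both with a single except clause. The sketch below is one possible shape; the url and reason attributes and the check_url function are hypothetical names.

```python
from urllib.parse import urlparse


class InvalidURLError(ValueError):
    """Hypothetical exception that keeps the offending URL for callers."""

    def __init__(self, url, reason):
        super().__init__(f"Invalid URL {url!r}: {reason}")
        self.url = url
        self.reason = reason


def check_url(url):
    """Return url unchanged, or raise InvalidURLError explaining the problem."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        raise InvalidURLError(url, "missing http:// or https:// scheme")
    if not parts.netloc:
        raise InvalidURLError(url, "missing hostname")
    return url


try:
    check_url("www.example.com")
except ValueError as exc:  # also catches InvalidURLError
    print(exc)
```

Callers that need details can read exc.url and exc.reason instead of parsing the message text.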

    Log Errors

    Using the logging module, we can log MissingSchema errors to debug later:

    import logging

    import requests

    logger = logging.getLogger(__name__)

    try:
        response = requests.get(url)
    except requests.exceptions.MissingSchema:
        logger.error("Invalid URL: %s", url)
    

    The logs can be output to console or a file for analysis.
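Logging is also useful before any request is made, for example when filtering a batch of URLs. The following is a rough sketch, assuming a logger named "url_checks" and a helper called filter_fetchable, both of which are made up for this example. It sends log output to the console through logging.basicConfig; passing a filename argument to basicConfig would write to a file instead.

```python
import logging
from urllib.parse import urlparse

# Console output with timestamps; the format string is just one reasonable choice.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("url_checks")  # hypothetical logger name


def filter_fetchable(urls):
    """Return the URLs that have an http(s) scheme and log the rest."""
    fetchable = []
    for url in urls:
        if urlparse(url).scheme in ("http", "https"):
            fetchable.append(url)
        else:
            # %r makes stray whitespace or quotes visible in the log line
            logger.warning("Skipping URL without http(s) scheme: %r", url)
    return fetchable


kept = filter_fetchable(["https://example.com", "www.example.com", "/products"])
print(kept)
```

Logging the skipped values with %r makes it easier to spot where a malformed URL came from when reviewing the output later.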

    Preventing MissingSchema Errors

    Prevention is better than cure, so let's look at some best practices:

    Validate User Input URL

    If your code takes a URL as user input, always validate it first:

    import sys
    from urllib.parse import urlparse

    url = input("Enter URL: ")
    if not urlparse(url).scheme:
        # Stop with a non-zero exit status and a message on stderr
        sys.exit("Invalid URL")
    

    This ensures malformed URLs are not passed to requests.

    Use Relative URLs Cautiously

    Avoid relative URLs as much as possible. If you need to use them, always join them with the base URL first.

    Standardize URL Handling

    Have a standard function to handle URL validation and processing before sending requests. This avoids duplicate code and reduces errors.
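Such a function might look like the sketch below. It is only one possible design: the name prepare_url, its parameters, and the choice of https as the default scheme are all assumptions. When base_url is given, a scheme-less input is treated as relative to that base; otherwise it is treated as a hostname.

```python
from urllib.parse import urljoin, urlparse


def prepare_url(url, base_url=None, default_scheme="https"):
    """Return an absolute http(s) URL or raise ValueError.

    Hypothetical helper: joins against base_url when one is given,
    otherwise adds default_scheme to scheme-less input.
    """
    url = url.strip()
    if not url:
        raise ValueError("URL is empty")
    if base_url is not None:
        url = urljoin(base_url, url)
    elif url.startswith("//"):
        url = f"{default_scheme}:{url}"
    elif url.startswith("/"):
        raise ValueError(f"Relative path {url!r} needs a base_url")
    elif "://" not in url:
        url = f"{default_scheme}://{url}"

    # Final check shared by every code path
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Not a usable http(s) URL: {url!r}")
    return url


print(prepare_url("www.example.com"))
print(prepare_url("/page/2", base_url="http://example.com"))
```

Every request in the program can then go through this one function, so a change in policy (for example, rejecting plain http) only needs to be made in one place.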

    Best Practices

    Here are some overall best practices for avoiding MissingSchema errors:

  • Always include the full protocol and hostname in URLs
  • Prefer absolute URLs over relative URLs
  • Use a URL parsing/validation library like urllib
  • Handle exceptions properly for robustness
  • Log and monitor errors using the logging module
  • Validate user input URL before requests
  • Standardize URL handling in a common function

    Other Ways to Fix MissingSchema

    Here are some other methods to try if you still face MissingSchema errors:

  • Upgrade to the latest Requests module in case a bugfix was released
  • Double check environment and imported modules
  • Change code to use absolute URLs only
  • Refactor code to make URL handling more consistent

    Related Errors and Issues

    Here are some other requests errors and issues that are good to know about:

  • ConnectionError - Cannot connect to the server
  • Timeout - Request timed out
  • TooManyRedirects - Exceeded max redirects
  • HTTPError - HTTP error response like 404 or 500

    Debugging techniques:

  • Print URL before sending request to verify
  • Check for issues with environment or imported modules
  • Use logging and traceback to analyze source of error

    Conclusion

    The MissingSchema error in Python requests occurs when the URL string is missing its scheme, usually because the protocol was left out or the URL is relative. By understanding what causes it and using the right techniques to handle it, you can make your requests-based programs more robust. The key is to validate URLs, use absolute URLs, handle exceptions properly and apply defensive coding practices.
