Python Requests Cheatsheet

Jan 9, 2024 · 9 min read

Overview

Requests is an HTTP library for Python that makes it easy to send HTTP requests. Some of its key features:

  • Make HTTP requests with simple Python code
  • JSON encoding and decoding
  • SSL Verification support
  • Automatic content decoding
  • Connection timeouts
  • Browser-style HTTP cookies
  • File uploads with multipart encoding
  • Helper methods for GET, POST, PUT, OPTIONS, DELETE requests

    Making Requests

    Import requests:

    import requests
    

    Make a GET request:

    response = requests.get('https://api.example.com/data')
    

    Make a POST request:

    payload = {'key1': 'value1', 'key2': 'value2'}
    response = requests.post('https://api.example.com/data', data=payload)
    

    Response Content

    Get the response content as a string:

    html = response.text
    

    Get JSON response content:

    data = response.json()
    

    Get binary content:

    image = response.content
    

    Status Codes

    Check the status code:

    if response.status_code == 200:
        print('Success!')
    elif response.status_code == 404:
        print('Not Found.')
    

    Request Headers

    View request headers:

    headers = response.request.headers
    

    Add custom headers:

    headers = {'User-Agent': 'My Script'}
    response = requests.get(url, headers=headers)
    

    Query Parameters

    Add parameters to URL:

    params = {'key1': 'value1', 'key2': 'value2'}
    response = requests.get(url, params=params)
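
    To see how a params dictionary ends up in the URL, the request can be prepared locally without being sent. A minimal sketch, assuming the placeholder endpoint https://api.example.com/data used elsewhere on this page:

```python
import requests

# Placeholder endpoint for illustration; nothing is sent over the network.
url = 'https://api.example.com/data'
params = {'key1': 'value1', 'key2': 'value2'}

# Preparing a request builds the final URL without sending it.
prepared = requests.Request('GET', url, params=params).prepare()
print(prepared.url)
```

    Inspecting the prepared URL this way can help when an API rejects a request and you suspect the query string.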
    

    POST Data

    Send data in the request body (a dict passed as data= is form-encoded by default):

    data = {'key': 'value'}
    response = requests.post(url, data=data)
    

    Send form-encoded data:

    data = {'key1': 'value1', 'key2': 'value2'}
    response = requests.post(url, data=data)
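
    The difference between data= and json= shows up in the request body and the Content-Type header. The sketch below prepares both variants locally instead of sending them; the endpoint is a placeholder and the variable names are illustrative:

```python
import json
import requests

url = 'https://api.example.com/data'  # placeholder endpoint, never contacted
payload = {'key1': 'value1', 'key2': 'value2'}

# data= with a dict produces a form-encoded body.
form_request = requests.Request('POST', url, data=payload).prepare()
# json= serializes the dict to JSON and sets the JSON content type.
json_request = requests.Request('POST', url, json=payload).prepare()

print(form_request.headers['Content-Type'], form_request.body)
print(json_request.headers['Content-Type'], json.loads(json_request.body))
```

    Which one to use depends on what the server expects; many JSON APIs will not parse a form-encoded body.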
    

    File Uploads

    Upload file:

    # Open the file in a with block so it is closed after the upload
    with open('report.xls', 'rb') as f:
        response = requests.post(url, files={'file': f})
    

    Upload multiple files:

    with open('report.xls', 'rb') as f1, open('data.json', 'rb') as f2:
        files = {'file1': f1, 'file2': f2}
        response = requests.post(url, files=files)
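
    Each value in files can also be a (filename, file object, content type) tuple, and form fields can be sent alongside via data=. A small sketch that prepares such a request locally, assuming a hypothetical upload endpoint and an in-memory stand-in for a real file:

```python
import io
import requests

url = 'https://api.example.com/upload'  # hypothetical endpoint, never contacted
report = io.BytesIO(b'col1,col2\n1,2\n')  # in-memory stand-in for a file on disk

# (filename, file object, content type) gives control over the part's metadata.
files = {'file': ('report.csv', report, 'text/csv')}
data = {'key': 'value'}

prepared = requests.Request('POST', url, files=files, data=data).prepare()
print(prepared.headers['Content-Type'])  # multipart/form-data with a boundary
```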
    

    Timeouts

    Set a timeout in seconds (a single value applies to both connecting and reading):

    requests.get(url, timeout=3.05)
    

    Authentication

    Pass HTTP Basic Auth credentials:

    response = requests.get(url, auth=('user', 'pass'))
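
    Passing a (user, password) tuple makes Requests add an Authorization header. The sketch below prepares the request locally to show what that header contains; the endpoint and credentials are placeholders:

```python
import base64
import requests

url = 'https://api.example.com/data'  # placeholder endpoint, never contacted
prepared = requests.Request('GET', url, auth=('user', 'pass')).prepare()

# The header has the form "Basic <base64 of user:password>".
scheme, encoded = prepared.headers['Authorization'].split(' ', 1)
print(scheme)
print(base64.b64decode(encoded).decode())
```

    The credentials are only base64-encoded, not encrypted, so Basic Auth should be used over HTTPS.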
    

    Use OAuth1:

    import requests_oauthlib
    
    oauth = requests_oauthlib.OAuth1('client_key', client_secret='secret')
    response = requests.get(url, auth=oauth)
    

    Sessions

    Create a session to persist parameters across requests:

    session = requests.Session()
    
    session.params = {'key': 'value'}
    
    response = session.get('http://httpbin.org/get')
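
    Session-level settings are merged with whatever each request passes in. A minimal sketch of that merge using Session.prepare_request, which builds the request without sending it; the extra page parameter and the User-Agent value are illustrative assumptions:

```python
from urllib.parse import parse_qs, urlsplit

import requests

session = requests.Session()
session.params = {'key': 'value'}
session.headers.update({'User-Agent': 'my-app/0.0.1'})

# Per-request params are combined with the session's params.
request = requests.Request('GET', 'https://httpbin.org/get', params={'page': '2'})
prepared = session.prepare_request(request)

query = parse_qs(urlsplit(prepared.url).query)
print(query)
print(prepared.headers['User-Agent'])
```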
    

    Error Handling

    Check if a response was successful:

    if response.status_code == 200:
        pass  # successful request
    elif response.status_code == 404:
        pass  # handle 404 error
    

    Catch connection errors:

    try:
        response = requests.get(url, timeout=3)
    except requests.exceptions.ConnectionError:
        pass  # handle connection error
    

    SSL Verification

    Verify the server's SSL certificate (this is the default behavior):

    response = requests.get(url, verify=True)
    

    Disable certificate verification (insecure; Requests emits an InsecureRequestWarning):

    response = requests.get(url, verify=False)
    

    Proxy Servers

    Make requests over a proxy server:

    proxies = {
       'http': 'http://10.10.1.1:3128',
       'https': 'http://10.10.1.1:1080'
    }
    requests.get(url, proxies=proxies)
    

    Advanced Section

    More Examples of Different HTTP Request Types

    PUT Request:

    data = {'key':'value'}
    response = requests.put('https://api.example.com/data', data=data)
    

    DELETE Request:

    response = requests.delete('https://api.example.com/data/1')
    

    HEAD Request:

    response = requests.head('http://example.com')
    print(response.headers)
    

    OPTIONS Request:

    response = requests.options('https://api.example.com/data')
    print(response.headers.get('Allow')) # allowed HTTP methods, if the server reports them
    

    Using Sessions for Efficiency

    session = requests.Session()
    session.auth = ('username', 'password')
    
    response = session.get('https://api.example.com/user')
    # subsequent requests will use authentication
    

    Handling Cookies

    url = 'http://example.com'
    cookies = {'my_cookie': 'cookie_value'}
    response = requests.get(url, cookies=cookies)
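
    A cookies dictionary is turned into a Cookie request header. The sketch below prepares the request locally to show the resulting header, reusing the cookie name and value from the example above:

```python
import requests

url = 'http://example.com'
cookies = {'my_cookie': 'cookie_value'}

# Prepare the request without sending it and inspect the Cookie header.
prepared = requests.Request('GET', url, cookies=cookies).prepare()
print(prepared.headers['Cookie'])
```

    Cookies set by the server are available on response.cookies, and a Session keeps them across requests.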
    

    Streaming Response Content

    with requests.get(url, stream=True) as response:
        for chunk in response.iter_content(8192):
            print(chunk)
    

    Setting Timeouts & Retries

    from requests.exceptions import ConnectionError, Timeout
    
    try:
        response = requests.get(url, timeout=3.05)
    except (ConnectionError, Timeout):
        # Retry once with a longer timeout
        response = requests.get(url, timeout=5)
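
    For automatic retries, one common alternative to a manual try/except is to mount an HTTPAdapter configured with urllib3's Retry class on a session. This is a sketch, not the only approach; the retry count, backoff factor, and status codes below are illustrative choices:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times, with increasing delays, on these gateway errors.
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount('https://', adapter)
session.mount('http://', adapter)

# Requests made through this session now use the retry policy, e.g.:
# response = session.get(url, timeout=3.05)
```

    A timeout still has to be passed on each call; the adapter only controls retry behavior.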
    

    Custom SSL Certificate Verification

    response = requests.get(url, verify='path/to/cert.pem')
    

    Authentication to APIs

    from requests_oauthlib import OAuth1
    
    url = 'https://api.example.com/data'
    oauth = OAuth1('client_key', client_secret='secret')
    response = requests.get(url, auth=oauth)
    

    Using Proxies

    proxies = {'http': 'http://10.10.1.1:3128'}
    response = requests.get(url, proxies=proxies)
    

    Optimizing Performance with Keepalive & Connection Pools

    session = requests.Session()
    adapter = requests.adapters.HTTPAdapter(pool_connections=100, pool_maxsize=100)
    session.mount('http://', adapter)
    session.mount('https://', adapter)  # mount for HTTPS URLs as well
    
    response = session.get(url) # reused connection
    

    Mocking Out Requests for Testing

    import requests_mock
    
    with requests_mock.Mocker() as m:
        m.get('http://test.com', text='data')
        response = requests.get('http://test.com')
        print(response.text) # prints 'data'
    

    Exceptions & Error Handling

    try:
        response = requests.get(url, timeout=3)
    except requests.exceptions.Timeout:
        pass  # Handle timeout
    except requests.exceptions.SSLError:
        pass  # Handle SSL error
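
    The individual handlers can be combined into one helper that also catches the RequestException base class as a fallback. A minimal sketch; fetch_text is a hypothetical helper name, not part of Requests:

```python
import requests

def fetch_text(url, timeout=3):
    """Return the response body as text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    except requests.exceptions.HTTPError as err:
        print(f'HTTP error: {err.response.status_code}')
    except requests.exceptions.Timeout:
        print('Request timed out')
    except requests.exceptions.ConnectionError:
        print('Could not connect')
    except requests.exceptions.RequestException as err:
        # Base class for all Requests errors, e.g. an invalid URL
        print(f'Request failed: {err}')
    else:
        return response.text
    return None

# A URL without a scheme fails before any network access.
result = fetch_text('missing-scheme')
```

    Timeout is listed before ConnectionError because a connect timeout is a subclass of both.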
    

    Debugging Requests with Hooks & Logging

    import logging
    import http.client as http_client
    
    http_client.HTTPConnection.debuglevel = 1
    
    logging.basicConfig()
    logging.getLogger().setLevel(logging.DEBUG)
    requests_log = logging.getLogger("urllib3")
    requests_log.setLevel(logging.DEBUG)
    requests_log.propagate = True
    
    response = requests.get(url)
    

    Multipart File & Data Uploads

    files = {'file': open('report.pdf', 'rb')}
    data = {'key':'value'}
    response = requests.post(url, files=files, data=data)
    

    JSON Techniques

    import json
    
    # Serialize data to JSON
    data = {'key': 'value'}
    json_data = json.dumps(data)
    
    # Decode JSON response
    response = requests.get(url)
    data = response.json()
    
    # Send a dict as a JSON request body
    payload = {'key': 'value'}
    response = requests.post(url, json=payload)
    

    Custom User-Agents and Headers

    headers = {
        'User-Agent': 'My Bot 1.0',
        'Authorization': 'Bearer <token>'
    }
    
    response = requests.get(url, headers=headers)
    

    Response Metadata - Access Headers, Encoding, History

    response = requests.get(url)
    
    print(response.headers['Content-Type']) # headers
    print(response.encoding) # e.g. 'utf-8'
    print(response.history) # response history
    print(response.url) # final URL
    

    Handling Compression and Encodings

    response = requests.get(url)
    
    # Requests decompresses gzip- and deflate-encoded bodies automatically,
    # so response.content already holds the decoded bytes.
    content = response.content
    print(response.headers.get('Content-Encoding'))  # e.g. 'gzip', or None
    

    Streaming Uploads/Downloads

    # Streamed upload: an open file object passed as data is sent
    # without reading the whole file into memory
    with open('file.bin', 'rb') as f:
        response = requests.post(url, data=f)
    
    # Streamed download: write the body to disk chunk by chunk
    with requests.get(url, stream=True) as response:
        with open('filename', 'wb') as fd:
            for chunk in response.iter_content(2048):
                fd.write(chunk)
    

    Additional HTTP Request Types

    PUT Request

    Update a resource:

    data = {'key': 'new value'}
    response = requests.put(url, data=data)
    

    PATCH Request

    Partial update of resource:

    data = {'name': 'new name'}
    response = requests.patch(url, data=data)
    

    HEAD Request

    Get headers for resource:

    response = requests.head(url)
    print(response.headers)
    

    OPTIONS Request

    Get allowed HTTP methods:

    response = requests.options(url)
    print(response.headers['Allow'])
    

    Authentication Methods

    Basic Auth

    requests.get(url, auth=('user', 'pass'))
    

    Digest Auth

    from requests.auth import HTTPDigestAuth
    
    requests.get(url, auth=HTTPDigestAuth('user', 'pass'))
    

    OAuth 1

    import requests_oauthlib
    oauth = requests_oauthlib.OAuth1(client_key, client_secret)
    requests.get(url, auth=oauth)
    

    API Keys

    headers = {'X-API-Key': 'abc123'}
    requests.get(url, headers=headers)
    

    JSON Web Tokens

    headers = {'Authorization': f'Bearer {token}'}  # token holds your JWT string
    requests.get(url, headers=headers)
    

    Handling Pagination

    Extract next page URL

    response = requests.get(url)
    next_page = response.links.get('next', {}).get('url')  # None when there is no next page
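
    response.links is built from the Link response header. The sketch below parses a sample header value with requests.utils.parse_header_links to show the structure; the header value itself is made up for illustration:

```python
import requests

# Illustrative Link header value, as a paginated API might return it.
link_header = (
    '<https://api.example.com/data?page=2>; rel="next", '
    '<https://api.example.com/data?page=5>; rel="last"'
)

# Key the parsed links by their rel value for easy lookup.
links = {link['rel']: link for link in requests.utils.parse_header_links(link_header)}
print(links['next']['url'])
```

    APIs that do not send a Link header usually put the next-page URL or cursor in the JSON body instead.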
    

    Iterate pages manually

    while True:
        data = response.json()
        # do something with data
    
        next_page = response.links.get('next', {}).get('url')
        if not next_page:
            break
        response = requests.get(next_page)
    

    Automate paging

    import requests
    
    def get_pages(url):
        response = requests.get(url)
        yield response.json()
    
        next_page = response.links.get('next')
        while next_page:
            response = requests.get(next_page['url'])
            yield response.json()
            next_page = response.links.get('next')
    
    for page in get_pages(url):
       print(page)
    

    Tips and Tricks

    Global Timeouts

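    import functools
    import requests
    
    # Requests has no global timeout setting; one workaround is to
    # pre-bind the timeout argument with functools.partial.
    get = functools.partial(requests.get, timeout=3)
    response = get(url)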
    

    Session Objects

    session = requests.Session()
    session.headers.update({'User-Agent': 'my-app/0.0.1'})
    response = session.get(url)
    

    Extract Links

    import re
    import requests
    
    response = requests.get(url)
    
    for link in re.findall(r'<a href="(.*?)">', response.text):
        print(link)
    

    Custom User-Agent

    headers = {'User-Agent': 'My Bot 1.0'}
    requests.get(url, headers=headers)
    

    Exceptions & Troubleshooting

    Common Exceptions

    try:
        response = requests.get(url, timeout=3)
    except requests.exceptions.Timeout:
        pass  # No response within the timeout
    except requests.exceptions.TooManyRedirects:
        pass  # Exceeded max redirects
    except requests.exceptions.SSLError:
        pass  # SSL certificate issue
    

    Get Failure Reason

    try:
        response = requests.get(url)
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        print(e.response.text) # the failure reason
    

    Debug Failed Requests

    import logging
    import http.client
    
    http.client.HTTPConnection.debuglevel = 2
    
    logging.basicConfig()
    logging.getLogger().setLevel(logging.DEBUG)
    requests_log = logging.getLogger("urllib3")
    requests_log.setLevel(logging.DEBUG)
    requests_log.propagate = True
    
    try:
       response = requests.get(url)
    except Exception as e:
       print(e)
    

    Sending Specialized Data

    Binary Data

    files = {'file': ('report.docx', open('report.docx', 'rb'), 'application/vnd.openxmlformats-officedocument.wordprocessingml.document')}
    
    response = requests.post(url, files=files)
    

    Custom Encoding

    import json
    
    # payload is any JSON-serializable object
    data = json.dumps(payload).encode('utf-8')
    
    response = requests.post(url, data=data, headers={'Content-Type': 'application/json'})
    

    GZip Compressed Data

    import gzip
    data = gzip.compress(b'input data')
    
    response = requests.post(url, data=data,
                              headers={'Content-Encoding': 'gzip'})
    

    Efficiency Techniques

    Keepalive Connections

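    # A Session keeps connections alive and reuses them automatically;
    # no extra setting is needed.
    session = requests.Session()
    
    session.get(url1)
    session.get(url2)  # reuses the connection when the host is the same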
    

    Connection Pooling

    import requests
    
    session = requests.Session()
    adapter = requests.adapters.HTTPAdapter(pool_connections=100, pool_maxsize=100)
    
    session.mount('https://', adapter)
    session.get(url) # Reuses connection
    

    Mocks and Testing

    Mock Request

    import requests_mock
    
    with requests_mock.mock() as m:
    
        m.get(url, text='data')
        response = requests.get(url)
    
        print(response.text) # data
    

    Response Simulation

    m = requests_mock.Mocker()
    m.get(url, text='Success')
    
    with m:
        response = requests.get(url)
        print(response.text)
    

    Integration Testing

    import responses # patches requests while the test runs
    import mymodule
    
    def api_callback(request):
        return 200, {}, 'OK'
    
    @responses.activate
    def test_get_data():
        responses.add_callback(
            responses.GET,
            'https://api.example.com/data',
            callback=api_callback
        )
    
        # Tests mymodule, which internally calls requests
        result = mymodule.get_data()
    

    3rd Party Libraries

    BeautifulSoup Parsing

    import requests
    from bs4 import BeautifulSoup
    
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    print(soup.find('h1').text)
    

    Scrapy Integration

    import scrapy
    import requests
    
    class MySpider(scrapy.Spider):
    
        name = 'myspider'
    
        def start_requests(self):
            url = 'http://example.com'
            request = requests.Request('GET', url)
            prepared = request.prepare()
    
            yield scrapy.Request(url=prepared.url, callback=self.parse)
    
        def parse(self, response):
            # Extract data from the Scrapy response here
            pass
    
    

    Here is a comparison of Requests with some popular alternative Python HTTP tools:

    requests - The most popular library. Simple, intuitive API, powerful features, works for most cases. Lacks async support.

    urllib - Python's built-in HTTP library. Lower-level, less intuitive, fewer helper methods. Useful for basic HTTP needs.

    httpx - A separate library with a requests-compatible API. Adds async support and optional HTTP/2. Modern alternative.

    aiohttp - Async HTTP library for use with asyncio. Great for concurrency and parallel requests.

    httpie - User-friendly command-line HTTP client. Great for testing and debugging APIs. Fewer features than requests.

    scrapy - Specialized web crawling and scraping framework. Great for large scraping projects. Lots of customization.

    Comparison

  • requests - Best all-round library, easy to use, batteries included
  • urllib - Only for trivial HTTP needs, too low-level
  • httpx - Modern, async requests alternative
  • aiohttp - Async concurrency for advanced use cases
  • httpie - Convenient CLI testing
  • scrapy - Heavy duty web scraping

    The best choice depends on your specific requirements. Requests is the easiest general-purpose library; consider httpx for async, aiohttp for advanced concurrency, and scrapy for large web scraping projects.
