How to Use the NanoGPT API with Python.

Before you dive in: this is my first post here and I'm trying to leave a good first impression, so I'd be more than happy to get feedback on it and on what could be changed or optimised to make good tutorials here! :)

If you've been looking for a private, OpenAI-compatible API that doesn't hoard your prompts for training data, NanoGPT is worth checking out. It speaks the same language as OpenAI's API, meaning most of your existing code works with minimal changes, but your data stays yours.

In this guide, I'll walk you through everything: installation, auth, making basic requests, streaming responses, and handling errors properly. By the end, you'll have a working Python client you can drop into any project.

Before we write any code, let's talk about why you'd pick this over just hitting OpenAI directly:

- Privacy-first: Your prompts and completions aren't used for model training
- OpenAI-compatible: Drop-in replacement for most tools and libraries
- Model variety: Access to models like MiniMax M2.7 and others without managing infrastructure
- Simple pricing: Pay-per-token, no enterprise contracts required

If you're building something where user data privacy matters (and honestly, when doesn't it?), this is a solid choice. For more context on private AI tools, check out ai-privacy-tools.vercel.app.

Start by installing the official nanogpt package:

```bash
pip install nanogpt
```

Or, if you prefer working with raw HTTP (no judgment, sometimes you want to see exactly what's going on), you can use requests or httpx instead. The API is standard REST + SSE, so any HTTP client works.

If you want the full OpenAI SDK experience, install the OpenAI package and point it at NanoGPT's base URL:

```bash
pip install openai
```

Both approaches (raw HTTP and the OpenAI SDK) work, and I'll show each of them below.

Grab your API key from nano-gpt.com and set it as an environment variable:

```bash
export NANOGPT_API_KEY="your-api-key-here"
```

Never hardcode your API key in source code. Seriously.
I've seen production repos with API keys in them and it's always a bad day.

Let's start with a simple request using the requests library:

```python
import os

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
# Read the key from the environment instead of hardcoding it
API_KEY = os.environ.get("NANOGPT_API_KEY")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

payload = {
    "model": "minimax/minimax-m2.7",
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
}

response = requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)
response.raise_for_status()  # Surface 4xx/5xx errors instead of a KeyError below
data = response.json()
print(data["choices"][0]["message"]["content"])
```

That's it. If you've worked with OpenAI's API before, this should look completely familiar. Same endpoint structure, same request/response format.

Here's the same thing using the official OpenAI Python SDK pointed at NanoGPT:

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("NANOGPT_API_KEY"),
    base_url="https://nano-gpt.com/api/v1"
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)
```

This approach is great because any existing code that uses the OpenAI SDK can be migrated to NanoGPT by changing just two lines: the API key and the base URL.

For longer responses, streaming makes a huge difference in perceived performance.
Here's how to do it:

```python
import json
import os

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = os.environ.get("NANOGPT_API_KEY")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "Accept": "text/event-stream"
}

payload = {
    "model": "minimax/minimax-m2.7",
    "messages": [
        {"role": "user", "content": "Explain the difference between TCP and UDP."}
    ],
    "stream": True,
    "max_tokens": 1024
}

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=payload,
    stream=True  # Don't buffer the whole body; read it as it arrives
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            chunk = line[6:]  # Strip the "data: " SSE prefix
            if chunk.strip() == "[DONE]":
                break
            data = json.loads(chunk)
            delta = data["choices"][0]["delta"]
            if "content" in delta:
                print(delta["content"], end="", flush=True)

print()  # Newline after streaming
```

The key things here: set "stream": True in the payload, add "Accept": "text/event-stream" to headers, and use requests.post(..., stream=True) so it doesn't buffer the entire response. Then iterate over lines and parse the SSE chunks.

With the OpenAI SDK, streaming is even simpler:

```python
# Reuses the `client` created in the SDK example above
stream = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[{"role": "user", "content": "Explain TCP vs UDP."}],
    stream=True
)

for chunk in stream:
    # Guard against chunks that arrive without any choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Don't skip this part: error handling.
Here's a practical error handling wrapper you can actually use in production:

```python
import time

import requests
from requests.exceptions import RequestException


class NanoGPTClient:
    def __init__(self, api_key, base_url="https://nano-gpt.com/api/v1"):
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def chat(self, messages, model="minimax/minimax-m2.7", max_retries=3, timeout=30, **kwargs):
        payload = {
            "model": model,
            "messages": messages,
            "max_tokens": kwargs.get("max_tokens", 1024),
            "temperature": kwargs.get("temperature", 0.7)
        }

        for attempt in range(max_retries):
            try:
                response = requests.post(
                    f"{self.base_url}/chat/completions",
                    headers=self.headers,
                    json=payload,
                    timeout=timeout
                )

                if response.status_code == 200:
                    return response.json()["choices"][0]["message"]["content"]

                if response.status_code == 429:
                    # Use Retry-After (assumed to be in seconds) if present,
                    # otherwise fall back to exponential backoff
                    retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
                    print(f"Rate limited. Retrying in {retry_after}s...")
                    time.sleep(retry_after)
                    continue

                if response.status_code == 401:
                    # ValueError is not a RequestException, so it is never retried
                    raise ValueError("Invalid API key. Check your NANOGPT_API_KEY.")

                # Any other 4xx/5xx becomes an HTTPError and is retried below
                response.raise_for_status()

            except RequestException as e:
                if attempt == max_retries - 1:
                    raise
                print(f"Request failed: {e}. Retrying ({attempt + 1}/{max_retries})...")
                time.sleep(2 ** attempt)

        raise RuntimeError("Max retries exceeded.")
```

This handles the common pain points:

- Rate limiting (429): Waits for the number of seconds in the Retry-After header, falling back to exponential backoff when the header is missing
- Auth errors (401): Gives you a clear message instead of a cryptic stack trace
- Network failures: Retries with exponential backoff
- Timeouts: Configurable via the timeout parameter (30 seconds by default)

Here's a complete example that ties everything together, a simple CLI chatbot:

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("NANOGPT_API_KEY"),
    base_url="https://nano-gpt.com/api/v1"
)


def chat():
    # The full history is resent on every turn so the model keeps context
    messages = [{"role": "system", "content": "You are a helpful assistant. Be concise."}]
    print("NanoGPT Chat (type 'quit' to exit)\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit"):
            break

        messages.append({"role": "user", "content": user_input})

        stream = client.chat.completions.create(
            model="minimax/minimax-m2.7",
            messages=messages,
            stream=True
        )

        print("AI: ", end="")
        assistant_response = ""
        for chunk in stream:
            # Guard against chunks that arrive without any choices
            if chunk.choices and chunk.choices[0].delta.content:
                text = chunk.choices[0].delta.content
                print(text, end="", flush=True)
                assistant_response += text

        print("\n")
        messages.append({"role": "assistant", "content": assistant_response})


if __name__ == "__main__":
    chat()
```

| What | Value |
| --- | --- |
| Base URL | https://nano-gpt.com/api/v1 |
| Auth header | Authorization: Bearer YOUR_KEY |
| Chat endpoint | /chat/completions |
| Default model | minimax/minimax-m2.7 |
| Streaming | Set "stream": true in payload |

If you were already using OpenAI's Python SDK, migrating to NanoGPT is genuinely a two-line change. If you're starting fresh, you get a clean API that respects your privacy out of the box.

The real win here is that you're not locked into a single provider. Since NanoGPT is OpenAI-compatible, you can swap between providers without rewriting your application code. That's the kind of flexibility worth building on.

Got questions or hit a snag? Drop a comment below. Happy coding.
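
One optional follow-up to the streaming section: the SSE parsing loop can be pulled into a small generator so it can be exercised without a network call. This is only a sketch. `iter_stream_text` is a hypothetical helper name (not part of any NanoGPT or OpenAI library), and it assumes the stream uses the same `data: ...` / `[DONE]` framing as the requests example in this guide. The sample lines are hand-written, not real API output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an iterable of raw SSE lines.

    Hypothetical helper: assumes OpenAI-style chunks framed as
    b'data: {...}' and terminated by b'data: [DONE]'.
    """
    for raw in lines:
        if not raw:
            continue  # SSE separates events with blank lines
        line = raw.decode("utf-8")
        if not line.startswith("data: "):
            continue  # Skip comments and any other SSE fields
        chunk = line[len("data: "):].strip()
        if chunk == "[DONE]":
            break
        choices = json.loads(chunk).get("choices") or []
        if not choices:
            continue  # Tolerate chunks that carry no choices
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Offline check with hand-written sample lines
sample = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    b'data: {"choices": [{"delta": {"content": ", world"}}]}',
    b"data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints: Hello, world
```

With requests, the same helper could be fed `response.iter_lines()` directly, since that yields one bytes object per line by default.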

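The retry logic in the error handling wrapper assumes `Retry-After` is a whole number of seconds. HTTP also allows that header to carry a date, so a more defensive delay calculation might look like the sketch below. `retry_delay` and its `cap` parameter are hypothetical names used for illustration, and whether NanoGPT ever sends the date form is an open assumption, not something this guide confirms.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Hypothetical helper: prefers the Retry-After header when it can be
    parsed, otherwise falls back to capped exponential backoff.
    """
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            return min(float(value), cap)  # Delta-seconds form
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            remaining = (when - datetime.now(timezone.utc)).total_seconds()
            return min(max(remaining, 0.0), cap)
        except (TypeError, ValueError):
            pass  # Unparseable header: fall through to backoff
    return min(float(2 ** attempt), cap)
```

Inside the wrapper, this could stand in for the `int(...)` conversion, for example `time.sleep(retry_delay(attempt, response.headers.get("Retry-After")))`. The cap keeps a misbehaving header from stalling the client for minutes.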
Source: Dev.to
