Last Updated:

How to run own LLM


Here is an extensive guideline for running an LLM locally:

1. Run Jupyter Notebook and a Python runtime locally and use them to run an open source LLM like Anthropic's Claude.

Jupyter Notebook and Python are free and open source tools that can be easily installed locally on your computer. The Anthropic team has provided open source releases of Claude on GitHub that can be imported and used in Python and Jupyter notebooks. Here's a step by step guide to doing so:

A. Install Python and Jupyter Notebook on your computer using Anaconda. Anaconda is a convenient Python distribution that includes Python and Jupyter Notebook out of the box.

B. Install Claude's Python package from PyPI: pip install claude. This will give you access to Claude's model and capabilities within Python.

C. Launch Jupyter Notebook and create a new notebook. Import Claude by running import claude at the top of your notebook.

D. You can now call claude functions like claude.completions, claude.explain etc. to utilize Claude's natural language capabilities in your local notebook. See the Claude docs for details on all the available functions.

2. Use Docker to deploy an LLM like Cohere or GPT-3 in a local container.

Docker is a tool for running software applications in lightweight virtualized containers, which can be done entirely on your local machine. Here are the steps to deploy an API-accessible LLM like Cohere or GPT-3 locally with Docker:

A. Install Docker on your computer.

B. Pull the Cohere or GPT-3 docker image from DockerHub (or build your own from a Dockerfile). These images will include the necessary dependencies and runtimes to utilize the LLMs.

C. Pass your API key to the container as an environment variable: docker run -e "COHERE_API_KEY=xxx" cohereorg/cohere-gpu. Replace COHERE_API_KEY with your actual API key.

D. Send requests to the containerized LLM API from your local machine. For Cohere you can use their Python SDK locally to query the dockerized API and access full LLM capabilities.

3. Use Streamlit to build a simple front-end interface for local LLM serving

Streamlit allows you to quickly create web apps to visualize data and provide interfaces to other applications or models. Here is how to serve an LLM locally with Streamlit:

A. pip install streamlit
B. Write a streamlit Python script that loads your LLM and defines UI components to capture a user's input text and display the LLM's output.
C. Run streamlit run [your_script].py to launch the local web interface.
D. You now have a web form you and others can interact with to utilize the underlying LLM without API calls.

4. Install TensorFlow and run pretrained models like GPT-2 locally

Many LLMs release their pretrained models publicly which you can run entirely locally:

A. Install TensorFlow or PyTorch locally to utilize GPU/TPU acceleration.
B. Download model checkpoint files from an open source model like GPT-2.
C. Load model files directly into a TensorFlow or PyTorch model.
D. Generate text locally by sampling from the model.

No API keys or external calls required!

5. Use tools like Gradio to quickly build local interfaces

Gradio allows you to wrap any Python model or function in a user interface with just Python code.

You can follow their docs to:

A. pip install gradio
B. Import Gradio and decorate your LLM function with @gradio decorators
C. Launch the interface with gradio.Interface(). This auto-generates UIs for humans to interact locally with your LLM.