Turning Data into Insights in 10 Mins with Custom GPTs and APIs

We just built Open Source Benchmark, a GPT designed to streamline open-source software (OSS) comparisons. Powered by our own data APIs and third-party APIs, this GPT analyzes two repositories and offers instant insights for tech decision-making. Open the GPT now and click the PyTorch vs TensorFlow conversation starter to see how it works in action.

How It Started

On November 6, 2023, OpenAI unveiled custom GPTs that allow the creation of tailored ChatGPT versions with enhanced capabilities and diverse skills. This development significantly altered the tech landscape. According to the announcement blog, developers can:

Effortlessly customize ChatGPT without any coding requirements.
Enrich ChatGPT’s knowledge base using external sources, ranging from third-party APIs and data APIs to web scraping and knowledge base files.

As the team behind OSSInsight.io, an open-source insight platform, we felt as inspired and empowered as any other developer. Harnessing the power of GPTs, we managed to develop Open Source Benchmark, an open-source comparative analysis GPT, in 10 minutes. This GPT integrates with OSSInsight APIs, GitHub APIs, other third-party APIs, and web scraping capabilities.

In this post, we will share the general construction process of this GPT to turn data into insights and summarize the key techniques used.

Configure the GPTs

ChatGPT provides a default GPT Builder to facilitate the GPT construction process. While it is very friendly, we feel it is a bit wordy. To keep the GPT prompts as concise and streamlined as possible, we decided to configure the GPT manually in the Configure tab of the GPTs Editor.

Add appropriate descriptions to your GPTs in the Name and Description fields.
Click on the GPT icon, and select Use DALL·E to generate a custom avatar for your GPTs.
Enable the necessary capabilities of the GPT. It’s recommended that you check all the capabilities.
- ✅ Web browsing
- ✅ DALL-E Image Generation
- ✅ Code Interpreter
Configure the instructions, that is, the prompts that specify how the GPT interacts with you. You can find the prompts we used here. We will also share some useful tips we learned later in this post.
Provide some conversation starters as interaction shortcuts, for example:
- PyTorch vs TensorFlow
- TiDB vs Vitess
- React vs Vue
- Golang vs Rust-lang

Configure GPT Actions

In addition to using GPTs’ built-in capabilities, we needed to define custom actions to make specific APIs available to the GPT. For our purpose in the Open Source Benchmark GPT, we used the following three APIs:

TiDB Serverless’s Data API to access the database
GitHub API to get the GitHub project and repository information
OSSInsight’s image API for reporting capabilities

You can find all the action configurations in this blog.

Enable GPT Access to Database

In the Open Source Benchmark GPT, we imported the historical data from Hacker News and over 6 billion GitHub events into TiDB Serverless, our backend database service. To enable the GPT’s access to TiDB Serverless, we leveraged Data Service, a TiDB Serverless feature that allows us to turn data into HTTP APIs directly with simple SQL statements and configurations in the console. This way we can expose our data to GPTs easily.

Create the Endpoint

As an example, we needed to provide a Data API to query “the number of times a specific keyword is mentioned each year on Hacker News”.

Log into the TiDB Serverless cluster.
Enter the Data Service tab from the left navigation bar, and select a Data App.

Create an endpoint with the following SQL:

SQL
USE hackernews;

-- Count Hacker News items that mention the keyword, per year, over the last 10 years.
SELECT
    COUNT(*) AS count,
    DATE_FORMAT(FROM_UNIXTIME(time), '%Y-01-01') AS date
FROM items
WHERE (
        LOWER(url) LIKE LOWER(CONCAT('%', ${keyword}, '%'))
        OR LOWER(title) LIKE LOWER(CONCAT('%', ${keyword}, '%'))
        OR LOWER(text) LIKE LOWER(CONCAT('%', ${keyword}, '%'))
    )
    AND time >= UNIX_TIMESTAMP(DATE_SUB(CURRENT_DATE, INTERVAL 10 YEAR))
GROUP BY 2
ORDER BY 2 ASC;

Set the Path field to /hackernews/mention_per_year.
Click the Review Draft & Deploy button to deploy this API.
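
To make the logic of this endpoint easier to follow, here is a minimal local sketch that reproduces the same grouping with Python's built-in sqlite3 module. It is not the TiDB query itself: SQLite's strftime stands in for DATE_FORMAT(FROM_UNIXTIME(...)), the 10-year window is left out so the result stays deterministic, and the sample rows and the helper names (make_demo_db, mentions_per_year) are made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def ts(year):
    """Return a Unix timestamp for June 1 of the given year (UTC)."""
    return int(datetime(year, 6, 1, tzinfo=timezone.utc).timestamp())


def make_demo_db():
    """Build an in-memory table shaped like the items table, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (url TEXT, title TEXT, text TEXT, time INTEGER)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        [
            ("https://openai.com/chatgpt", "Introducing ChatGPT", None, ts(2022)),
            (None, "Ask HN: Is chatgpt useful?", None, ts(2023)),
            (None, "Show HN: My project", "Built with ChatGPT", ts(2023)),
            (None, "Unrelated story", None, ts(2023)),
        ],
    )
    return conn


def mentions_per_year(conn, keyword):
    """Count items whose url, title, or text contains the keyword, grouped by year."""
    pattern = f"%{keyword.lower()}%"
    return conn.execute(
        """
        SELECT strftime('%Y-01-01', time, 'unixepoch') AS year_start,
               COUNT(*) AS mentions
        FROM items
        WHERE LOWER(COALESCE(url, '')) LIKE ?
           OR LOWER(COALESCE(title, '')) LIKE ?
           OR LOWER(COALESCE(text, '')) LIKE ?
        GROUP BY year_start
        ORDER BY year_start ASC
        """,
        (pattern, pattern, pattern),
    ).fetchall()


if __name__ == "__main__":
    for year_start, mentions in mentions_per_year(make_demo_db(), "ChatGPT"):
        print(year_start, mentions)
```

Each returned row pairs the first day of a year with the number of matching items, which is the shape the GPT later turns into a trend summary.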

Get the API Specification

After deployment, Data Service will assist you in generating a JSON-format API Specification based on your endpoint’s configuration. This is crucial for connecting with GPTs. To get the API Specification:

Enter Integrate with GPTs – Overview, and click the Get Configuration button.
In the opened Integrate with GPTs dialog, get:
- The API Specification URL, which points to the API Specification file.
- The API Key Encoded value, which is required to access the Data API.

Configure the Data API Action

To configure the Data API Action for GPTs, follow these steps:

Import the API Specification.
In the Edit actions tab, copy the API Specification URL you obtained from the previous step, paste it into the Schema URL input box, and click the Import button. This way, GPT can access the basic information of the APIs provided through the Data Service.
Set up API Authentication.
1. In the Authentication configuration box, copy the API Key Encoded you obtained in the previous step, and fill it in the API Key field.
2. Select Basic for Auth Type to authorize GPTs to access our newly created API.

Once these steps are complete, the GPT can handle requests like “Show me the Hacker News Mentions of ChatGPT.” It accesses the database and retrieves the desired results through the API provided by the Data Service.
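
For readers who want to see what such a call could look like outside of a GPT, the sketch below builds (but does not send) an HTTP request with Basic authentication using only the Python standard library. Several details are assumptions rather than facts from this post: the base URL is a placeholder, the keyword is passed as a query-string parameter named after the ${keyword} placeholder in the SQL, and the encoded API key is assumed to be the standard Base64 form of "public_key:private_key" used by HTTP Basic authentication. Check the values shown in your own Data App before relying on any of them.

```python
import base64
import urllib.parse
import urllib.request


def build_mentions_request(base_url, keyword, public_key, private_key):
    """Build a GET request for the mention_per_year endpoint with a Basic auth header.

    base_url, public_key, and private_key are placeholders supplied by the caller.
    """
    # HTTP Basic auth: Base64 of "user:password" (here assumed to be the key pair).
    token = base64.b64encode(f"{public_key}:{private_key}".encode("utf-8")).decode("ascii")
    query = urllib.parse.urlencode({"keyword": keyword})
    url = f"{base_url.rstrip('/')}/hackernews/mention_per_year?{query}"
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # example.invalid is a reserved placeholder domain, not a real Data App URL.
    req = build_mentions_request("https://example.invalid/endpoint", "ChatGPT", "pub", "priv")
    print(req.full_url)
    print(req.get_header("Authorization"))
```

Passing the returned object to urllib.request.urlopen would send the request. The sketch stops before that step so it can run without network access or real credentials.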

Enable GPT Access to GitHub API

To enrich our GPTs’ capabilities in performing comparative analysis of GitHub repositories, we’ve integrated the GitHub API into our actions. This integration allows for the retrieval of precise data from GitHub repositories. For comprehensive information on GitHub REST APIs, please refer to GitHub REST API documentation.

To work more efficiently, we used ActionsGPT, provided by OpenAI, to fetch and generate API Specifications. Taking the Get a Repository REST API as an example, we can get the API Specification in the following two steps:

In the bottom-right corner of the GPT Editor, click the Get help from ActionsGPT popup command.

Enter the following prompt, and you will get the API Specification text of the Get a Repository API:

Generate OpenAPI spec for "Get a repository" of GitHub REST API;must include schema of status code 200 in response, include ".*_count.*", except ".*\burl\b.*", must include: license, language, created date, last push date, repository size in properties
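
The field-selection rules in this prompt can be read as a small filter over the response properties. The sketch below shows one possible reading in Python. How ActionsGPT actually interprets the prompt is not specified here, and the mapping of "created date", "last push date", and "repository size" to the GitHub fields created_at, pushed_at, and size is our assumption.

```python
import re

# Patterns taken from the prompt above.
COUNT_PATTERN = re.compile(r".*_count.*")
URL_PATTERN = re.compile(r".*\burl\b.*")

# Assumed GitHub field names for the "must include" items in the prompt.
MUST_INCLUDE = {"license", "language", "created_at", "pushed_at", "size"}


def select_fields(field_names):
    """Keep counters and must-include fields; drop anything matching the url pattern."""
    selected = []
    for name in field_names:
        if URL_PATTERN.fullmatch(name):
            continue
        if name in MUST_INCLUDE or COUNT_PATTERN.fullmatch(name):
            selected.append(name)
    return selected


if __name__ == "__main__":
    sample = [
        "id", "url", "stargazers_count", "forks_count", "license",
        "language", "created_at", "pushed_at", "size", "description",
    ]
    print(select_fields(sample))
```

Trimming the schema this way keeps the action's response small, which leaves more of the GPT's context for the analysis itself.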

Enable GPT Access to Charting API of OSSInsight

While GPTs are capable of data visualization through the Code Interpreter feature, this approach is limited by network restrictions – GPTs must convert data into text-based Python code before execution, which can be very time-consuming.

To streamline chart delivery and improve the user experience, we use an external API server on the OSSInsight side. This server takes over chart creation and returns the URL of the rendered image in the API response, which significantly reduces wait times for users.

GPTs can then display the image immediately using Markdown image syntax.

You can find the API spec of this action here.
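
As a rough illustration of the last step, the sketch below turns a chart response into the Markdown image syntax that a GPT can render inline. The response shape is hypothetical: we assume a JSON body with a "url" field, which may differ from the actual OSSInsight API, and the function name is ours.

```python
import json


def chart_markdown_from_response(response_body, alt_text):
    """Convert a JSON chart response into a Markdown image tag."""
    payload = json.loads(response_body)
    image_url = payload["url"]  # hypothetical field name, see the note above
    # Square brackets in the alt text would break the Markdown image syntax.
    safe_alt = alt_text.replace("[", "(").replace("]", ")")
    return f"![{safe_alt}]({image_url})"


if __name__ == "__main__":
    body = '{"url": "https://example.invalid/chart.png"}'
    print(chart_markdown_from_response(body, "Star history"))
```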

Tips for GPTs’ Instructions

Instructions are the core that drives GPTs, guiding what GPTs should do and how they should do it. Although instructions can be auto-generated through the GPT Builder, we can further refine them in the Instructions editor. You can get all the instructions for GPTs here.

Tip 1: Setting up a role for your GPTs

Typically, a GPT prompt begins by assigning the GPT a role. For the Open Source Benchmark GPT, we tell it that it is a data analysis expert. With a declared role, the GPT tends to draw on knowledge that matches that identity when it answers questions.

For instance, after adding this role declaration, its ability to write SQL becomes stronger. When summarizing text, it adopts the style of an analyst and provides insights accordingly.

You are a data analysis expert.

Tip 2: Re-Specifying API Usage in the Prompt

The purpose of each API is already described in the “description” field of the API spec file that we import into the GPT. However, due to the GPT’s contextual limitations, especially when it toggles between tasks in Multitask mode, it is necessary to highlight the data each API retrieves directly in the prompt during GPT construction.

Based on our experience, doing this could reduce the need for extensive AI reasoning and thereby result in more stable and reliable outputs.

GitHub API for getting repo basic info
OSS Insight API for star history comparison chart API and stargazer's geo&company distribution
Hackernews mentions per_year API for hackernews trend

Tip 3: Defining Output Templates

To ensure uniformity in output formatting, particularly for specific sections, we’ve employed the use of text templates. A prime example is the Markdown Table section in the prompt. This structured approach guides the AI in effectively organizing and presenting the data fetched from the API within this predefined template:

Output the data in a markdown table for easy comparison. add your own knowledge for more insight. at least 8 metrics.

| Dimension      | A           | B           |
|----------------|-------------|-------------|
| Dimension 1    | Detail A1   | Detail B1   |
| Dimension 2    | Detail A2   | Detail B2   |
| Dimension 3    | Detail A3   | Detail B3   |
| ...            | ...         | ...         |
| Dimension N    | Detail AN   | Detail BN   |
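
To show what a filled-in version of this template looks like, here is a small Python sketch that renders the same table layout from a list of rows. The function name and the values in the usage example are placeholders for illustration, not real repository metrics.

```python
def render_comparison_table(name_a, name_b, rows):
    """Render (dimension, value_a, value_b) rows as a Markdown comparison table."""
    lines = [
        f"| Dimension | {name_a} | {name_b} |",
        "|---|---|---|",
    ]
    for dimension, value_a, value_b in rows:
        lines.append(f"| {dimension} | {value_a} | {value_b} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; a real run would use data fetched by the actions.
    print(render_comparison_table("A", "B", [("Dimension 1", "Detail A1", "Detail B1")]))
```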

Tip 4: Guiding GPT with Structured Steps for Question Solving

Adding instructions that make the AI follow the sequence Plan -> Tools -> Action -> Output -> Deep Insight helps it design a logical approach to problem-solving and prevents it from taking shortcuts. This structured approach leads to more thorough and thoughtful responses.

Let's begin
Plan:
Tools:
Action:
Output:
Deep Insight:
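
To see how the pieces from these four tips fit together, the sketch below assembles a role line, the API usage notes, and the step scaffold into one instruction string. This is only an illustration of the structure, not our complete prompt, and the function name is hypothetical.

```python
ROLE = "You are a data analysis expert."

API_USAGE = [
    "GitHub API for getting repo basic info",
    "OSS Insight API for star history comparison chart API and stargazer's geo&company distribution",
    "Hackernews mentions per_year API for hackernews trend",
]

STEPS = ["Plan", "Tools", "Action", "Output", "Deep Insight"]


def build_instructions(role, api_usage, steps):
    """Join the role, API usage notes, and step headings into one prompt string."""
    parts = [role, ""]
    parts.extend(api_usage)
    parts.append("")
    parts.append("Let's begin")
    # Each step becomes an empty heading for the GPT to fill in, in order.
    parts.extend(f"{step}:" for step in steps)
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_instructions(ROLE, API_USAGE, STEPS))
```

Keeping the parts in separate variables makes it easy to adjust one tip, for example the step list, without touching the rest of the prompt.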

Summary

Our journey into reimagining existing applications in the age of GPTs has been both enlightening and enjoyable. Through this process, we’ve gained valuable insights into:

Enhancing Domain-Specific GPTs: We’ve harnessed the power of integrating multiple external data sources with GPTs. This approach has significantly broadened the capabilities of domain-specific GPTs.
Efficient API Integration: We learned how to quickly provide GPTs access to existing data without the need for extensive coding. Our use of TiDB Cloud Data Service is a testament to this.
Augmenting GPT Drawing Capabilities: By leveraging external API resources, we’ve improved the charting capabilities of current GPTs for more effective data visualization.

To enable others to explore similar advancements, we have shared the complete configurations for building the ‘Open Source Repository Comparison GPT.’ For those interested in replicating this GPT, all the necessary details are available here.


Product

October 17, 2024
