
Summarization is a crucial NLP task that involves condensing a piece of text while retaining its essential information and main ideas. In the context of Azure OpenAI, the following code exemplifies the application of summarization using the GPT-3.5-turbo model deployed on the Azure platform.

The following code example begins by setting the necessary environment variables for the Azure OpenAI API, including the API key and endpoint. The OpenAI API is then configured with the deployment name of the model, allowing the code to interact with the specific GPT-3.5-turbo instance.

The input text, which is a detailed description of Dachepalli, a town in Andhra Pradesh, India, is provided for summarization. The code utilizes the Azure OpenAI Completion API to generate a summary, employing parameters such as temperature, max tokens, and penalties for frequency and presence.

The output of the code includes the generated summary, showcasing the main ideas extracted from the input text. The summarized content emphasizes key aspects such as the author’s connection to Dachepalli, the town’s features, and notable historical events. This example demonstrates how Azure OpenAI can effectively summarize information, providing concise and informative outputs.

Let’s start by importing the required libraries and getting the configuration values (the Azure OpenAI key and endpoint, API version, and the GPT model deployment name) that we have set already:
import os
import openai

# Configure the Azure OpenAI client from environment variables
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_type = "azure"
openai.api_version = "2023-05-15"  # this might change in the future

# This corresponds to the custom name you chose for your deployment when you deployed a model
model_deployment_name = "your_azure_openai_model_name"

# Set the input text
text = """Create a summary of the below text and provide the main idea.

Dachepalli is a popular town in Palnadu district in Andhra Pradesh, India. I love Dachepalli because I was born and brought up in Dachepalli.
I studied at the Dachepalli ZPH school, stood school first, and my name was written on the school toppers board at high school. My father worked in the same high school as a Hindi pandit for 20 years. The famous Palnadu battle took place near the Naguleru river of Karempudi, which flows across Dachepalli. There are lime mines and a number of cement factories around Dachepalli. The Nadikudi railway junction connects Dachepalli to Hyderabad and Guntur.
Being born in Dachepalli and having studied at Dachepalli high school, I love Dachepalli."""

response = openai.Completion.create(
    engine=model_deployment_name,
    prompt=text,
    temperature=0,
    max_tokens=118,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None)
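
Before printing the summary, it can be useful to inspect the response object itself, for example to confirm whether generation stopped naturally or was cut off by the token limit. The following is a minimal sketch, assuming the pre-1.0 openai Python SDK used above, where the Completion response exposes a finish_reason on each choice and a usage section:

# Inspect the response object (sketch; assumes the pre-1.0 openai SDK response shape)
choice = response.choices[0]

# 'stop' means the model finished on its own; 'length' means max_tokens cut it off
print("Finish reason:", choice.finish_reason)

# Token accounting helps when tuning max_tokens for longer inputs
print("Prompt tokens:", response.usage.prompt_tokens)
print("Completion tokens:", response.usage.completion_tokens)
print("Total tokens:", response.usage.total_tokens)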

Let’s understand the parameters used in this call to the OpenAI Completion API.

OpenAI’s parameters control the behavior of the language model during text generation. Here’s a brief description of the provided parameters:

  • Temperature (temperature=0): It determines the randomness of the model’s output. A high value (e.g., 0.8) makes the output more diverse, while a low value (e.g., 0.2) makes it more deterministic.
  • Max tokens (max_tokens=118): This specifies the maximum number of tokens (chunks of text, roughly word pieces) to generate in the output. It’s useful for limiting response length.
  • Top P (top_p=1): Also known as nucleus sampling, it controls the diversity of the generated output. Setting it to 1 means the full probability distribution is considered; lower values restrict sampling to the smallest set of tokens whose cumulative probability exceeds top_p.
  • Frequency penalty (frequency_penalty=0): This discourages repetition by penalizing tokens in proportion to how often they have already appeared in the output. A non-zero value makes the model less likely to repeat the same words.
  • Presence penalty (presence_penalty=0): Similar to the frequency penalty, but it applies a flat penalty to any token that has already appeared at least once, encouraging the model to introduce new topics and produce more diverse responses.
  • Stop (stop=None): This allows users to specify one or more custom stop sequences. When the model generates a specified sequence, it stops producing further content.

These parameters provide users with fine-grained control over the generation process, allowing customization of the model’s output based on factors such as randomness, length, diversity, and repetition. Adjusting these parameters enables users to tailor the language model’s behavior to meet specific requirements in various applications, such as chatbots, content generation, and more:
# Print the generated summary
print("Generated summary:", response.choices[0].text.strip())

Running this code will output the following summary:
Generated summary: Main Idea: The author loves Dachepalli because he was born and brought up there and studied at Dachepalli high school.
The town is located in Palnadu district in Andhra Pradesh, India and is known for its lime mines and cement factories.
The Nadikudi railway junction connects Dachepalli to Hyderabad and Guntur.
The famous Palnadu battle took place near Naguleru river of Karempudi which flows across Dachepalli.
The author’s father worked in the same high school as a Hindi pandit for 20 years.
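
To see how these parameters affect the output, you can rerun the same request with different settings. The following is a hedged sketch that reuses the model_deployment_name and text variables from above with illustrative values (a higher temperature, a non-zero frequency penalty, and a custom stop sequence); it is not a configuration the original example uses:

# Rerun the same prompt with more randomness and a repetition penalty (illustrative values)
varied_response = openai.Completion.create(
    engine=model_deployment_name,
    prompt=text,
    temperature=0.7,        # higher temperature -> more varied wording across runs
    max_tokens=150,         # allow a slightly longer summary
    top_p=1,
    frequency_penalty=0.5,  # discourage repeating the same tokens
    presence_penalty=0,
    stop=["\n\n\n"])        # stop if the model starts a new, unrelated block

print("Varied summary:", varied_response.choices[0].text.strip())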

We have seen how to generate a summary using the OpenAI GPT-3.5 model. Now let’s see how to generate the topic for news articles using OpenAI’s GPT model.
