Here’s the output:

Figure 7.5 – Training a LabelModel
It then predicts the labels for the training set and prints them:
# Predict the labels for the training data
Y_train_pred = label_model.predict(L=L_train)
# Print the predicted labels
print(Y_train_pred)
Here’s the output:
[ 0 1 -1 -1 0 1]
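The -1 values in this output are abstentions, because Snorkel uses -1 as its ABSTAIN value. With its default tie_break_policy of "abstain", LabelModel.predict() also returns -1 when the top class probabilities tie. For comparison, the sketch below implements a plain majority vote over a label matrix. This is the idea behind Snorkel's MajorityLabelVoter baseline, not the LabelModel itself. It uses NumPy only. The function name majority_vote and the small label matrix are invented for illustration.

```python
import numpy as np

ABSTAIN = -1  # Snorkel's convention for "no label"


def majority_vote(L: np.ndarray, cardinality: int) -> np.ndarray:
    """Hypothetical helper: majority vote over each row of a label
    matrix of shape (n_points, n_lfs).

    Abstains (-1) are ignored; rows with no votes or a tied top count
    get ABSTAIN, mirroring a tie-break policy of "abstain".
    """
    preds = np.full(L.shape[0], ABSTAIN, dtype=int)
    for i, row in enumerate(L):
        votes = row[row != ABSTAIN]
        if votes.size == 0:
            continue  # every labeling function abstained on this point
        counts = np.bincount(votes, minlength=cardinality)
        top = np.flatnonzero(counts == counts.max())
        if top.size == 1:  # only assign a label when there is a clear winner
            preds[i] = top[0]
    return preds


# Made-up label matrix: 4 data points, 3 labeling functions
L_example = np.array([
    [1, -1, -1],   # a single vote for 1
    [1, 0, -1],    # tie between 0 and 1
    [-1, -1, -1],  # all labeling functions abstain
    [0, 0, 1],     # 0 wins 2 to 1
])
print(majority_vote(L_example, cardinality=2))  # [ 1 -1 -1  0]
```

Unlike this vote, the LabelModel estimates how accurate each labeling function is and weights its votes accordingly. That is why the two methods can disagree on the same label matrix.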
Step 4: Analyzing labeling functions and creating a DataFrame with predicted labels. We can use the LFAnalysis class to analyze the labeling functions by passing it the label matrix (L_train) and the list of labeling functions (lfs). Its lf_summary() method returns a table that summarizes each labeling function's polarity, coverage, overlaps, and conflicts:
from snorkel.labeling import LFAnalysis
# Summarize polarity, coverage, overlaps, and conflicts for each labeling function
LFAnalysis(L=L_train, lfs=lfs).lf_summary()
Here’s the output:

Figure 7.6 – LFAnalysis summary
The table summarizes the LFAnalysis results for the three labeling functions: lf_positive_review, lf_negative_review, and lf_neutral_review.
Let’s break down the columns:
- j: The index of the labeling function in the lfs list. Here, j=0 corresponds to lf_positive_review, j=1 corresponds to lf_negative_review, and so on.
- Polarity: The set of distinct labels that the labeling function outputs across the dataset, excluding abstains. In this case, lf_positive_review has a polarity of [0, 1], meaning it assigns both label 0 and label 1. By contrast, lf_negative_review has a polarity of [0], indicating it only ever assigns label 0.
- Coverage: The fraction of examples for which the labeling function returns a label rather than abstaining. It shows how widely the labeling function applies. Both lf_positive_review and lf_negative_review have a coverage of 0.5525, meaning each one labels 55.25% of the examples.
- Overlaps: The fraction of examples that the labeling function labels and that at least one other labeling function also labels. A labeling function's overlap value can never exceed its coverage.
- Conflicts: The fraction of examples that the labeling function labels and that at least one other labeling function labels differently. It measures how often the labeling function disagrees with the others, and it can never exceed the function's overlap value. Both lf_positive_review and lf_negative_review have a conflict value of 0.2105, meaning they disagree with another labeling function on approximately 21.05% of the examples.
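To make these definitions concrete, the following sketch recomputes the coverage, overlaps, and conflicts columns from a label matrix using NumPy. It illustrates the definitions above and is not Snorkel's own implementation. The helper name lf_stats and the small label matrix are invented for this example.

```python
import numpy as np

ABSTAIN = -1  # Snorkel's convention for "no label"


def lf_stats(L: np.ndarray) -> dict:
    """Hypothetical helper: per-LF coverage, overlaps, and conflicts for a
    label matrix L of shape (n_points, n_lfs), where -1 means abstain."""
    labeled = L != ABSTAIN              # True where an LF voted on a point
    n_voting = labeled.sum(axis=1)      # number of LFs voting on each point
    coverage = labeled.mean(axis=0)

    # Overlap: this LF voted and at least one other LF voted too.
    overlaps = (labeled & (n_voting[:, None] > 1)).mean(axis=0)

    conflicts = np.zeros(L.shape[1])
    for j in range(L.shape[1]):
        others = np.delete(L, j, axis=1)
        # Another LF voted, and its label differs from LF j's label.
        disagree = (others != ABSTAIN) & (others != L[:, [j]])
        conflicts[j] = (labeled[:, j] & disagree.any(axis=1)).mean()

    return {"coverage": coverage, "overlaps": overlaps, "conflicts": conflicts}


# Made-up label matrix: 4 data points, 3 labeling functions
L_example = np.array([[1, -1, -1], [1, 0, -1], [-1, -1, -1], [0, 0, 1]])
for name, values in lf_stats(L_example).items():
    print(name, values.tolist())
# coverage [0.75, 0.5, 0.25]
# overlaps [0.5, 0.5, 0.25]
# conflicts [0.5, 0.5, 0.25]
```

The output shows that no column value exceeds the one before it. Overlaps are bounded by coverage, and conflicts are bounded by overlaps.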
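This summary shows how much of the data each labeling function covers and how often the functions agree or disagree. It helps you spot labeling functions that are too narrow or too noisy and decide where to refine them. If gold labels are available, passing them as lf_summary(Y=...) adds empirical accuracy columns to the table.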
Lastly, the following code copies the training DataFrame and adds the LabelModel's predictions as a new predicted_label column:
# Create a DataFrame with the predicted labels
df_train_pred = df_train.copy()
df_train_pred['predicted_label'] = Y_train_pred
# Display the DataFrame
print(df_train_pred)
Here’s the output:

Figure 7.7 – Predicted labels
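Rows with a predicted label of -1 are ones the LabelModel abstained on, so they are usually dropped before the labels are used to train a downstream classifier. The sketch below shows this with plain pandas. The stand-in DataFrame reuses the predictions printed earlier. The review column contents and the 0/1 label names are assumptions made for illustration, not values taken from the chapter's dataset.

```python
import pandas as pd

ABSTAIN = -1  # Snorkel's convention for "no label"
# Hypothetical mapping from label values to names; adjust it to your own LFs
LABEL_NAMES = {0: "negative", 1: "positive"}

# Stand-in for df_train_pred, using the predictions printed earlier
df_train_pred = pd.DataFrame({
    "review": [f"review {i}" for i in range(6)],
    "predicted_label": [0, 1, -1, -1, 0, 1],
})

# Keep only the rows where the LabelModel produced a label
df_labeled = df_train_pred[df_train_pred["predicted_label"] != ABSTAIN].copy()
# Attach human-readable names for inspection
df_labeled["label_name"] = df_labeled["predicted_label"].map(LABEL_NAMES)

print(len(df_labeled))  # 4
print(df_labeled["label_name"].value_counts().to_dict())
```

Calling .copy() before adding the label_name column avoids pandas' SettingWithCopyWarning, because the filtered rows become an independent DataFrame.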
In this example, we first created the Movie Reviews DataFrame. We then defined three rule-based labeling functions that use regular expressions to label reviews as positive, negative, or neutral based on the presence of certain keywords. We applied them to the text data using Snorkel's PandasLFApplier. Next, we trained a LabelModel to combine the labeling functions' outputs into a single predicted label per review. Finally, we analyzed the labeling functions with LFAnalysis and added the predicted labels to the DataFrame.
Note that this is a simple example, and you may need to adjust the code to the specific requirements of your use case. You can also add more labeling functions as your task requires, but design and test each one carefully to ensure high-quality labels.
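As an illustration of designing and testing labeling functions, here is a minimal sketch of keyword-based regex functions that can be checked on hand-written examples before they touch real data. In Snorkel, functions like these would take a data point x (a DataFrame row) and be wrapped with the @labeling_function() decorator. Here they are kept as plain functions on strings so they can be tested in isolation. The function names, keyword lists, and label values are all hypothetical.

```python
import re

# Hypothetical label values; ABSTAIN = -1 follows Snorkel's convention
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical keyword patterns; \b word boundaries stop "bad" matching inside "badly"
POSITIVE_PATTERN = re.compile(r"\b(great|excellent|loved?|amazing)\b", re.IGNORECASE)
NEGATIVE_PATTERN = re.compile(r"\b(bad|awful|boring|terrible)\b", re.IGNORECASE)


def lf_positive_keywords(text: str) -> int:
    """Vote POSITIVE if a positive keyword appears, otherwise abstain."""
    return POSITIVE if POSITIVE_PATTERN.search(text) else ABSTAIN


def lf_negative_keywords(text: str) -> int:
    """Vote NEGATIVE if a negative keyword appears, otherwise abstain."""
    return NEGATIVE if NEGATIVE_PATTERN.search(text) else ABSTAIN


# Quick checks on hand-written examples: (positive LF vote, negative LF vote)
examples = {
    "Loved every minute, truly great": (POSITIVE, ABSTAIN),
    "Boring and badly paced": (ABSTAIN, NEGATIVE),
    "It was released in 2019": (ABSTAIN, ABSTAIN),
}
for text, expected in examples.items():
    got = (lf_positive_keywords(text), lf_negative_keywords(text))
    assert got == expected, (text, got)
print("all checks passed")
```

Returning ABSTAIN instead of the opposite label when no keyword matches keeps each function's coverage honest. A function that abstains only votes where its rule actually applies.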
Now, let’s look into labeling the data using logistic regression.