Fine-tuning a Small Language Model to parse raw, unstructured text and extract relevant entities related to scheduling a calendar event.
SmolLM2-360M-Instruct-Text-2-JSON is a fine-tuned version of SmolLM2-360M-Instruct-bnb-4bit specialized for parsing unstructured calendar event requests into structured JSON data.
You can use the SmolLM2-360M-Instruct-Text-2-JSON model to parse natural language event descriptions into structured JSON format.
```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "pramodkoujalagi/SmolLM2-360M-Instruct-Text-2-JSON"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)


def parse_calendar_event(text):
    # Format the prompt
    formatted_prompt = f"""<|im_start|>user
Extract the relevant event information from this text and organize it into a JSON structure with fields for action, date, time, attendees, location, duration, recurrence, and notes. If a field is not present, return null for that field.
Text: {text}
<|im_end|>
<|im_start|>assistant
"""
    # Generate response
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=True,
            temperature=0.1,
            top_p=0.95,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Process response: keep only the text of the assistant turn
    output_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
    response = output_text.split("<|im_start|>assistant\n")[1].split("<|im_end|>")[0].strip()

    # Return formatted JSON (json.loads raises an error if the output is not valid JSON)
    parsed_json = json.loads(response)
    return json.dumps(parsed_json, indent=2)


# Example input
event_text = "Plan an exhibition walkthrough on 15th, April 2028 at 3 PM with Harper, Grace, and Alex in the art gallery for 1 hour, bring bag."

# Output
print("Prompt:")
print(event_text)
print("\nModel Output:")
print(parse_calendar_event(event_text))
```
Output:
```
Prompt:
Plan an exhibition walkthrough on 15th, April 2028 at 3 PM with Harper, Grace, and Alex in the art gallery for 1 hour, bring bag.

Model Output:
{
  "action": "Plan an exhibition walkthrough",
  "date": "15/04/2028",
  "time": "3:00 PM",
  "attendees": [
    "Harper",
    "Grace",
    "Alex"
  ],
  "location": "art gallery",
  "duration": "1 hour",
  "recurrence": null,
  "notes": "Bring bag"
}
```
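The usage example calls json.loads directly on the model response, which raises an error if the model ever wraps the JSON in extra text. As a minimal sketch of a more forgiving parser, the hypothetical helper below (extract_json_object is not part of this repository) scans the response for the first decodable JSON object and returns None when there is none:

```python
import json
from typing import Optional


def extract_json_object(response: str) -> Optional[dict]:
    """Return the first JSON object found in `response`, or None if there is none.

    Hypothetical helper for illustration; it is not part of the project code.
    """
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            obj, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = response.find("{", start + 1)
    return None


parsed = extract_json_object('Sure! {"action": "study session", "notes": null} Done.')
```

A caller could treat a None result as a parse failure and retry generation or report the error, instead of letting the exception propagate.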
- Project Overview
- Dataset
- Fine-tuning Methodology
- Performance Evaluation
- Technical Implementation
- Future Enhancements
- Conclusion
The aim of this project is to fine-tune a Small Language Model (SmolLM2-360M-Instruct-bnb-4bit) to parse unstructured calendar event requests and extract structured information. The model identifies key scheduling entities such as action, date, time, attendees, location, duration, recurrence, and notes from natural language text.
Initial analysis of the dataset (event_text_mapping.jsonl):
- 792 total examples
- Field presence distribution:
  - action, date, time: 100% (792/792)
  - attendees: 75.3% (596/792)
  - location: 65.5% (519/792)
  - duration: 81.2% (643/792)
  - recurrence: 3.3% (26/792)
  - notes: 1.8% (14/792)
- Format variations:
  - Date formats: DD/MM/YYYY (88.5%), YYYY-MM-DD (11.5%)
  - Time formats: 12-hour (93.2%), 24-hour (6.8%)
  - Various duration formats
Data standardization (standardize_data.py) was implemented to ensure consistency across:
- Date formats (standardized to DD/MM/YYYY)
- Time formats (standardized to 12-hour format with AM/PM)
- Duration expressions
- Attendees lists
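The date and time rules above can be sketched as follows. This is an illustrative approximation, not the contents of standardize_data.py: the function names and the exact set of accepted input formats are assumptions, and values that match none of them are returned unchanged.

```python
from datetime import datetime


def standardize_date(value: str) -> str:
    """Normalize DD/MM/YYYY or YYYY-MM-DD input to DD/MM/YYYY."""
    for fmt in ("%d/%m/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%d/%m/%Y")
        except ValueError:
            continue
    return value  # unknown format: leave untouched


def standardize_time(value: str) -> str:
    """Normalize 12-hour or 24-hour input to 12-hour format with AM/PM."""
    cleaned = value.strip().upper()
    for fmt in ("%I:%M %p", "%I %p", "%H:%M"):
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
        # %I is zero-padded ("09:00 PM"); drop the padding to get "9:00 PM".
        return parsed.strftime("%I:%M %p").lstrip("0")
    return value  # unknown format: leave untouched


date_example = standardize_date("2024-12-15")
time_example = standardize_time("21:00")
```

Returning unrecognized values unchanged keeps the sketch conservative: nothing is silently rewritten into a wrong date or time.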
The dataset was augmented to create a more balanced distribution:
- Increased total examples to 1,149
- Improved representation of less frequent fields:
  - recurrence: increased to 20.1% (231/1,149)
  - notes: increased to 19.5% (224/1,149)
- Maintained high coverage of core fields (action, date, time)
For improved fine-tuning performance, the data was restructured into an instruction-based format (instruction_format.py):
From
{"event_text": "Late night study session at the café on 15th, Dec 2024 at 9:00 pm for 2 hours.", "output": {"action": "study session", "date": "15/12/2024", "time": "9:00 PM", "attendees": null, "location": "café", "duration": "2 hours", "recurrence": null, "notes": null}}
To
{
"instruction": "Extract the relevant event information from this text and organize it into a JSON structure with fields for action, date, time, attendees, location, duration, recurrence, and notes. If a field is not present, return null for that field.",
"input": "Late night study session at the café on 15th, Dec 2024 at 9:00 pm for 2 hours.",
"output": "{\"action\": \"study session\", \"date\": \"15/12/2024\", \"time\": \"9:00 PM\", \"attendees\": null, \"location\": \"café\", \"duration\": \"2 hours\", \"recurrence\": null, \"notes\": null}"
}
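A minimal sketch of this conversion, assuming each source record carries the event_text and output keys shown in the "From" example. The function name to_instruction_format is hypothetical; the actual instruction_format.py may differ.

```python
import json

INSTRUCTION = (
    "Extract the relevant event information from this text and organize it into "
    "a JSON structure with fields for action, date, time, attendees, location, "
    "duration, recurrence, and notes. If a field is not present, return null for "
    "that field."
)


def to_instruction_format(record: dict) -> dict:
    """Turn one {"event_text", "output"} record into an instruction example."""
    return {
        "instruction": INSTRUCTION,
        "input": record["event_text"],
        # The target is stored as a JSON string, so the model learns to emit text.
        # ensure_ascii=False keeps characters such as "é" readable.
        "output": json.dumps(record["output"], ensure_ascii=False),
    }


record = {
    "event_text": "Late night study session at the café on 15th, Dec 2024 at 9:00 pm for 2 hours.",
    "output": {
        "action": "study session", "date": "15/12/2024", "time": "9:00 PM",
        "attendees": None, "location": "café", "duration": "2 hours",
        "recurrence": None, "notes": None,
    },
}
example = to_instruction_format(record)
```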
This approach provided several critical advantages:
- Clear task definition: Explicit instructions helped the model understand exactly what was expected
- Format specification: The instruction clearly defined the required output structure
- Null-handling guidance: Explicit instructions on how to handle missing fields
- Improved generalization: The instruction-based format better leveraged the base model's instruction-following capabilities
The processed data was then prepared for Unsloth fine-tuning using prepare_unsloth_data.py, which:
- Formats data in the Unsloth-compatible chat template
- Creates a train-validation split (90-10)
- Results in 1,034 training examples and 115 validation examples
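The split sizes can be reproduced with a short sketch. Two assumptions are made here: the validation size is rounded up (which yields the reported 115 of 1,149 examples), and the training text mirrors the prompt layout used in the inference example. The function names, the seed, and the closing assistant tag are illustrative and may not match prepare_unsloth_data.py.

```python
import math
import random


def to_chat_text(example: dict) -> str:
    """Render one instruction example as a single chat-formatted training string."""
    return (
        "<|im_start|>user\n"
        f"{example['instruction']}\n"
        f"Text: {example['input']}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{example['output']}<|im_end|>\n"  # closing tag is an assumption
    )


def train_val_split(examples, val_fraction=0.1, seed=42):
    """Shuffle deterministically and hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = math.ceil(len(shuffled) * val_fraction)  # 1,149 * 0.1 rounds up to 115
    return shuffled[n_val:], shuffled[:n_val]


train, val = train_val_split(range(1149))
sample_text = to_chat_text(
    {"instruction": "Extract the event.", "input": "Lunch at noon.", "output": "{}"}
)
```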
- Base model: SmolLM2-360M-Instruct-bnb-4bit (Derived from HuggingfaceTB/SmolLM2-360)
- Context length: 2048 tokens
- Quantization: 4-bit quantization for memory-efficient training
The fine-tuning approach implemented in finetune_lora.py uses Quantized Low-Rank Adaptation (QLoRA), combining 4-bit quantization with LoRA for parameter-efficient fine-tuning:
- LoRA rank: 64 (higher for better performance)
- Target modules: All key model components including attention modules, projections, and embedding layers
- LoRA alpha: 32
- Rank-stabilized LoRA (rsLoRA): Enabled for better stability
- Gradient checkpointing: Enabled with Unsloth optimizations for memory efficiency
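To see why rsLoRA matters at rank 64, compare the factor by which the adapter update is scaled. Standard LoRA scales by alpha / rank, while rank-stabilized LoRA scales by alpha / sqrt(rank). The short sketch below only does this arithmetic for the values listed above; it does not configure Unsloth or PEFT.

```python
import math


def lora_scaling(alpha: float, rank: int, use_rslora: bool) -> float:
    """Scaling factor applied to the low-rank update."""
    if use_rslora:
        return alpha / math.sqrt(rank)  # rank-stabilized LoRA
    return alpha / rank  # standard LoRA


standard = lora_scaling(alpha=32, rank=64, use_rslora=False)
rank_stabilized = lora_scaling(alpha=32, rank=64, use_rslora=True)
print(f"standard LoRA scaling: {standard}, rsLoRA scaling: {rank_stabilized}")
```

With alpha 32 and rank 64, the standard factor is 0.5 and the rank-stabilized factor is 4.0, so the adapter's contribution is not shrunk as the rank grows.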
The model was fine-tuned using the Unsloth framework with the following configuration:
- Batch size: 8 (2 per device × 4 gradient accumulation steps)
- Learning rate: 2e-4 with cosine scheduler
- Epochs: 3
- Weight decay: 0.01
- Optimizer: AdamW (8-bit)
- Training time: Updating
- Final training loss: Updating
- Final validation loss: Updating
- Validation perplexity: 1.2735 (close to the minimum possible value of 1.0)
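Perplexity and loss are two views of the same quantity. Assuming the perplexity was computed in the usual way, as the exponential of the mean token-level cross-entropy loss (natural log), the reported value implies a validation loss of roughly 0.24:

```python
import math


def perplexity_from_loss(mean_loss: float) -> float:
    """Perplexity is exp of the mean cross-entropy loss (natural log)."""
    return math.exp(mean_loss)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse of the relation above."""
    return math.log(perplexity)


implied_loss = loss_from_perplexity(1.2735)
print(f"implied validation loss: {implied_loss:.4f}")
```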

The validation framework included:
- Regular evaluation every 30 steps
- Early stopping with patience of 3 evaluations
- Automated learning curve generation
- Overfitting detection and prevention
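The early-stopping rule can be illustrated with a small framework-independent sketch. The function name and the loss values are made up for illustration, and the README does not state how finetune_lora.py wires this up. In the Hugging Face Trainer, this behaviour is commonly provided by EarlyStoppingCallback with early_stopping_patience=3.

```python
from typing import Optional, Sequence


def early_stopping_index(eval_losses: Sequence[float], patience: int = 3) -> Optional[int]:
    """Return the index of the evaluation at which training would stop, or None.

    Training stops once `patience` consecutive evaluations fail to improve
    on the best validation loss seen so far.
    """
    best = float("inf")
    evals_without_improvement = 0
    for index, loss in enumerate(eval_losses):
        if loss < best:
            best = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return index
    return None


# Illustrative validation losses, one per evaluation (every 30 steps).
stop_at = early_stopping_index([0.90, 0.50, 0.40, 0.41, 0.42, 0.43, 0.30])
```

In this made-up run the best loss occurs at the third evaluation, and training stops three evaluations later, before the final value is ever reached.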
The low validation perplexity (1.2735) indicates strong model performance. The validation loss was also lower than the training loss, which suggests that the model learned the task without overfitting and generalizes to unseen examples.
The evaluation process (eval.py) focused on:
- Per-field accuracy: Measures correctness of each extracted entity
  - String-based matching for simple fields
  - Jaccard similarity for list-type fields (e.g., attendees)
- Overall accuracy: Average accuracy across all fields
- JSON parse rate: Percentage of responses that parse as valid JSON
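A minimal sketch of these metrics is shown below. It is not the code in eval.py: the function names are hypothetical, and the choices to compare case-insensitively and to score two missing values as a match are assumptions.

```python
FIELDS = ["action", "date", "time", "attendees", "location",
          "duration", "recurrence", "notes"]


def _as_set(value) -> set:
    """Normalize None, a single value, or a list into a set of lowercase strings."""
    if value is None:
        return set()
    items = value if isinstance(value, list) else [value]
    return {str(item).strip().lower() for item in items}


def jaccard(pred, gold) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    pred_set, gold_set = _as_set(pred), _as_set(gold)
    if not pred_set and not gold_set:
        return 1.0  # both empty: treated as a match (assumption)
    return len(pred_set & gold_set) / len(pred_set | gold_set)


def field_score(pred, gold) -> float:
    """Jaccard for list-type fields, exact string match for simple fields."""
    if isinstance(pred, list) or isinstance(gold, list):
        return jaccard(pred, gold)
    if pred is None or gold is None:
        return float(pred is None and gold is None)
    return float(str(pred).strip().lower() == str(gold).strip().lower())


def overall_accuracy(pred: dict, gold: dict) -> float:
    """Average of the per-field scores for one example."""
    return sum(field_score(pred.get(f), gold.get(f)) for f in FIELDS) / len(FIELDS)


partial = jaccard(["Harper", "Grace"], ["Harper", "Grace", "Alex"])
```

Jaccard similarity gives partial credit on list fields: predicting two of three attendees scores 2/3 instead of 0.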
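Scores are fractions between 0 and 1. Improvement is the relative change over the base model; where the base score is 0, it is the absolute gain in percentage points.

Field | Base Model | Fine-tuned Model | Improvement |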
---|---|---|---|
action | 0.000000 | 0.947826 | +94.78% |
date | 0.000000 | 0.991304 | +99.13% |
time | 0.000000 | 0.991304 | +99.13% |
attendees | 0.228070 | 0.988304 | +333.35% |
location | 0.342105 | 0.964912 | +181.97% |
duration | 0.105263 | 1.000000 | +850.00% |
recurrence | 0.815789 | 0.982456 | +20.43% |
notes | 0.842105 | 1.000000 | +18.75% |
Overall | 0.290710 | 0.983242 | +238.22% |
JSON Parse Rate | 0.860870 | 1.000000 | +16.17% |
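The Improvement column appears to mix two conventions: a relative change where the base score is non-zero, and an absolute gain in percentage points where the base score is 0 (a relative change is undefined there). The sketch below encodes that reading. It is inferred from the numbers in the table rather than taken from eval.py, and it reproduces the column only approximately because the displayed scores are rounded.

```python
def improvement(base: float, tuned: float) -> float:
    """Improvement in percent, following the convention inferred from the table."""
    if base == 0:
        # Relative change is undefined; report the gain in percentage points.
        return (tuned - base) * 100
    return (tuned - base) / base * 100


duration_gain = improvement(0.105263, 1.000000)
action_gain = improvement(0.000000, 0.947826)
print(f"duration: +{duration_gain:.2f}%, action: +{action_gain:.2f}%")
```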
- The fine-tuned model achieved near-perfect performance on every field (0.948 or higher)
- The largest gains were in the action, date, and time fields, where the base model scored 0, and in the duration field (0.105 to 1.000)
- A 100% JSON parse rate shows the model learned to maintain the structured output format
- The perfect score on the duration field suggests that the varied duration expressions were standardized successfully
The project's implementation is organized into several modular components:
- Data Processing
  - standardize_data.py: Normalizes date, time, and duration formats
  - curate_dataset.py: Curates the dataset
  - instruction_format.py: Converts to instruction-based format
- Model Training
  - prepare_unsloth_data.py: Prepares data for Unsloth
  - finetune_lora.py: Implements LoRA fine-tuning
- Evaluation
  - eval.py: Evaluates model performance
A Gradio-based demo app lets you interact with the model in real time. It is deployed on Hugging Face Spaces.
📦 App Stack:
- Gradio for the frontend
- transformers for loading the model
- Hugging Face Spaces for hosting
While the current implementation achieves impressive results, several strategies could potentially improve performance further:
- Synthetic Data Generation: Generate additional examples using templates or larger language models to cover edge cases
- Adversarial Examples: Create challenging inputs to strengthen model robustness
- Cross-lingual Augmentation: Expand to multiple languages for broader applicability
- Advanced QLoRA Tuning: Experiment with different LoRA ranks and target modules to find optimal configurations (our current implementation already uses 4-bit quantization with LoRA)
- Controlled Hyperparameters: Experiment with lower learning rates (e.g., 5e-5) and more epochs (5-10) to potentially improve convergence
- Layer-specific Fine-tuning: Experiment with freezing certain layers and only fine-tuning others
- Few-shot Learning: Structure the fine-tuning to leverage the model's existing capabilities through in-context examples
- Curriculum Learning: Train on progressively more difficult examples
- Ensemble Approach: Train multiple models on different data splits and combine their predictions
- Human Evaluation: Supplement automatic metrics with human judgments
- Edge Case Testing: Create a specialized test set of particularly challenging cases
- Robustness Analysis: Test model performance with noisy or malformed inputs
The project successfully transforms the base SmolLM2-360M-Instruct-bnb-4bit model into a specialized entity extraction tool for calendar event scheduling. Careful data curation, format standardization, and targeted fine-tuning produced significant improvements across the evaluation metrics.
The instruction-based fine-tuning approach proved particularly effective, allowing the model to generate consistently structured outputs while handling diverse input phrasing. The resulting model demonstrates impressive capabilities in extracting structured information from unstructured text, with near-perfect accuracy across multiple entity types.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.