Oracle Labs NL2SQL
Development of NL2SQL Generation Model for Oracle Database
During my internship at Oracle Labs (July 2023 - October 2023), I developed NL2SQL generation models optimized for Oracle Database.
Challenge Identified
Discovered a critical gap in existing NL2SQL benchmarks:
- Representative benchmarks like Spider follow SQLite dialect
- More than 50% of the queries in these benchmarks cannot be executed on Oracle databases
- Existing models performed poorly on Oracle-specific features and syntax
Solution Development
Oracle-Specific Dataset Creation
- SQLGlot Integration: Used SQLGlot for automated translation from SQLite to Oracle dialect
- Feature Compatibility: Excluded SQLite-specific features not supported in Oracle
- Quality Assurance: Ensured all translated queries execute correctly on Oracle databases
Model Optimization Techniques
Implemented multiple fine-tuning approaches:
- Hyperparameter Tuning: Optimized model parameters for Oracle-specific patterns
- Prompt Engineering: Developed zero-shot and few-shot prompting strategies
- Parameter-Efficient Fine-Tuning: Applied PEFT techniques to reduce computational overhead
- Dataset Augmentation: Enhanced training data with Oracle-specific examples
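As an illustration of the prompting strategies, the sketch below builds either a zero-shot prompt (no examples) or a few-shot prompt (with question/SQL demonstrations). The prompt wording, the `Example` class, the `build_prompt` function, and the sample schema are all hypothetical and are not the prompts used in the project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One demonstration pair shown to the model in a few-shot prompt."""
    question: str
    sql: str


def build_prompt(schema: str, question: str, examples: List[Example]) -> str:
    """Build a zero-shot (empty examples) or few-shot prompt for Oracle SQL."""
    parts = [
        "Translate the question into a single Oracle SQL query.",
        "Schema:",
        schema.strip(),
    ]
    # Each demonstration is rendered in the same layout as the final question.
    for ex in examples:
        parts.append(f"Question: {ex.question}")
        parts.append(f"SQL: {ex.sql}")
    parts.append(f"Question: {question}")
    parts.append("SQL:")  # the model is expected to continue from here
    return "\n".join(parts)


if __name__ == "__main__":
    schema = "CREATE TABLE singer (name VARCHAR2(50), age NUMBER)"
    demos = [
        Example(
            "List the three oldest singers.",
            "SELECT name FROM singer ORDER BY age DESC FETCH FIRST 3 ROWS ONLY",
        )
    ]
    print(build_prompt(schema, "How many singers are there?", demos))
```

Writing the demonstrations in Oracle syntax (for example `FETCH FIRST ... ROWS ONLY` rather than `LIMIT`) is one plausible way to steer a model away from SQLite-style output.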
Results
Achieved a 27.4% improvement in execution accuracy over baseline models, demonstrating the importance of database-specific optimization in NL2SQL systems.
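For context, execution accuracy is commonly computed by running the predicted and reference queries and comparing their results. The sketch below shows one such computation under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for an Oracle connection so the example stays self-contained, it compares results as order-insensitive multisets of rows, and the function names and sample data are hypothetical. The exact evaluation protocol used in the project is not described here.

```python
import sqlite3
from collections import Counter
from typing import Iterable, Optional, Tuple


def run_query(conn: sqlite3.Connection, sql: str) -> Optional[Counter]:
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(
    conn: sqlite3.Connection, pairs: Iterable[Tuple[str, str]]
) -> float:
    """Fraction of (predicted, gold) pairs whose result multisets match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        # A prediction that fails to execute never counts as correct.
        if gold_rows is not None and pred_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", [("A", 30), ("B", 40)])
    pairs = [
        # Different text, same result: counted as correct.
        ("SELECT name FROM singer WHERE age > 35",
         "SELECT name FROM singer WHERE age >= 40"),
        # Executes but returns different rows: counted as incorrect.
        ("SELECT name FROM singer",
         "SELECT name FROM singer WHERE age > 35"),
    ]
    print(execution_accuracy(conn, pairs))
```

Comparing results rather than query text means a prediction that differs syntactically from the reference still counts as correct when it returns the same rows.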
Technical Skills Developed
- Database Systems: Deep understanding of Oracle SQL dialect and features
- Model Fine-Tuning: Advanced techniques in language model optimization
- Prompt Engineering: Strategic approaches to improving model performance
- Data Processing: Large-scale dataset transformation and validation
Impact
This work highlighted the importance of database-specific considerations in NL2SQL research and provided practical solutions for enterprise database integration.
Collaboration
Worked closely with the Oracle Labs research team, gaining experience in:
- Industrial research methodologies
- Enterprise software requirements
- Large-scale database systems
- Production-ready model development
Learn more: Generative AI in Oracle APEX