Data Scientist with M.A. in sociology, B.A. in environmental sociology, and 5+ years' experience teaching statistics. Completed TripleTen's 10-month data science bootcamp and a real-world data science externship with DataSpeak. Currently accepting data analysis and statistics consulting projects May 2024.
Background: An externship project with DataSpeak, a data science consulting firm, to develop an AI customer service chatbot that could be used across multiple clients. The chatbot learns from a dataset and answers questions on domain-specific knowledge.
Purpose: There were three goals for the chatbot:
Techniques: RAG, Llama-2, LangChain, Chainlit
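To illustrate the retrieval step behind RAG, here is a minimal sketch in plain Python. It is not the project's code: the project used LangChain with Llama-2, while this example swaps in a simple bag-of-words cosine similarity so it runs with the standard library alone. The function names (`retrieve`, `build_prompt`), the sample documents, and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=2):
    """Return up to k documents most similar to the question.

    Assumption: bag-of-words counts stand in for the learned embeddings
    and vector store that a real RAG pipeline would use.
    """
    q_vec = Counter(tokenize(question))
    scored = [(cosine(q_vec, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question, context_docs):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical sample documents, not taken from the project dataset.
docs = [
    "Use a list comprehension to build a new list from an iterable in Python.",
    "The with statement closes a file automatically when the block exits.",
    "A virtual environment isolates the packages installed for a project.",
]
question = "How do I close a file automatically?"
top = retrieve(question, docs, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a full pipeline, the prompt built here would be passed to the language model, and the similarity search would typically run against embeddings held in a vector store rather than raw token counts.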
The model accurately responds to open-ended questions with information from the dataset in under 5 minutes on a GPU.
The model correctly selects answers to multiple-choice questions, demonstrating accuracy when answering customer questions.
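One way a multiple-choice check like this could be scored is sketched below. This is an assumption about the evaluation, not the project's actual method: the helper names (`extract_choice`, `accuracy`), the A-D answer format, and the sample outputs are hypothetical.

```python
import re


def extract_choice(model_output, valid="ABCD"):
    """Return the first standalone uppercase option letter, or None.

    Heuristic only: a sentence that starts with the word "A" would be
    misread as choosing option A.
    """
    match = re.search(rf"\b([{valid}])\b", model_output)
    return match.group(1) if match else None


def accuracy(model_outputs, answer_key):
    """Fraction of outputs whose extracted letter matches the key."""
    if not answer_key or len(model_outputs) != len(answer_key):
        raise ValueError("outputs and answer key must be non-empty and equal in length")
    correct = sum(
        extract_choice(output) == key
        for output, key in zip(model_outputs, answer_key)
    )
    return correct / len(answer_key)


# Hypothetical model outputs and answer key, for illustration only.
outputs = ["The answer is B.", "C) use a with statement", "I am not sure"]
key = ["B", "C", "A"]
score = accuracy(outputs, key)
print(f"accuracy: {score:.2f}")
```

An output with no recognizable option letter counts as incorrect, so vague or off-topic replies lower the score instead of being skipped.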
Data for this project came from a public dataset of Python questions and answers from Kaggle.
Data Link: https://www.kaggle.com/datasets/stackoverflow/pythonquestions
This model should be tested on each domain-specific dataset to ensure it learns accurate answers to common customer questions. Additionally, response times can be sped up by running the app on a GPU and using vector storage in Pinecone. This app can be deployed over a web service for use by customers.