Artificial Intelligence (AI)
- Artificial intelligence refers to the ability of computer systems to process information and perform tasks in ways that resemble human thinking. However, AI does not "understand" things the way humans do; today's AI systems typically operate based on statistical models. Everyday examples include machine translation programs and the systems that generate recommendation lists.
Large Language Model (LLM)
- A large language model is an AI system trained on massive text datasets to predict which words are likely to come next in a text. It can generate, summarize, and translate text. On its own, a language model does not look up information from external sources; it generates text based on patterns learned from the data it was trained on.
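The idea of predicting the likely next word can be illustrated with a toy bigram model that counts which word follows which in a tiny, made-up corpus. This is a deliberate simplification: real LLMs use neural networks, operate on tokens rather than whole words, and are trained on vastly more text. The corpus and function names here are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data"; real models learn from billions of tokens.
corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug ."

def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for each word, how often each other word directly follows it."""
    words = text.split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def next_word_probabilities(model: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn follower counts into probabilities; unknown words get no prediction."""
    counts = model.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

model = train_bigrams(corpus)
probs = next_word_probabilities(model, "the")
print(probs)
print(max(probs, key=probs.get))  # the most likely word after "the"
```

After "the", the model assigns "cat" a probability of 0.4 because "cat" follows "the" in two of five cases. For a word that never appeared in the corpus, it returns no prediction at all: it has nothing to draw on beyond its training data.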
Machine Learning
- Machine learning is a subfield of AI. In machine learning, a computer learns patterns from data instead of being explicitly programmed with rules. For example, a computer is shown labeled images of cats and dogs and learns to distinguish between them without anyone writing rules for what makes a cat or a dog.
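Learning from labeled examples rather than hand-written rules can be sketched with one of the simplest classifiers, a nearest-centroid classifier. The sketch assumes each animal is described by two made-up numbers (weight and ear length) instead of an image; real image classifiers learn from pixel data with far more complex models.

```python
# Hypothetical labeled examples: (weight in kg, ear length in cm) -> label.
# Real image classifiers learn from pixels; these features are made up.
training_data = [
    ((4.0, 6.5), "cat"),
    ((3.5, 7.0), "cat"),
    ((5.0, 6.0), "cat"),
    ((20.0, 10.0), "dog"),
    ((30.0, 12.0), "dog"),
    ((25.0, 11.0), "dog"),
]

def learn_centroids(data):
    """'Training': average the feature vectors of all examples with each label."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Pick the label whose average example is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))

centroids = learn_centroids(training_data)
print(predict(centroids, (4.5, 6.8)))    # a new, unseen small animal
print(predict(centroids, (22.0, 10.5)))  # a new, unseen large animal
```

No rule such as "heavier than 10 kg means dog" is written anywhere; the boundary between the two classes comes entirely from the averaged examples.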
Generative AI (Gen AI, GAI)
- Generative AI refers to artificial intelligence that, instead of merely classifying or recognizing existing content, can create new content (e.g., text or images) based on the patterns it has learned from its training data.
Retrieval-Augmented Generation (RAG)
- Retrieval-Augmented Generation (RAG) combines traditional information retrieval with generative AI. The user's question is first converted into a search query suited to a specific database, and relevant sources are retrieved. The system then selects the most relevant sources and passes them to the language model, which generates a response based on them.
- RAG-based search does not rely solely on what the language model learned during training: it is connected to a specific database and draws on that data in its responses. However, RAG systems can still hallucinate or produce biased answers. Even when reliable scholarly sources are used, the model may misinterpret them or formulate a response that no longer reflects the original content or context.
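The retrieve-then-generate flow of RAG can be sketched in Python. This is a minimal illustration under heavy assumptions: the three-document `DOCUMENTS` collection, the stopword list, and simple keyword-overlap ranking are hypothetical stand-ins (real systems typically use a database search engine or vector embeddings), and `generate` is a placeholder where a call to an actual language model would go.

```python
import re

# Hypothetical mini "database"; a real RAG system would query a catalogue,
# an article index, or a vector store.
DOCUMENTS = {
    "doc1": "Spaced repetition helps learners remember vocabulary in a new language.",
    "doc2": "Chatbots can give language learners instant feedback on grammar.",
    "doc3": "Photosynthesis converts light energy into chemical energy in plants.",
}

STOPWORDS = {"a", "an", "the", "in", "on", "of", "can", "how", "to", "into"}

def keywords(text: str) -> set[str]:
    """Step 1: turn free text into a set of search terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Step 2: rank sources by how many search terms they share with the question."""
    terms = keywords(question)
    scored = [(len(terms & keywords(text)), doc_id) for doc_id, text in DOCUMENTS.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]

def build_prompt(question: str, doc_ids: list[str]) -> str:
    """Step 3: hand the selected sources to the language model as context."""
    context = "\n".join(f"[{d}] {DOCUMENTS[d]}" for d in doc_ids)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    """Hypothetical placeholder for a real language-model call."""
    return "(model output would appear here)"

question = "How can chatbots help language learners?"
sources = retrieve(question)
prompt = build_prompt(question, sources)
print(sources)
print(prompt)
print(generate(prompt))
```

Here `retrieve` returns `doc2` and `doc1` and skips the unrelated `doc3`. Retrieval only controls which sources the model sees; the generation step can still misread or distort them.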
Search String and Prompt
- In traditional information seeking, the user builds a search string to find documents in databases. The search results depend on the keywords used and the search engine's algorithms, and the user selects the sources that meet their needs.
- A search string is a combination of keywords used to seek information from a database or search engine, e.g., “artificial intelligence AND learning AND languages.”
- A prompt is a question or instruction given to AI to guide its response, e.g., “briefly explain how AI can support language learning.”
- Difference between a search string and a prompt: a search string retrieves existing information, while a prompt guides AI to generate new content.
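The search string example can be sketched as a Boolean AND filter: a document is returned only if it contains every term. The sketch assumes a hypothetical list of titles and plain case-insensitive substring matching; real databases also offer phrase searching, truncation, field limits, and relevance ranking.

```python
# Hypothetical document titles standing in for a database.
TITLES = [
    "Artificial intelligence in language learning: a review",
    "Machine learning for image recognition",
    "Learning languages with artificial intelligence tutors",
    "History of artificial intelligence",
]

def and_search(search_string: str, titles: list[str]) -> list[str]:
    """Return the titles that contain every AND-joined term (case-insensitive)."""
    terms = [t.strip().lower() for t in search_string.split(" AND ")]
    return [title for title in titles if all(term in title.lower() for term in terms)]

results = and_search("artificial intelligence AND learning AND languages", TITLES)
for title in results:
    print(title)
```

Only the third title is returned. The first is missed because it contains "language" rather than "languages", which is why many databases support truncation (often written with an asterisk, e.g. languag*, though the symbol varies by database). Either way, the search only returns existing records; it does not write anything new.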
Hallucination and Biases
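- AI hallucination refers to the phenomenon where AI presents false or fabricated information in a convincing manner. This happens because AI generates responses based on probabilities: it produces text that is statistically plausible, not text that has been checked against facts. When the answer is not supported by its training data, it may fill the gap with invented details.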
- Bias refers to distorted or one-sided responses from AI. Biases reflect the prejudices and imbalances present in the data used to train the AI, such as gender stereotypes or the overrepresentation of English-language content.
Open and Closed AI Environments
- In an open AI environment, such as the free version of ChatGPT, the data you enter is sent to the AI service provider’s servers. This makes the service easy and affordable to use but introduces risks related to data security and privacy.
- In a closed AI environment, such as a university’s internal Copilot, data remains within a specific organization, improving security and control. However, it requires more resources and maintenance.