Web Search and Text Analysis (COMP90042)
Graduate coursework | Points: 12.5 | On Campus (Parkville)
Overview
Availability | Semester 1 |
---|---|
AIMS
The aim of this subject is for students to develop an understanding of the main algorithms used in natural language processing and text retrieval, for use in a diverse range of applications including search engines, machine translation, text mining, sentiment analysis, and question answering. Topics to be covered include text pre-processing, part-of-speech tagging, context-free grammars, n-gram language modelling, and text classification. The programming language used is Python.
INDICATIVE CONTENT
Topics covered may include:
- Text classification algorithms such as logistic regression
- Vector space models for natural language semantics
- Structured prediction, Hidden Markov models
- N-gram language modelling, including statistical estimation
- Alignment of parallel corpora
- Term indexing, term weighting for information retrieval
- Query expansion and relevance feedback
Intended learning outcomes
On completion of this subject the student is expected to:
- Identify basic challenges associated with the computational modelling of natural language
- Understand and articulate the mathematical and/or algorithmic basis of common techniques used in natural language processing and information retrieval
- Implement relevant techniques and/or interface with existing libraries
- Carry out end-to-end research experiments, including evaluation with text corpora as well as presentation and interpretation of results.
Generic skills
On completing this subject, students should have the following skills:
- Formulate and implement algorithmic solutions to computational problems, with reference to the research literature
- Apply a systems approach to complex problems, and design for operational efficiency
- Design, implement and test programs for small and medium size problems in the Python programming language.
Last updated: 3 November 2022