MultiWordUnit Profiler

What is it?

MultiWordUnit Profiler is a free web application that lets you analyze your texts in terms of multiword units (MWUs), or useful chunks.

Currently users can analyze their text using:

An Academic Formulas List

A Phrasal Expressions List

Biber et al. (2004) Lexical bundles list

The beta version was released on 2020/7/27.

https://gyazo.com/ab76bf8fa4a9293c491e851f0a38d7bd https://multiwordunitsprofiler.pythonanywhere.com

How to cite the application:

Eguchi, M. (2020) Multi-Word Units Profiler. (Version 1.6.0) [Computer software]. Available from https://multiwordunitsprofiler.pythonanywhere.com

Recent development

Ver 1.6.0:

x Fixed some display issues

x Introduced analysis of lemmatized MWUs.

Ver 1.5.2:

x Linked the table output to an external concordancer (i.e., Web Concordancer English by Prof. Cobb; permission obtained to link the page).

x Enabled table sorting

x Color-coding according to frequency levels

x Users are now able to analyze their text based on Lexical bundles (Biber et al., 2004).

Ver 1.3:

x Updated the front-end design using Bootstrap 4 on 2020/7/28

Development plan:

_ Reducing false positives with an advanced tagging model

The current beta version employs a simple pattern-matching algorithm. This implementation is very simple, but it can perform poorly where lemmatization or grammatical relations matter.

The plan is to use spaCy, an NLP library with dependency-parsing models, to lemmatize the text and analyze grammatical relations for some types of constructions, such as collocations. This is under development as of 2020/08/02.
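To illustrate why surface pattern matching can miss inflected forms, here is a minimal sketch in Python. It is not the application's actual code: the function name `find_mwus` and the three-item `MWU_LIST` are hypothetical, chosen only for illustration, and the matching strategy (case-insensitive exact match on word boundaries) is an assumption about what "simple pattern match" means.

```python
import re

# Hypothetical mini-list for illustration only; not the application's data.
MWU_LIST = ["on the other hand", "take into account", "as a result of"]


def find_mwus(text, mwu_list):
    """Return (mwu, start, end) for each exact, case-insensitive match."""
    hits = []
    for mwu in mwu_list:
        # Match on word boundaries and allow any whitespace between words.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in mwu.split()) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((mwu, m.start(), m.end()))
    # Report matches in the order they appear in the text.
    return sorted(hits, key=lambda h: h[1])


sample = "On the other hand, the model takes into account word order."
matches = find_mwus(sample, MWU_LIST)
print(matches)
```

In this sample, "On the other hand" is found, but "takes into account" is not, because the list stores the base form "take into account". Matching on lemmas instead of surface forms is one way to close that gap.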

_ Color-coding according to function

_ Add more multiword unit lists

_ Collocation lists

_ Idioms

_ Binomials

_ Highlight strongly associated neighboring words that surround the identified MWUs.

_ Update the frequency information

Currently, the table of expressions presents frequency and band information from the original studies, but displaying multiple frequency figures may be confusing for learners. For this reason, I am looking for a way to present comparable information across the lists. To do this, I need to reanalyze the frequencies using available corpora.
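One possible way to put frequencies from different lists on a common scale is to recount each MWU in a single corpus and normalize per million tokens. The sketch below only illustrates that idea. The function name `per_million`, the toy corpus, and the choice of a per-million scale are assumptions, not the planned implementation.

```python
from collections import Counter


def per_million(mwu_list, corpus_tokens):
    """Count each MWU in a token list and normalize per million tokens."""
    # Normalize list entries to lowercase word tuples for n-gram lookup.
    targets = {tuple(m.lower().split()) for m in mwu_list}
    tokens = [t.lower() for t in corpus_tokens]
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter()
    # Slide a window of each needed length over the corpus.
    for n in {len(t) for t in targets}:
        for i in range(total - n + 1):
            gram = tuple(tokens[i:i + n])
            if gram in targets:
                counts[gram] += 1
    return {" ".join(t): counts[t] * 1_000_000 / total for t in targets}


# Toy corpus (12 tokens) for illustration only.
corpus = "as a result of the change , as a result of this".split()
freq = per_million(["as a result of", "on the other hand"], corpus)
print(freq)
```

With a real corpus, the same normalized figure could be shown for every list, so learners would see a single frequency column.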

_ User interface

x Button to empty the textbox

_ Tabs for tables of MWUs.