
What is an n-gram?

Answers (1)

    • Mikhail Agapov

      In computational linguistics and probability, an n-gram (sometimes also called a Q-gram) is a contiguous sequence of n items from a given sample of text or speech. Depending on the application, the items can be phonemes, syllables, letters, words, or base pairs. n-grams are typically collected from a text or speech corpus. When the items are words, n-grams may also be called shingles.
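      To make the definition concrete, here is a minimal Python sketch that slides a window of size n over a sequence. The function name `ngrams` and the sample inputs are made up for illustration; they are not part of any particular library.

```python
def ngrams(items, n):
    """Return every contiguous run of n items as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    items = list(items)
    # A sequence of length L has L - n + 1 windows of size n
    # (and none at all when L < n).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word-level bigrams (the items are words).
words = "to be or not to be".split()
word_bigrams = ngrams(words, 2)

# Character-level trigrams (the items are letters).
char_trigrams = ngrams("text", 3)
```

      Here `word_bigrams` holds five pairs, starting with ("to", "be") and ("be", "or"), and `char_trigrams` holds ("t", "e", "x") and ("e", "x", "t"). The same function works for any item type, since it only relies on slicing a list.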

      Using Latin numerical prefixes, an n-gram of size 1 is called a "unigram", size 2 a "bigram" (or, less commonly, a "digram"), and size 3 a "trigram".

      English cardinal numbers are sometimes used instead, e.g. "four-gram", "five-gram", and so on. In computational biology, a polymer or oligomer of a known size is called a k-mer rather than an n-gram, with specific names formed from Greek numerical prefixes (monomer, dimer, trimer, tetramer, pentamer, etc.) or English cardinal numbers (one-mer, two-mer, three-mer, etc.).

      Applications
      An n-gram model is a type of probabilistic language model that predicts the next item in such a sequence, in the form of an (n-1)-order Markov model: the prediction depends only on the preceding n-1 items. n-gram models are widely used in probability, communication theory, computational linguistics (for instance, statistical natural language processing), computational biology (for instance, biological sequence analysis), and data compression. Two benefits of n-gram models (and of algorithms that use them) are simplicity and scalability: with larger n, a model can store more context with a well-understood space-time tradeoff, enabling small experiments to scale up efficiently.
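      As a rough illustration of the model described above, the sketch below estimates next-item probabilities from raw counts (relative frequencies, with no smoothing). The function names and the toy corpus are assumptions made for this example, not a reference implementation.

```python
from collections import Counter, defaultdict


def train_ngram_model(tokens, n):
    """Count how often each item follows each (n-1)-item context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts


def next_item_probabilities(counts, context):
    """Relative-frequency estimate of P(next item | context)."""
    followers = counts.get(tuple(context))
    if not followers:
        return {}  # context never seen in training
    total = sum(followers.values())
    return {item: c / total for item, c in followers.items()}


# Toy corpus, invented for illustration.
tokens = "the cat sat on the mat the cat ran".split()
model = train_ngram_model(tokens, 2)  # bigram model: context is 1 word
probs = next_item_probabilities(model, ["the"])
```

      In this toy corpus "the" is followed by "cat" twice and "mat" once, so `probs` gives "cat" a probability of 2/3 and "mat" 1/3. Raising n makes each context more specific but also multiplies the number of distinct contexts to store, which is the space-time tradeoff mentioned above.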
