Origify

Source

Overview

Origify is a plagiarism detection tool built to support academic integrity in state education. The goal is a reliable system that flags likely plagiarism in exam papers by comparing each submission against existing documents, so that assessments can be trusted as the student's own work. By making academic dishonesty easier to detect, Origify aims to foster a learning environment rooted in honesty, openness, and excellence.

Motives

Objective

Architecture

Data Processing

  1. Document Text Preprocessing:

    • Tokenization: Breaking text into individual tokens (words, punctuation, etc.) using nltk.word_tokenize.
    • Stop Word Removal: Filtering out common words (e.g., "the", "a", "is") using NLTK’s stopwords corpus.
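    • Stemming: Reducing inflected words to a common stem (e.g., "running" → "run") using PorterStemmer from NLTK. This is rule-based suffix stripping rather than dictionary-based lemmatization, so a stem is not always a real word.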
  2. Document Vectorization:

    • TF-IDF (Term Frequency-Inverse Document Frequency): Converts text into numerical vectors that represent word importance across documents using sklearn.feature_extraction.text.TfidfVectorizer.

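    Formula:

    • Term Frequency:

      \( \mathrm{tf}(t,d) = \dfrac{f_{t,d}}{\sum_{t' \in d} f_{t',d}} \)

      Where:

      • \( f_{t,d} \) is the number of times term \( t \) occurs in document \( d \).
    • Inverse Document Frequency:

      \( \mathrm{idf}(t,D) = \log \dfrac{N}{df(t,D)} \)

      Where:

      • \( N \) is the total number of documents in the corpus.
      • \( df(t,D) \) is the number of documents containing term \( t \) in the corpus \( D \).
    • TF-IDF Score:

      \( \mathrm{tfidf}(t,d,D) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t,D) \)

      A higher TF-IDF score indicates that the term is frequent in the document but rare across the corpus, and is therefore more distinctive of that document.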

  3. Similarity Calculation:

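    • Cosine Similarity: Measures the cosine of the angle between the TF-IDF vector of the query document and the vector of each existing document using sklearn.metrics.pairwise.cosine_similarity. Because TF-IDF weights are non-negative, scores range from 0 (no shared terms) to 1 (identical term distribution).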
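The three steps above can be sketched end to end in a few lines. The example below is a minimal, dependency-free illustration and not Origify's actual code: it swaps NLTK's tokenizer and stop-word corpus for a regular expression and a tiny hypothetical stop-word set, skips stemming, and computes the textbook tf × log(N / df) weighting by hand instead of calling TfidfVectorizer. The function names (preprocess, tfidf_vectors, cosine_similarity) and the sample corpus are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a stand-in for NLTK's stopwords corpus.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on"}


def preprocess(text):
    """Lowercase, split into word tokens, and drop stop words (no stemming)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # df[t] = number of documents that contain term t at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector shares nothing measurable
    return dot / (norm_u * norm_v)


# Illustrative corpus: the first document plays the role of the query.
corpus = [
    "The cat sat on the mat.",
    "The cat sat on the mat today.",
    "Stock markets fell sharply on Monday.",
]
vectors = tfidf_vectors(corpus)
query, *existing = vectors
scores = [cosine_similarity(query, v) for v in existing]
print(scores)  # the near-duplicate scores well above the unrelated text
```

Scores from this sketch will not match scikit-learn's numerically: with its default settings, TfidfVectorizer smooths the IDF term and L2-normalizes each vector, whereas the plain formula used here assigns a weight of zero to any term that appears in every document.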

TODOs

Features

Developer Tech Stack

Frontend

Backend

Data Sources

Design Prototypes

Wireframe

Mobile User Flow

Marketing

Monetization

Papers