GitHub - waghvedant/Similarity_Detection: A plagiarism detection tool that uses two approaches to let users compare their documents with other documents uploaded to the database and view the resulting similarity scores. We also published a research paper on this work with IEEE.

This repository demonstrates the Plagiarism Detection Tool; the project's complete workflow and execution steps are described below. The main idea is to compute a similarity score between documents so that we can examine how similar they are to one another. We use two algorithms so that similarity is determined not only by the words the documents share but also by semantic analysis of their content; both are explained below. To run the project, execute the project.py file, which starts the system on a localhost address. The project is written in Python and uses its Flask module, which serves as the intermediary between the front-end modules and the rest of the system. The databases are created automatically when the files are run. As explained below, the project works in two phases: before and after the deadline.
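The automatic database creation mentioned above could look roughly like the following. This is a minimal sketch, not the project's actual code: it assumes SQLite through Python's standard sqlite3 module, and the table names (`users`, `documents`), the column names, and the `init_db` helper are all hypothetical.

```python
import sqlite3


def init_db(path=":memory:"):
    """Create the (hypothetical) tables if they do not exist yet.

    Assumption: SQLite is used only for illustration; the real project
    may use a different database engine and schema.
    """
    conn = sqlite3.connect(path)
    # Account data for registered users.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS users (
               id INTEGER PRIMARY KEY,
               username TEXT UNIQUE NOT NULL,
               password_hash TEXT NOT NULL
           )"""
    )
    # Preprocessed (raw) text of each uploaded file.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               user_id INTEGER NOT NULL REFERENCES users(id),
               filename TEXT NOT NULL,
               raw_text TEXT NOT NULL
           )"""
    )
    conn.commit()
    return conn
```

Because of `IF NOT EXISTS`, calling `init_db` on every start is safe: existing tables and their rows are left untouched.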

1) Before Deadline Phase: Users are free to upload files during this phase, and the system accepts the file types users need, such as PDF, TXT, and Doc files, for verification. To upload a file, the user must first create an account and then log in. The first-phase tasks are building the user database, which stores account data, and preprocessing each file after it is uploaded: natural language processing turns the file into raw data, which is stored in the database so that the algorithm can run on it. Once the first phase is complete, meaning the deadline has passed, all stored user files are sent to the plagiarism detection algorithm. It combines two techniques, cosine similarity and latent semantic analysis, which determine similarity based on words and on meaning, respectively. All results are then stored in the database, and users can see that their score has been checked.
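To make the word-based half of this concrete, here is a minimal sketch of preprocessing followed by cosine similarity on term-frequency vectors. It is an illustration under stated assumptions, not the project's implementation: the `preprocess` and `cosine_similarity` helpers, the tiny stop-word list, and the sample sentences are hypothetical. Latent semantic analysis is not shown; it typically applies a truncated singular value decomposition to a term-document matrix before documents are compared.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    tf_a = Counter(preprocess(text_a))
    tf_b = Counter(preprocess(text_b))
    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


doc_1 = "The cat sat on the mat."
doc_2 = "A cat sat on a mat."
doc_3 = "Stock prices fell sharply today."

print(cosine_similarity(doc_1, doc_2))  # same content words: close to 1
print(cosine_similarity(doc_1, doc_3))  # no shared words: 0.0
```

A purely word-based score like this cannot tell that two differently worded passages mean the same thing, which is why a meaning-based method is paired with it.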

2) After Deadline Phase: During this phase, users can log in to their account and view a list of other users' files as well as their own, and select the file against which they want to check their similarity score. The similarity score between their file and the chosen one is then shown in a graphical format, which makes it easy for the user to understand.

The report uploaded to this repository contains all of the information regarding the files and various modules of this project. We are also pleased to report that we have written a research paper on the subject, published with IEEE; the link to the paper is provided below.

Paper Link: https://ieeexplore.ieee.org/document/10837475

Folders and files (40 commits):
__pycache__
static
templates
Document_Cal.py
File_Remove.py
Latent.py
PLAG_REPORT.pdf
README.md
UPDATED_TF.py
Updated_prepro.py
cosine_sim.py
final_bow.py
getPhoto_oTm.py
getPhoto_oTo.py
graph_oTm.py
graph_oTo.py
plagiarism.py
project.py
userfile_upload.py
