Programming Language Detector

This code provides a basic framework for detecting the programming language of a given code snippet. It has implemented in https://textcompare.io to compare code and text.

Supported Languages:

JavaScript
C
C++
Python
Java
HTML
CSS
Ruby
Go
PHP
C#
R
Objective-C
TypeScript
Swift
Perl
Haskell
Kotlin
Rust
Dart
Julia
Scala
Groovy
Bash
Lua
FSharp
MATLAB
Lisp
Prolog
COBOL
Fortran
Ada
Elm
Crystal
Elixir
Clojure
CoffeeScript
ObjectiveCPP
Racket
Erlang
Apex
VHDL
Verilog
Scheme
Tcl
COOL
Nim
K
Racket
Alice
APL
Bash
Kotlin
XML
YAML
JSON
Markdown
SGML
HTML5
TEI
LaTeX
AsciiDoc

How it Works:

Language Definitions:
- LANGUAGES is a dictionary where keys are language names (e.g., "JavaScript", "Python") and values are arrays of language-specific patterns.
- Each pattern within an array is a regular expression object with:
  - pattern: The regular expression to match.
  - points: Points awarded for each match. Positive points indicate features of the language, negative points indicate features that suggest it's not that language.
  - nearTop: (Optional) If True, the pattern should be matched near the beginning of the code.
Scoring:
- For each language:
  - Iterate through the patterns for that language.
  - Apply each pattern to the code snippet.
  - Accumulate the points for each match.
Language Prediction:
- Determine the language with the highest score.
- If no language has a significantly higher score than others, the prediction may be uncertain.

Limitations:

Simple Heuristics: The current implementation relies on simple heuristics (regular expressions) and may not accurately detect complex or ambiguous code.
Limited Language Support: While the code supports a wide range of languages, it may not be comprehensive and may miss some niche languages.
False Positives/Negatives: False positives (incorrectly identifying a language) and false negatives (failing to identify the correct language) are possible due to the nature of heuristics and the complexity of real-world code.

Potential Improvements:

More Sophisticated Language Models: Incorporate more advanced language models (e.g., machine learning models trained on large code datasets) for more accurate predictions.
Contextual Analysis: Consider the context of the code (e.g., surrounding files, project structure) to improve accuracy.
Dynamic Analysis: Analyze the code's behavior (e.g., by running it) to extract more meaningful features.
Lexical Analysis: Perform more in-depth lexical analysis (tokenization, parsing) to identify language-specific constructs.

Usage:

Prepare the code snippet: The code snippet should be a string.
Call the detection function:
- Pass the code snippet to the detection function.
- The function will analyze the code and return the predicted language.

Note: This is a simplified example. A production-ready implementation would require significant refinement and testing.

This README file provides a basic overview of the code and its limitations. Further documentation and testing are necessary for a complete understanding and evaluation.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
README.md		README.md
index.js		index.js
package-lock.json		package-lock.json
package.json		package.json
sample.html		sample.html
samplewithcdn.html		samplewithcdn.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Programming Language Detector

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

textcompare/detect-code-language

Folders and files

Latest commit

History

Repository files navigation

Programming Language Detector

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages