Ideas for enhancement #34


Open

itinerant-fox opened this issue Dec 16, 2020 · 7 comments

@itinerant-fox

  1. Support for taking input from multiple input files.
    A wordlist can be spread across multiple files. Currently I merge them and then pass the result to duplicut.
    It would need to work something like: duplicut -p 1.txt 2.txt 3.txt 4.txt -o output.txt

  2. Progress bar. Need not be accurate. Can be a guesstimate.

Thanks for the software 👍

@nil0x42
Owner

nil0x42 commented Dec 17, 2020

Hi! Thank you for your suggestions :)


1. Support for taking input from multiple input files.

I did not implement multiple input files because it would not be faster than doing:
cat 1.txt 2.txt 3.txt > all.txt && duplicut -i all.txt -o output.txt
I prefer focusing on features that do things better than existing tools. But of course, if someone is willing to make a PR implementing it, I would be happy to merge it!


2. Progress bar.

All the needed source for a progress bar is already implemented (status.c & uinput.c).
The current progress-tracking implementation works well, but the UX might feel a little old-fashioned: each time you press a key during execution, a line reporting current progress and ETA is printed, much like john-the-ripper's progress tracking.

So implementing a progress bar would actually be very easy, as it's only a matter of display. I'll add it to my TODO for sure 👍
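For readers unfamiliar with that UX, here is a rough, generic shell sketch of the press-a-key-for-status pattern (an illustration only, not duplicut's actual code, which is C):

#!/bin/bash
# A long-running loop that prints a status line whenever a key is pressed
total=500
for ((count = 1; count <= total; count++)); do
    sleep 0.01                            # stand-in for real work
    # Non-blocking check for a keypress (tiny read timeout, needs bash >= 4)
    if read -r -s -n 1 -t 0.001 _; then
        printf 'progress: %d/%d (%d%%)\n' "$count" "$total" $((count * 100 / total))
    fi
done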

@sectroyer

I have a few dictionaries and I would like to remove duplicates across them. It would therefore be good to have an option that removes from one dict the words that appear in another dict. I want to keep those dicts separate, but make sure I'm not testing the same passwords when using them in sequence (which is not always the case) :)

@yanncam

yanncam commented Mar 22, 2022

Good morning all,

First of all, thank you for duplicut, the tool is particularly powerful!

However, I encountered the same problem as @sectroyer and @itinerant-fox: I needed to use duplicut individually on each of my wordlists, then to deduplicate the wordlists against each other.

Duplicut is only designed to process a single file, so I designed a wrapper (in bash) that automates the process for N wordlists while relying on duplicut.

This wrapper generates a single temporary file concatenating all the wordlists with delimiters (which requires extra disk space), then, after deduplication, re-splits this single file to regenerate the initial wordlists, now individually deduplicated, along with some optimization statistics.
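For illustration, the core of that delimiter trick can be sketched in a few lines of bash (hypothetical marker and file names; the real wrapper linked below adds statistics, safety checks and size-based ordering):

#!/bin/bash
# Sketch of the delimiter approach, NOT the real multiduplicut:
# 1) concatenate all wordlists, preceding each with a unique boundary line,
# 2) dedupe once with duplicut (it keeps the first occurrence and preserves order),
# 3) split the result back on the boundary lines.
MARKER='#--BOUNDARY--#'   # assumed absent from every wordlist, and short
                          # enough to survive duplicut's line-length filtering

: > merged.txt
for f in "$@"; do
    printf '%s%s\n' "$MARKER" "$f" >> merged.txt
    cat "$f" >> merged.txt
done

duplicut merged.txt -o deduped.txt

# Every boundary line switches the current output file
awk -v m="$MARKER" '
    index($0, m) == 1 { out = substr($0, length(m) + 1) ".dedup"; next }
    { print > out }
' deduped.txt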

You will find the wrapper here: https://github.com/yanncam/multiduplicut

Hope it can help others!

Thanks again for this great tool :)!

@nil0x42
Owner

nil0x42 commented Mar 22, 2022

The delimiters idea is interesting; it's probably the easiest way to implement this inside duplicut without rewriting a large part of the codebase.
I'll consider implementing multi-file support when I have time, so no ETA for now (always busy).
Anyway, your script is very nice, and I think it will help many people.

@nil0x42
Owner

nil0x42 commented May 12, 2025

Hi! I'm considering implementing this. However, I'm not sure about the preferred behavior:

If I dedupe bigfile.txt, smallfile.txt, mediumfile.txt, which file should duplicut 'favor'? Say string1 appears in all three files.

There are 2 possibilities:
1 - Respect CLI order -- keep string1 in bigfile.txt, removing it from smallfile.txt & mediumfile.txt
2 - Prioritize removal on bigger files -- keep string1 in smallfile.txt, removing it from mediumfile.txt & bigfile.txt

I tend to prefer option 2, but I'd like to hear your opinions @itinerant-fox, @sectroyer, @yanncam

@yanncam

yanncam commented May 12, 2025

Hello @nil0x42,

For my part, when I created the "multiduplicut" wrapper, I based the ordering on file size:

  • I leave all the words in the smallest file (in bytes) untouched;
  • then I deduplicate every other dictionary against the previous ones (always processed in ascending size order).

Generally, a small wordlist contains very specific or very common words, so for a cryptanalysis approach such a wordlist is highly relevant, and it's best to leave it intact to make the best use of cracking time.

The larger a wordlist is, the longer it takes to run through.

This is why I opted for an implicit ordering based on file size.

But customizing this order with an ordered list of files as a command-line argument could be interesting too :)!
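Incidentally, under option 1 above (respect CLI order), this size-based priority could be emulated from the shell simply by passing the files smallest-first. A rough sketch, assuming the hypothetical multi-file syntax from the original request (not an existing duplicut feature) and filenames without whitespace:

# Hypothetical: with CLI-order semantics, passing files smallest-first
# would reproduce the size-based priority (option 2).
mapfile -t files < <(ls -Sr wordlists/*.txt)   # -S sorts by size, -r puts smallest first
duplicut "${files[@]}" -o output.txt           # proposed multi-file invocation, not a real flag yet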

@sectroyer

I use the following script:

#!/bin/bash

echo -e "\nDuplicut file extension\n"

if [ -z "$3" ] || [ ! -f "$1" ] || [ ! -f "$2" ]
then
	echo "Usage: $0 <input_file_to_clean.txt> <file_to_remove> <output_file.txt>"
	exit 1
fi

# Scratch directory; adjust to a disk with enough free space
rm /mnt/data/trim/*.txt 2> /dev/null


echo "Counting lines..."
# +1 because tail -n "+N" starts printing AT line N
# NOTE: assumes "$2" contains no internal duplicates; otherwise the deduped
# prefix shrinks and this offset cuts valid lines out of the result
number_of_lines_to_skip=$(( $(wc -l < "$2") + 1 ))


echo "Number of lines to skip: $number_of_lines_to_skip"

# Put the words to remove FIRST: duplicut keeps the first occurrence,
# so any word of "$1" already present in "$2" is dropped from the tail part
echo "Copying input 1..."
cat "$2" > /mnt/data/trim/input.txt

echo "Copying input 2..."
cat "$1" >> /mnt/data/trim/input.txt

echo "Removing duplicates..."
duplicut /mnt/data/trim/input.txt -p -o /mnt/data/trim/output.txt

# Strip the leading "$2" block, keeping only the cleaned "$1" content
echo "Trimming output..."
tail -n "+$number_of_lines_to_skip" /mnt/data/trim/output.txt > "$3"

echo ""
