//$Header$
//{211fc600-db2e-4703-bf40-968a2f063b13}
//-------------------------------------------------------------------------------------------------
//Copyright (c) 2018, David T. Ashley
//
//This file is part of "ets_dedup", a program for eliminating duplicate files in a subdirectory
//tree.
//
//This source code and any program in which it is compiled/used is licensed under the MIT License,
//reproduced below.
//
//Permission is hereby granted, free of charge, to any person obtaining a copy of
//this software and associated documentation files (the "Software"), to deal in the
//Software without restriction, including without limitation the rights to use,
//copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the
//Software, and to permit persons to whom the Software is furnished to do so,
//subject to the following conditions:
//
//The above copyright notice and this permission notice shall be included in all
//copies or substantial portions of the Software.
//
//THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
//IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
//FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
//AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
//LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
//OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
//SOFTWARE.
//-------------------------------------------------------------------------------------------------
#include <stdio.h>

const char *ets_dedup_description[] =
{
   "ets_dedup (mnemonic: DE-DUPlicate) is a program for identifying and",
   "eliminating duplicate files (by any name) at any depth in a subdirectory and",
   "its children. The most common application for the program would be the",
   "reduction of personal clutter, i.e. duplicate photos and downloads.",
};

const char *ets_dedup_instructions[] =
{
   "Usage",
   "-----",
   "ets_dedup [-option_1 [ ... [-option_n]]] [--] [pref_dir_1 [ ... [pref_dir_n]]]",
   " If no options are provided, emits full documentation to stdout. If options",
   " are provided, analyzes and optionally deletes duplicate files in the",
   " current working directory and all its children. With large sets of files,",
   " this program can take a long time to run (hours), because it calculates",
   " the SHA512 digest of every file.",
   "",
   " ets_dedup is a dangerous program in that it can destroy information (which",
   " file is in which directory is information, and it is possible to destroy",
   " information without deleting the last of a set of duplicate files).",
   " However, ets_dedup is safe in the sense that it will never delete the last",
   " of a set of identical files (this cannot be done using this program,",
   " automatically or manually).",
   "",
   "Options",
   "-------",
   "-report",
   " Analyzes the current working directory and all its children for duplicates,",
   " and writes a full report to the console. The report includes which files",
   " are duplicates, and approximately how much storage space would be saved by",
   " eliminating all duplicates and by eliminating duplicates of individual",
   " files and of subdirectories. The report is voluminous and is typically",
   " redirected to a file.",
   "-dedup_full_auto",
   " Deletes all duplicate files, leaving only one copy (by any name or",
   " extension) of any file. If duplicates are in the same directory, the",
   " first one in alphabetical order is retained. If duplicates are in different",
   " directories, a non-deterministic algorithm is used that tends to leave",
   " larger directories intact while consuming smaller directories.",
   "-dedup_auto_dir_pri",
   " Deletes all duplicate files, leaving only one copy (by any name or",
   " extension). However, in the selection of which duplicates to delete, the",
   " copy in pref_dir_1 is given preference to remain over the copy in",
   " pref_dir_2, ..., over the copy in pref_dir_n, and finally over files in",
   " subdirectories not covered by any of the specified directories. If",
   " multiple copies of a file exist in the highest-priority preferred directory",
   " specified, they are all retained. If there are duplicates that exist only",
   " outside the set of specified preferred directories, none are deleted.",
   "-dedup_auto_dir_equal",
   " The specified directories are given priority over all directories not",
   " specified (this creates two equivalence classes--the directories specified",
   " and the directories not specified). Duplicates that exist both in at least",
   " one of the specified directories and outside the set of specified",
   " directories have the outside copies deleted. No files within the set of",
   " specified directories are deleted. If a file has copies only outside, no",
   " copies are deleted.",
   "-dedup_manual_interactive",
   " Performs a full analysis, then allows interactive manual operations. The",
   " operations involve descending into and ascending out of directories, and",
   " setting a given directory or file as authoritative (meaning all external",
   " copies will be deleted) or non-authoritative (meaning that duplicates",
   " within the non-authoritative object are not retained).",
   "-dry_run",
   " Provides all information about what would have been deleted, but deletes no",
   " files. This option can be useful for ensuring that the behavior of the",
   " program will be acceptable.",
   "",
   "Limitations",
   "-----------",
   " ( 1) Unicode in path names supplied on the command line is not supported.",
   " ( 2) Unicode in file and directory names may or may not be supported.",
   "      This depends on technical details of Linux/Unix and Windows that",
   "      are too voluminous to include here.",
   " ( 3) The program rebuilds its internal data structures each time it is",
   "      run (which involves calculating the SHA512 digest of every file in",
   "      the current working directory and its children). This is a very",
   "      time-consuming operation. The program does not save any information",
   "      between invocations.",
   " ( 4) The program builds all data structures in memory, and so is limited",
   "      by the amount of usable memory in the computer system. A reasonable",
   "      estimate of memory consumption might be 250 bytes per file to be",
   "      analyzed (100 bytes for the path name, 128 bytes for the SHA512",
   "      digest, and 22 bytes for other overhead). Assuming 1GB of usable",
   "      memory, this gives an upper limit of around 4 million files.",
   "      This suggests that the program would be usable for most de-duplication",
   "      tasks.",
   " ( 5) The program does not provide information about near duplicates.",
   "      The program processes files only in terms of same or different.",
   "",
   "Technical Notes",
   "---------------",
   " ( 1) Although the probability of two files with different contents having",
   "      the same SHA512 digest is astronomically small (a hash collision has",
   "      never been found), the program handles this case by ",
};

int c_main(int argc, char **argv)
{
   printf("Execution begins.\n");
   printf("Execution ends.\n");

   return 0;
}
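//-------------------------------------------------------------------------------------------------
//The limitations above estimate about 250 bytes of memory per analyzed file (100 bytes for the
//path name, 128 bytes for the SHA512 digest stored as hexadecimal, and 22 bytes of other
//overhead), which works out to roughly 1GB / 250 = ~4.3 million files per gigabyte of usable
//memory.  The record below is only a hypothetical illustration of that estimate; the type and
//field names are assumptions, not the data structure ets_dedup actually uses.
//-------------------------------------------------------------------------------------------------
struct dedup_file_record_sketch
{
   char *path;                      /*Path name, assumed ~100 bytes on average (heap-allocated). */
   char  sha512_hex[129];           /*SHA512 digest as hexadecimal (128 characters plus NUL).    */
   struct dedup_file_record_sketch *next;
                                    /*Linkage; pointers and allocator overhead account for the   */
                                    /*remaining ~22 bytes of the per-file estimate.              */
};

//-------------------------------------------------------------------------------------------------
//The instructions above state that the program calculates the SHA512 digest of every file.  The
//function below is a minimal sketch of how such a digest might be computed using OpenSSL's EVP
//interface.  It is illustrative only--the function name, the buffer size, and the use of OpenSSL
//are assumptions rather than part of ets_dedup--and it is excluded from the build unless
//ETS_DEDUP_HASH_SKETCH is defined.
//-------------------------------------------------------------------------------------------------
#ifdef ETS_DEDUP_HASH_SKETCH
#include <openssl/evp.h>

//Computes the SHA512 digest of the file at <path> and writes it as 128 hexadecimal characters
//plus a terminating NUL into <hex_out>.  Returns 0 on success, -1 on failure.
static int dedup_sha512_hex_sketch(const char *path, char hex_out[129])
{
   unsigned char digest[EVP_MAX_MD_SIZE];
   unsigned int  digest_len = 0;
   unsigned char buf[65536];
   size_t        n;
   unsigned int  i;
   FILE         *f;
   EVP_MD_CTX   *ctx;

   f = fopen(path, "rb");
   if (!f)
      return -1;

   ctx = EVP_MD_CTX_new();
   if (!ctx)
   {
      fclose(f);
      return -1;
   }

   EVP_DigestInit_ex(ctx, EVP_sha512(), NULL);

   //Hash the file in fixed-size chunks so arbitrarily large files are processed in bounded memory.
   while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
      EVP_DigestUpdate(ctx, buf, n);

   EVP_DigestFinal_ex(ctx, digest, &digest_len);
   EVP_MD_CTX_free(ctx);
   fclose(f);

   //Convert the 64-byte binary digest to lowercase hexadecimal (128 characters).
   for (i = 0; i < digest_len; i++)
      sprintf(hex_out + 2 * i, "%02x", digest[i]);

   return 0;
}
#endif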