./devel/tlsh, Fuzzy matching library


Branch: CURRENT, Version: 4.8.2, Package name: tlsh-4.8.2, Maintainer: pkgsrc-users

TLSH is a fuzzy matching library. Given a byte stream with a minimum
length of 256 bytes (and a minimum amount of randomness - see note
in Python extension below), TLSH generates a hash value which can
be used for similarity comparisons. Similar objects will have
similar hash values which allows for the detection of similar
objects by comparing their hash values. Note that the byte stream
should have a sufficient amount of complexity. For example, a byte
stream of identical bytes will not generate a hash value.
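
As a rough illustration of the comparison workflow described above, the
following sketch uses the Python extension (the py-tlsh package) when it is
installed. The helper name tlsh_distance is hypothetical, and the handling
of unhashable input is an assumption: recent releases are believed to
report it as "TNULL", older ones as an empty string.

```python
try:
    import tlsh  # provided by the py-tlsh package; may not be installed
except ImportError:
    tlsh = None


def tlsh_distance(a: bytes, b: bytes):
    """Return the TLSH distance between two byte strings, or None.

    None means the module is unavailable or one input could not be
    hashed (too short, or too little variation in its bytes).
    """
    if tlsh is None:
        return None
    digest_a, digest_b = tlsh.hash(a), tlsh.hash(b)
    # Assumption: an unhashable input yields "TNULL" (newer releases)
    # or an empty string (older releases).
    if digest_a in ("", "TNULL") or digest_b in ("", "TNULL"):
        return None
    return tlsh.diff(digest_a, digest_b)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    original = bytes(rng.randrange(256) for _ in range(2048))
    modified = original[:1000] + b"small edit" + original[1000:]
    # A lower distance means the inputs are more similar.
    print(tlsh_distance(original, modified))
    # A stream of identical bytes is not expected to produce a digest.
    print(tlsh_distance(b"\x00" * 2048, original))
```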


Master sites:

Filesize: 3267.049 KB

CVS history:


   2024-12-18 14:54:16 by Niclas Rosenvik | Files touched by this commit (2)
Log message:
devel/tlsh: Fix cannot find -lstdc++ on Linux
   2024-08-25 08:19:21 by Thomas Klausner | Files touched by this commit (575)
Log message:
*: replace CMAKE_ARGS with CMAKE_CONFIGURE_ARGS
   2024-01-14 21:07:45 by Amitai Schleier | Files touched by this commit (7) | Package updated
Log message:
tlsh: update to 4.8.2. Changes:

3.5.0:
- Added the -force option
  - Allows a user to force the generation of digests for strings down to
    50 characters long

3.5.1:
- Fixed the error in the Python extension

3.5.2:
- Added the BlackHat Asia tool (presented at Arsenal)

3.7.0:
- merged in various fixes - ifdef for SPARC and RH73
- corrected TLSH_CTC_final.pdf
- added a SHA1 to the NOTICE.txt file
- improved the make.sh so that it calls the test.sh (and does
  regression tests)
- improved regression tests to confirm that the hash is calculated
  correctly in your environment
- fixed the header file C++ standard violation (reserved identifier
  violation #21)

3.7.1:
- resolved issue #29 - the force option for Python
  Step 1 - adding a regression test for strings approx of length 50
  Step 2 - add python code

3.7.2:
- added code to set the distance parameters for ROC analysis

3.7.3:
- resolving issue #44
- making static library the default

3.7.4:
- resolving issue #45
- add a timing test for TLSH

3.7.5:
- resolving issue #46
- in include/tlsh_impl.h
	#define SLIDING_WND_SIZE  5
  this can be varied between 4 and 8
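
To make the effect of the window size concrete, here is a simplified sketch
of a sliding window over a byte stream. It is not TLSH's actual bucket
mapping: the function names are hypothetical, and the way triplets are
formed (the newest byte combined with each pair of older bytes in the
window) is an assumption made for illustration.

```python
from itertools import combinations


def sliding_windows(data: bytes, size: int = 5):
    """Yield every run of `size` consecutive bytes in `data`."""
    for end in range(size, len(data) + 1):
        yield data[end - size:end]


def window_triplets(window: bytes):
    """Combine the newest byte with each pair of older bytes.

    Assumption for illustration: a window of w bytes yields
    C(w - 1, 2) triplets, so a larger window means more work per byte.
    """
    newest, older = window[-1], window[:-1]
    return [(newest, x, y) for x, y in combinations(older, 2)]


if __name__ == "__main__":
    for size in range(4, 9):
        first = next(sliding_windows(bytes(range(16)), size))
        print(size, len(window_triplets(first)))
```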

3.8.0:
- Adding // access functions - required by tools using TLSH library
  - int Lvalue();
  - int Q1ratio();
  - int Q2ratio();

3.9.0:
- resolving issue #48 - tlsh_pattern program

3.9.1:
- resolving issue #38
- putting in fix in rand_tags.cpp so that it generates identical output
  to previous version while safely working with pointers

3.9.2:
- Also merged the contents of NOTICE.txt into LICENSE. This was done
  because NOTICE.txt is sometimes accidentally removed when people clone
  this repository, and the LICENSE specifically states that NOTICE.txt
  should NOT be removed.
- Also added command line option -notice which displays the
  NOTICE.txt file

3.9.3:
- currently tlsh_pattern returns all the matches
  modify tlsh_pattern to return the best match
- remove the newline from the input fields when reading in the
  tlsh_pattern file

3.9.4:
- check in order_bug program which demonstrates issue #50
- resolved issue #50 - added code to tlsh_impl.cpp to check for invalid
  call sequences to update() and final()

3.9.5:
- issue #61: added a command line option -notest - do not do any testing

3.9.6:
- Have a cmake option to build tlsh with a zero byte checksum
  (development / research option)
- Default build has 1 byte checksum - which is strongly recommended

3.9.7:
- resolving issue #50 for bin/timing_unittest

3.9.8:
- timing_unittest measures the time taken to do distance calculations
- add a command line option -size - so that you can measure the time
  taken to evaluate different sizes of string

3.9.9:
- resolve issue #62
- remove dependency on GNUInstallDirs

3.10.0:
- Adding // access function - required by tools using TLSH library
  - int BucketValue(int bucket);
  - int Checksum(int k);

3.11.0:
- Make calculation of TLSH digests approx 7 times faster (for large
  files), done by
  - inline functions
  - unrolling loops
  - fixing the -O2 optimization option

3.11.1:
- tidy up:
  1. use fast_b_mapping() instead of b_mapping()
  2. remove declaration of unsigned r which is never used
  3. remove #include which is not required

3.12.0:
- remove floating point calculations such as log() function
  use a lookup table instead
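
The general technique of replacing a log() call with a lookup table can be
sketched as follows. This is not TLSH's actual table: the base of 1.5, the
table size, and the function names are assumptions chosen for illustration.

```python
import math
from bisect import bisect_right
from fractions import Fraction

BASE = Fraction(3, 2)   # hypothetical logarithm base
MAX_BUCKET = 40         # hypothetical table size

# THRESHOLDS[i] is the smallest integer n with floor(log_BASE(n)) >= i.
# Exact rational arithmetic avoids rounding errors while building the table.
THRESHOLDS = [math.ceil(BASE ** i) for i in range(MAX_BUCKET + 1)]


def log_bucket_float(n: int) -> int:
    """Reference version using floating point log()."""
    return math.floor(math.log(n) / math.log(1.5))


def log_bucket_table(n: int) -> int:
    """Same result using only integer comparisons against the table.

    Values beyond the last threshold saturate at MAX_BUCKET.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return bisect_right(THRESHOLDS, n) - 1


if __name__ == "__main__":
    mismatches = [n for n in range(1, 5001)
                  if log_bucket_float(n) != log_bucket_table(n)]
    print("mismatches:", mismatches)
```

Besides avoiding floating point, a table lookup gives the same answer on
every platform, which matters when digests must be reproducible.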

3.13.0:
- .vcproj files and instructions for building TLSH on Windows using
  Visual Studio

3.13.1:
- fixing setup.py so that you can install Python Extension on Windows

3.14.0:
- adding sliding window size to tlsh_version
- changing test.sh to read the sliding window size

3.14.1:
- fixing error in test script for -xlen option (print statements about
  considering length were incorrect)
- improved test.sh - tests for existence of expected output files

3.15.0:
- Refactor code - so that input of directory or digest is in a struct.
  The code to process input is in library code (input_desc.cpp,
  shared_file_functions.cpp). The input routines can be used by
  multiple programs. Also, preparing for things like csv input files.

3.15.1:
- added command line option -help to show full help information

3.15.2:
- tlsh_pattern uses refactored code introduced in 3.15.0

3.16.0:
- improved tlsh_pattern functionality
- added regression tests for tlsh_pattern

3.16.1:
- improved tlsh functionality
- add options

3.16.2:
- added regression tests for 3.16.1

3.17.0:
- Make command line option  -force        (50 char limit) the default behaviour
- Add a command line option -conservative (256 char limit)

3.17.3:
- add checking to confirm that TLSH digests are the correct length in
  - -c option
  - -d option
  - the appropriate column of -l listfile options

3.18.0:
- resolve issue #72 - remove tlsh_version

3.19.0:
- preparation for Windows build
  remove ../Testing/ from test.sh script and from regression test
  results

3.19.1:
- in test.sh and testlen.sh - make TLSH_PROG a variable

4.0.0:
- version 4: adding version identifier to each digest: 'T1'
  - adding command line option -old to generate old style digests
  - In this version - the showvers is defaulted to off - so this will
    pass the old regression tests

4.0.1:
- turning on T1 functionality by setting showvers=1 in main
- updating regression tests to have T1 at the start of digests
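
A tool that has to accept both digest styles could distinguish them along
these lines. This is a sketch, not code from the TLSH distribution: the
function name is hypothetical, and the body length of 70 hexadecimal
characters is an assumption that holds only for the default build
(128 buckets, 1 byte checksum).

```python
import re

# Assumption: default build, 70 hex characters after the optional "T1".
_DIGEST_RE = re.compile(r"(T1)?[0-9A-Fa-f]{70}")


def classify_digest(text: str) -> str:
    """Return "versioned", "old-style" or "invalid" for a digest string."""
    match = _DIGEST_RE.fullmatch(text)
    if match is None:
        return "invalid"
    return "versioned" if match.group(1) else "old-style"


if __name__ == "__main__":
    body = "0123456789ABCDEF" * 4 + "012345"   # 70 hex characters
    for candidate in ("T1" + body, body, body[:-1]):
        print(len(candidate), classify_digest(candidate))
```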

4.1.0:
- adding -o option for output filename (output will go to stdout if no
  output file given)
  - changed test scripts to use -o option
- adding -ojson option for json output
  - added regression test for -ojson option
- adding -onull option to output empty files / files too small as TNULL

4.2.0:
- Windows version using MinGW

4.2.1:
- resolve issue #78 - JSON objects do not validate on Windows

4.2.2:
- resolve issue #81
- Pass regression tests

4.2.3:
- add regression tests that are compatible with
  https://github.com/glaslos/tlsh

4.3.0:
- issue #79 - divide by 0 if q3 == 0
  solution: if (q3 == 0), return an invalid hash
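
The guard described in this entry can be sketched as below. The function
name is hypothetical, and both the quartile positions and the ratio
formula (quartile * 100 / q3, modulo 16) are assumptions about how the
ratios are derived; only the q3 == 0 check comes from the changelog.

```python
def quartile_ratios(buckets):
    """Return (q1ratio, q2ratio), or None when q3 is zero.

    Returning None stands in for "invalid hash": with q3 == 0 the
    ratios would require a division by zero.
    """
    if not buckets or len(buckets) % 4 != 0:
        raise ValueError("bucket count must be a positive multiple of 4")
    ordered = sorted(buckets)
    n = len(ordered)
    q1 = ordered[n // 4 - 1]
    q2 = ordered[n // 2 - 1]
    q3 = ordered[n - n // 4 - 1]
    if q3 == 0:
        return None
    return (q1 * 100 // q3) % 16, (q2 * 100 // q3) % 16


if __name__ == "__main__":
    print(quartile_ratios([0] * 128))            # all buckets empty
    print(quartile_ratios(list(range(1, 129))))
```

Input in which more than three quarters of the buckets stay empty, such as
a stream of identical bytes, is the case this guard rejects.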

4.4.0:
- Fixing Python Extension
  - updated python extension to T1 hashes (4.0.0)
  - fixed python_test.sh (which attempted to access old expected
    results files)
  - added license information to py_ext/tlshmodule.cpp

4.4.1:
- Command line options to tlsh_digest.py
  -conservative	enforce 256 byte limit
  -old		generate old style hash (without "T1")
- added python functions to tlsh package (for backwards compatibility)
  tlsh.oldhash(data)
  tlsh.conservativehash(data)
  tlsh.oldconservativehash(data)

4.5.0:
- Checking in files to create pypi package

4.6.0:
- Add architecture ppc64le to travis build (Thanks ddeka2910)

4.7.0:
- Release updated package py-tlsh on Pypi.org
- Merging in pull request that adds functions to Python package
  lvalue, q1ratio, q2ratio, checksum, bucket_value and is_valid
- resolve issue #102 - correct Python version numbers

4.7.2:
- regression tests for C++ and Python functions for:
  lvalue, q1ratio, q2ratio, checksum, bucket_value
- resolve issue #95 - allow Requires-Python: >=2.7

4.8.0:
- Fix the make install target by adding the version.h in the
  installed files

4.8.1:
- Improve portability, add shared library build, install tlsh_unittest

4.8.2:
- fixed tlsh_win_version.h
   2021-10-26 12:20:11 by Nia Alarie | Files touched by this commit (3016)
Log message:
archivers: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes

Could not be committed due to merge conflict:
devel/py-traitlets/distinfo

The following distfiles were unfetchable (note: some may be only fetched
conditionally):

./devel/pvs/distinfo pvs-3.2-solaris.tgz
./devel/eclipse/distinfo eclipse-sourceBuild-srcIncluded-3.0.1.zip
   2021-10-07 15:44:44 by Nia Alarie | Files touched by this commit (3017)
Log message:
devel: Remove SHA1 hashes for distfiles
   2017-09-01 09:52:02 by Thomas Klausner | Files touched by this commit (3)
Log message:
Replace patch with correct upstream fix.

Bump PKGREVISION.
   2016-12-17 15:18:32 by Joerg Sonnenberger | Files touched by this commit (2)
Log message:
Fix pointer abuse.
   2016-09-04 12:41:48 by Thomas Klausner | Files touched by this commit (1)
Log message:
Add GITHUB_PROJECT so fetching works for py-tlsh.