InChIKey collision resistance: an experimental testing
Igor Pletnev et al. J Cheminform. 2012.
Abstract
InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes the molecular skeleton, while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (a collision). The occurrence of a collision does not in itself compromise the signature, since collision-free hashing is impossible; the only viable approach is to set and maintain a reasonable level of collision resistance that is sufficient for typical applications.
We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, exhaustive direct testing was performed. We computed the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets) and compared them to theory. For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 10^10 chemical structures. The statistical behavior of this block was tested by comparing experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body.
From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations.
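The kind of comparison described in the abstract can be illustrated with a minimal sketch, under stated assumptions. It models a hash block as a uniform random draw from 2^bits values and compares the expected number of collisions with a count observed on truncated SHA-256 digests. The function names, the toy input strings, and the truncation scheme are hypothetical and are not the InChIKey encoding; the figure of 37 hash bits for the second block is an assumption taken from the InChI technical documentation and is not stated in the abstract. The authors' own definition of "number of collisions" may differ from the two used here.

```python
import hashlib
import math


def expected_colliding_pairs(n: int, bits: int) -> float:
    """Expected number of item pairs that share a hash value when n items
    are hashed uniformly at random into 2**bits values (bits >= 1)."""
    return n * (n - 1) / (2 * 2**bits)


def expected_excess_items(n: int, bits: int) -> float:
    """Expected value of n minus the number of distinct hash values seen.

    The occupied-bucket count is buckets * (1 - (1 - 1/buckets)**n),
    evaluated with log1p/expm1 so it stays accurate for large buckets.
    """
    buckets = 2**bits
    occupied = -buckets * math.expm1(n * math.log1p(-1.0 / buckets))
    return n - occupied


def observed_excess_items(strings, bits: int) -> int:
    """Count strings whose truncated SHA-256 value has already been seen.

    Truncation keeps the leading `bits` bits of the digest; this is a toy
    stand-in, not the InChIKey block encoding.
    """
    seen = set()
    excess = 0
    for s in strings:
        digest = hashlib.sha256(s.encode("utf-8")).digest()
        value = int.from_bytes(digest, "big") >> (256 - bits)
        if value in seen:
            excess += 1
        else:
            seen.add(value)
    return excess


if __name__ == "__main__":
    # Full Spongistatin I isomer set from the abstract; 37 bits is an assumption.
    n_isomers = 67_108_864
    print("expected colliding pairs:", expected_colliding_pairs(n_isomers, 37))
    print("expected excess items:   ", expected_excess_items(n_isomers, 37))

    # Small-scale experiment on hypothetical strings with a 16-bit truncation.
    toy = [f"structure-{i}" for i in range(2000)]
    print("observed (16 bits):", observed_excess_items(toy, 16))
    print("expected (16 bits):", expected_excess_items(len(toy), 16))
```

The two expected-value functions agree closely when collisions are rare and diverge as the number of items approaches the number of possible hash values, which is why a comparison with observed counts needs to fix one definition in advance.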
Figures

Molecular skeleton of Spongistatin I.

The observed (circles) and theoretically expected (curve) average number of InChIKey second block collisions vs. the number of considered stereoisomers of Spongistatin I. a) The whole data range; abscissa values: log(number of isomers). b) Low-collision region; abscissa values: number of isomers.

The dependence of observed average number of InChIKey second block collisions for 370 000-entry datasets vs. the number of samplings m.

Normalized frequencies of various letters within the first block of InChIKey. Measured using InChIKeys for 1 097 996 constitutional isomers of C8H8Cl3F5; the values are normalized to the frequency of ‘A’.