Suppose the skin rejects a molecule not because the molecule itself is at fault, but because it is rare in nature, humans have never had the chance to come into contact with it, and the skin therefore treats it as foreign. In that case, no molecular descriptor you bring in will fit.
I wrote a blog post about that.
Some people are quick to say things like “Then you can’t do Materials Informatics”, which is not a kind thing to say.
Humans have foods that they have long eaten in their own era and region.
Ester compounds are found in basic foods, so few of them are toxic.
The only ester with an LD50 below 2000 is allyl acetate; as a class they are not very toxic. Therefore, there is basically no skin irritation.
Carboxylic acids and alcohols, which are formed when ester compounds are hydrolysed in the stomach, are also of low toxicity; people could hardly digest their everyday food and be killed by it.
Interestingly, the most toxic primary alcohol is C7.
Octanol, the alcohol used to measure the octanol/water partition ratio logKow, is close to this chain length, so it is worth asking whether these alcohols are toxic in themselves, or whether anything that partitions easily into a medium of that polarity is bad for the body.
The only carboxylic acid with an LD50 below 2000 is formic acid.
The exceptions are those with functional groups attached to double bonds, which are highly toxic.
Large molecules are less toxic, because they cannot be absorbed.
This is Lipinski’s ‘Rule of Five’:
5 or fewer hydrogen bond donors (OH, NH).
10 or fewer hydrogen bond acceptors (N, O, etc.).
Octanol-water partition coefficient (LogP) of 5 or less.
Molecular weight less than 500.
Anything outside this range is neither poison nor medicine.
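The four conditions above are simple enough to write down as a check. The following is only a minimal sketch: the `Descriptors` class and the function names are made up for illustration, and the descriptor values are assumed to have been obtained elsewhere (from a database or a cheminformatics toolkit).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Hypothetical container for precomputed descriptor values."""
    h_bond_donors: int      # count of OH and NH groups
    h_bond_acceptors: int   # count of N, O, etc.
    log_p: float            # octanol-water partition coefficient
    mol_weight: float       # molecular weight


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the list of Rule of Five conditions that are not met."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if d.log_p > 5:
        violations.append("LogP above 5")
    if d.mol_weight >= 500:
        violations.append("molecular weight of 500 or more")
    return violations


def within_rule_of_five(d: Descriptors) -> bool:
    """True when all four conditions listed above are satisfied."""
    return not lipinski_violations(d)


if __name__ == "__main__":
    # Illustrative values only, not measured data for any real compound.
    example = Descriptors(h_bond_donors=0, h_bond_acceptors=2,
                          log_p=8.2, mol_weight=298.5)
    print(within_rule_of_five(example))
    print(lipinski_violations(example))
```

Returning the list of violated conditions, rather than a bare yes or no, keeps it visible which condition pushed a molecule out of range.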
That was a long preamble. Back to the previous blog post:
‘If the molecules themselves are not to blame, and the skin discriminates against them as foreign because they are not very common in nature in the first place and humans have never had the chance to come into contact with them, then whatever molecular descriptors you bring in, they will not fit.’
This may be the first time anywhere that something so unscientific has been fed into machine learning.
Substances that hardly exist in nature are expensive.
So I propose adding one more item to Lipinski’s Rule of Five: whether the price per 500 g is more than 5,000 yen.
Compound | CAS | Unit | Price (yen) | Supplier | Positive rate (%) |
Isopropyl palmitate | 142-91-6 | 500 mL | 3,500 | TCI | 0 |
Butyl benzoate | 136-60-7 | 500 mL | 3,700 | TCI | 0 |
Heptyl butyrate | 5870-93-9 | 1 kg | 16,200 | Aldrich | 0 |
Methyl laurate | 111-82-0 | 500 mL | 5,000 | TCI | 0 |
Isopropyl myristate | 110-27-0 | 500 mL | 5,100 | TCI | 3.3 |
Methyl caproate | 106-70-7 | 500 mL | 6,800 | TCI | 0 |
Methyl palmitate | 112-39-0 | 5 g | 12,300 | TCI | 3.4 |
Benzyl salicylate | 118-58-1 | 500 g | 7,000 | TCI | 0 |
Linalyl acetate | 115-95-7 | 500 mL | 11,800 | TCI | 3.2 |
Hexyl salicylate | 6259-76-3 | 250 mL | 8,420 | Aldrich | 0 |
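To see how the proposed price rule behaves on the rows above, here is a rough sketch. It normalises each price to 500 g and counts how the "more than 5,000 yen" flag lines up with a nonzero positive rate. Two things are my own assumptions and not part of the data: 1 mL is treated as 1 g (densities are ignored), and any positive rate above zero counts as an irritant. The function names are invented for this illustration.

```python
# Rows copied from the table: (compound, amount, unit, price in yen, positive rate %).
ROWS = [
    ("Isopropyl palmitate", 500, "mL", 3500, 0.0),
    ("Butyl benzoate", 500, "mL", 3700, 0.0),
    ("Heptyl butyrate", 1, "kg", 16200, 0.0),
    ("Methyl laurate", 500, "mL", 5000, 0.0),
    ("Isopropyl myristate", 500, "mL", 5100, 3.3),
    ("Methyl caproate", 500, "mL", 6800, 0.0),
    ("Methyl palmitate", 5, "g", 12300, 3.4),   # unit as listed in the table
    ("Benzyl salicylate", 500, "g", 7000, 0.0),
    ("Linalyl acetate", 500, "mL", 11800, 3.2),
    ("Hexyl salicylate", 250, "mL", 8420, 0.0),
]

# Assumption: 1 mL is treated as 1 g, which ignores density.
UNIT_TO_GRAMS = {"g": 1, "mL": 1, "kg": 1000}


def yen_per_500g(amount, unit, yen):
    """Scale the catalogue price to a 500 g pack."""
    grams = amount * UNIT_TO_GRAMS[unit]
    return yen * 500 / grams


def is_pricey(amount, unit, yen, threshold=5000):
    """The proposed extra rule: more than 5,000 yen per 500 g."""
    return yen_per_500g(amount, unit, yen) > threshold


def tabulate(rows):
    """Count rows by (pricey?, nonzero positive rate?)."""
    counts = {(p, i): 0 for p in (True, False) for i in (True, False)}
    for _name, amount, unit, yen, rate in rows:
        counts[(is_pricey(amount, unit, yen), rate > 0)] += 1
    return counts


if __name__ == "__main__":
    for name, amount, unit, yen, rate in ROWS:
        print(f"{name:20s} {yen_per_500g(amount, unit, yen):>12,.0f} yen/500 g"
              f"  positive rate {rate}%")
    print(tabulate(ROWS))
```

On these ten rows, all three compounds with a nonzero positive rate are above the threshold, but so are four compounds with a zero rate, so the flag catches every irritant here while also flagging several non-irritants.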
Of course, with the yen as weak as it is now, the 5,000 yen figure does not mean much.
Still, the two pricier ones are certainly skin irritants.
In this way, a machine learning model can sometimes be made to fit even when the price of the reagent is used as an input value.
It might even be the page number of the catalogue in which the reagent is listed.
What matters is what can be deciphered from the model, not obtaining a black box for its own sake.
When I see papers that talk about values calculated by MO or MD, I want to say something about the price of the reagents, although this is just the rambling of a cynical person.