When dealing with thousands of files, spotting an anomaly in any particular file is quite hard. This is why I would like to create a data checker.
I am leaning towards converting the XML files to Excel, since that might make them easier to work with, but working with the XML directly is also an option.
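For the conversion, this is a minimal sketch of what I have in mind, using only the Python standard library. It assumes each file is a flat list of records, and it writes CSV (which Excel can open) rather than a native Excel file. The element names (`records`, `record`, `id`, `temperature`) and the helper names are placeholders for illustration only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; a real file would be loaded with ET.parse(path).
SAMPLE_XML = """<records>
  <record><id>1</id><temperature>21.5</temperature></record>
  <record><id>2</id><temperature>19.0</temperature></record>
</records>"""


def xml_to_rows(xml_text, record_tag="record"):
    """Turn each <record> element into a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.iter(record_tag)
    ]


def rows_to_csv(rows):
    """Serialize the rows as CSV text, one column per tag seen in any row."""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Nested or irregular XML would need more than this flat mapping, which is part of why I am unsure whether converting is worth it.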
I have done some research, but I am wondering if someone can give me advice or point me to sources on ways to check files for anomalies. The approaches I have come across so far:
- One way is to follow a tree structure of rules: each file goes down a branch depending on its contents, and if its values are not within the range expected for that branch, it is classified as an anomaly.
- Another way would be to use clustering.
- A third would be machine learning.
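To make the first idea concrete, here is a rough sketch of the rule tree as I picture it, working directly on the XML with the Python standard library. The `kind` attribute, the `temperature` element, the ranges, and the function name `check_record` are all invented for illustration; the real rules would come from whatever is known about the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule tree: first branch on the record's "kind" attribute,
# then check each listed tag against an allowed (min, max) range.
RULE_TREE = {
    "indoor": {"temperature": (10.0, 35.0)},
    "outdoor": {"temperature": (-40.0, 60.0)},
}


def check_record(record, rule_tree=RULE_TREE):
    """Return a list of (tag, raw_value, reason) problems for one record."""
    kind = record.get("kind")
    branch = rule_tree.get(kind)
    if branch is None:
        return [("kind", kind, "unknown branch")]

    problems = []
    for tag, (low, high) in branch.items():
        text = record.findtext(tag)  # None if the element is missing
        try:
            value = float(text)
        except (TypeError, ValueError):
            problems.append((tag, text, "missing or not a number"))
            continue
        if not low <= value <= high:
            problems.append((tag, text, "out of range"))
    return problems


SAMPLE = """<records>
  <record kind="indoor"><temperature>21.0</temperature></record>
  <record kind="indoor"><temperature>48.5</temperature></record>
  <record kind="outdoor"><temperature>48.5</temperature></record>
</records>"""

root = ET.fromstring(SAMPLE)
results = [check_record(record) for record in root.iter("record")]
for index, problems in enumerate(results):
    print(index, problems)
```

In this sketch the same value (48.5) is flagged on the indoor branch but accepted on the outdoor one, which is the behaviour I mean by "depending on which branch the file goes into".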
Can someone suggest any sources where I can learn more about this, or tell me which approaches are recommended?