Personnel files are often still kept in paper form

In Germany, a personnel file is kept for every employment relationship. Many of these files still exist only on paper. Some are still in active use, while older files are kept as evidence and otherwise mostly take up space. Digital files offer numerous advantages: they are more convenient to edit, need less space and make access control easier. Data protection is particularly important, because personal data must be deleted once certain periods have expired. Although there are no explicit retention periods for personnel files as such, periods arise indirectly from laws such as the German Civil Code, the Income Tax Act or the Social Security Code. These periods range from 3 to 30 years and apply to individual documents in the personnel file, not to the file as a whole. Strictly speaking, every paper file must therefore be reviewed regularly to determine whether parts of it need to be destroyed. Otherwise, the company risks penalties for storing data unlawfully. With digital documents, this process is much easier.
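To make the per-document rule concrete, here is a minimal Python sketch of such a periodic review for digital documents. It assumes that each document already carries a document class and a reference date from which its period runs; the class names and the periods in `RETENTION_YEARS` are hypothetical placeholders, not the legal values.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention periods in years per document class.
# Placeholder values for illustration only; real periods follow from the applicable laws.
RETENTION_YEARS = {
    "sick_note": 3,
    "payslip": 10,
    "employment_contract": 30,
}


@dataclass
class Document:
    doc_id: str
    doc_class: str
    reference_date: date  # assumed start of the retention period


def add_years(start: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def due_for_deletion(documents: list[Document], today: date) -> list[str]:
    """Return the IDs of documents whose retention period has expired.

    A document with an unknown class raises KeyError: the check only works
    if every document has been classified.
    """
    return [
        doc.doc_id
        for doc in documents
        if add_years(doc.reference_date, RETENTION_YEARS[doc.doc_class]) <= today
    ]


files = [
    Document("s1", "sick_note", date(2020, 3, 1)),
    Document("p1", "payslip", date(2020, 3, 1)),
    Document("c1", "employment_contract", date(2000, 3, 1)),
]
print(due_for_deletion(files, today=date(2025, 1, 1)))  # ['s1']
```

Only the expired document is reported; the rest of the file is kept. The rules for when a period actually starts can be more involved than a single reference date.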

Why digitizing personnel files makes sense

There is therefore a strong case for digitizing paper-based personnel files. Many service providers on the market scan files and make them available as electronic documents. The individual documents can then be imported into the DMS (document management system) and assigned to the employees concerned and the responsible HR staff. However, to enforce the deletion periods mentioned above and to make the documents easier to work with, the individual documents must be classified in sufficient detail.

Scanning the files is not particularly difficult in itself: modern scanners process several hundred sheets per minute.

Manual pre-processing is complex and expensive

Most of the effort goes into the manual pre-processing that the files require before scanning. Ordinarily, rule-based systems handle this task after scanning, working on the image files: a person programs how each type of document is to be recognized. To achieve a high automation rate at low cost, such systems need homogeneous document structures and consistent quality, as well as document classes that companies can reuse.
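As an illustration of what programming the recognition rules means, here is a minimal sketch of a keyword-rule classifier working on OCR text. The class names and keyword lists are hypothetical; rule sets in practice can also use positions, layout zones or regular expressions.

```python
from typing import Optional

# Hypothetical hand-written rules: each document class lists keywords
# (German, as found on the scanned pages) that must all occur in the OCR text.
RULES: dict[str, list[str]] = {
    "payslip": ["gehaltsabrechnung", "brutto", "netto"],
    "employment_contract": ["arbeitsvertrag", "kündigungsfrist"],
}


def classify_by_rules(ocr_text: str) -> Optional[str]:
    """Return the first class whose keywords all appear, or None if no rule matches."""
    text = ocr_text.lower()
    for doc_class, keywords in RULES.items():
        if all(keyword in text for keyword in keywords):
            return doc_class
    return None  # unmatched documents fall back to manual handling


print(classify_by_rules("Gehaltsabrechnung März: Brutto 3.000 EUR, Netto 2.000 EUR"))  # payslip
print(classify_by_rules("Lohnzettel 1987: Bruttolohn 2.500 DM"))  # None
```

The second call shows the weak point: an older payslip variant with a different title matches no rule, so someone has to write yet another rule for it. This is exactly the problem of changing layouts and document variants that personnel files bring with them.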

Personnel files, however, usually pose particular challenges that hinder automation:

  • The files have accumulated over many years, so their quality varies, which impairs legibility.
  • Almost every personnel file contains handwritten notes, which in practice cannot be processed by machine.
  • The layout and structure of the documents change over time, producing many variants of what is essentially the same document class.
  • The filing system varies from company to company. For this reason, it is almost impossible to program a standard schema for separation; every project starts from scratch.
  • Companies often define their own document classes and treat the specification as a wish list, so there is little chance of reusing schemas between projects.

For these reasons, separation and classification are difficult to automate, and a manual process is often used instead: before scanning, staff separate the documents by inserting separator sheets or barcodes and classify them at the same time.

As a result, digitizing personnel files is comparatively expensive, too expensive for some prospective customers to be economically viable. In addition, processing personnel documents requires qualified staff, so capacity is sometimes insufficient: long waiting times are common, and some customer inquiries cannot be served at all.

Learn more about the benefits of automation with AI in our web seminar

For less structured documents and complex business requirements, cloud-based platforms with machine learning and AI make it possible to achieve good extraction results quickly and at low unit cost. Learn how this works in the web seminar!

Let machines learn the process!

Artificial intelligence and machine learning can learn how to process incoming documents end to end from just a few examples, in a short time, and apply what they have learned to new documents. This rests on a paradigm shift compared with the previous way of working:

  • Specialization: Instead of programming comprehensive applications that cover the entire process, AI services are used as building blocks for specific tasks. These can be trained faster and more effectively than all-purpose functions that are supposed to do "everything".
  • Training instead of programming: Rather than being programmed with rules, intelligent systems learn the processing steps from a few examples provided by a subject-matter expert or trainer. This combines fast implementation with high flexibility.
  • Human in the loop: Based on its confidence, the AI proposes examples during training and flags uncertain cases during fully automated ("dark") processing for human review. This creates a continuous improvement process with minimal effort, while humans keep control over the process and its quality.
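The human-in-the-loop principle can be sketched as confidence-based routing: predictions at or above a threshold go to dark processing, the rest go to a review queue, ordered so that the least certain cases come first (a simple form of uncertainty sampling). The `Prediction` type and the 0.9 threshold are assumptions for illustration, not part of any specific platform.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    doc_id: str
    label: str
    confidence: float  # model confidence in [0, 1]


def route(predictions: list[Prediction], threshold: float = 0.9):
    """Split predictions into dark processing and a human review queue.

    The review queue is sorted by ascending confidence, so reviewers see the
    cases the model is least sure about first; their corrections can then be
    fed back as new training examples.
    """
    automatic = [p for p in predictions if p.confidence >= threshold]
    review = sorted(
        (p for p in predictions if p.confidence < threshold),
        key=lambda p: p.confidence,
    )
    return automatic, review


preds = [
    Prediction("d1", "payslip", 0.98),
    Prediction("d2", "sick_note", 0.55),
    Prediction("d3", "employment_contract", 0.72),
]
auto, review = route(preds)
print([p.doc_id for p in auto])    # ['d1']
print([p.doc_id for p in review])  # ['d2', 'd3']
```

Raising the threshold sends more cases to people and fewer to dark processing; where to set it is a trade-off between quality and manual effort.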

Achieve more with less effort using AI

Modern AI platforms apply these principles and combine individual, specialized AI services into an efficient and effective overall process. When digitizing personnel files, machine learning makes it possible to shift a large part of the manual work from scan preparation to digital post-processing, where it can be carried out automatically:

  • Optimization: AI learns to distinguish useful content from noise on documents. Visual noise such as yellowed paper or background patterns can be compensated for, improving text recognition and extraction.
  • Structuring: AI modules group documents without human intervention. At the outset, documents are grouped by visual and textual characteristics so that further processing can be targeted. Recurring document types are thereby collected into so-called clusters, which can be processed with a high degree of automation.
  • Separation: AI systems learn from patterns to split incoming page streams into individual documents automatically, drawing on page numbers, layout information and visual features.
  • Classification: AI systems likewise learn from examples to recognize document classes with high accuracy, independently of layout or specific keywords.
  • Extraction: Once a trainer has marked the relevant data on a few training documents, the AI learns to recognize the relevant business data automatically on all other documents.
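The separation step can be pictured as follows: a trained model scores each scanned page with the probability that it starts a new document, and the page stream is cut wherever that score reaches a threshold. In this minimal sketch the scores are given as input; the model that would produce them from page numbers, layout and visual features is assumed and not shown, and `split_stream` is a hypothetical helper.

```python
from typing import Sequence, TypeVar

Page = TypeVar("Page")


def split_stream(pages: Sequence[Page], start_probs: Sequence[float],
                 threshold: float = 0.5) -> list[list[Page]]:
    """Group a continuous stream of scanned pages into documents.

    start_probs[i] is the (assumed) model output: the probability that
    pages[i] is the first page of a new document. The first page always
    opens a document.
    """
    if len(pages) != len(start_probs):
        raise ValueError("need exactly one score per page")
    documents: list[list[Page]] = []
    for page, prob in zip(pages, start_probs):
        if not documents or prob >= threshold:
            documents.append([page])  # start a new document
        else:
            documents[-1].append(page)  # continue the current document
    return documents


pages = ["p1", "p2", "p3", "p4", "p5"]
scores = [0.97, 0.08, 0.91, 0.12, 0.30]
print(split_stream(pages, scores))  # [['p1', 'p2'], ['p3', 'p4', 'p5']]
```

Compared with separator sheets inserted by hand, no preparation is needed before scanning; pages whose scores lie close to the threshold are natural candidates for human review.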

Cleverly combining these services makes very short implementation and training phases possible at low unit costs.

Conclusion: Faster digitization of personnel files with AI at lower costs

Combining several AI and machine learning methods achieves higher quality with less effort when digitizing personnel files:

  • Automating the processing steps reduces the workload in scan preparation and significantly lowers costs.
  • The AI's better recognition quality further reduces the effort and cost of manual post-processing.
  • Lower costs allow lower prices, so additional customers can be won for whom digitizing personnel files was previously not economically viable.
  • Capacity bottlenecks can be eliminated by reducing the need for skilled labour.
  • The additional capacity means that even very large orders can be processed more quickly.

Introducing and using such solutions does not require a multi-year project, and the investment is comparatively low. With powerful, service-oriented architectures such as the inserve platform, digitization processes for personnel and other files can be automated in a short time.

Questions or comments? We look forward to your feedback on this article and to talking with you personally.

We will be happy to answer any questions you may have regarding the use of AI for the digitization of personnel files.