Hstyle

Historical documents can reveal a great deal about our past, such as forms of writing,
wording, content that did not exist, and more. Machine learning requires a large amount of
labeled data. Creating the labels (annotations) is expensive and tedious work, and therefore,
in the field of historical documents, the datasets that exist for training models are small.
These datasets are too small to train deep models that achieve good results.

To build a large dataset easily and with fewer resources, it is necessary to create
synthetic data. In this project, we researched a method for creating synthetic
historical data and developed a system (a website) that lets any user synthesize documents themselves.

Our method is a deep learning method based on neural style transfer. To improve
its results, we used several computer vision techniques, such as binarization,
dilation, and image processing.
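
The two preprocessing steps named above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes a grayscale page with dark ink on a light background, uses plain NumPy instead of OpenCV to stay dependency-light, and treats "dilation" as thickening the ink strokes. The function names, the threshold of 128, and the radius are hypothetical choices made for the example.

```python
import numpy as np


def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Map a grayscale page to pure black (0) and pure white (255)."""
    return np.where(gray < threshold, 0, 255).astype(np.uint8)


def thicken_strokes(binary: np.ndarray, radius: int = 1) -> np.ndarray:
    """Thicken dark strokes by taking the minimum over a square neighborhood.

    Assumption: ink is dark (0) and the page is light (255), so growing the
    ink means spreading the smallest value in each window.
    """
    size = 2 * radius + 1
    h, w = binary.shape
    # Pad with white so the page border does not create artificial ink.
    padded = np.pad(binary, radius, mode="constant", constant_values=255)
    out = binary.copy()
    # Combine every shifted copy of the image covered by the square window.
    for dy in range(size):
        for dx in range(size):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out


# A 5x5 white page with one gray "ink" pixel in the center.
page = np.full((5, 5), 255, dtype=np.uint8)
page[2, 2] = 40
thick = thicken_strokes(binarize(page))
```

With OpenCV, `cv2.threshold` and `cv2.dilate` / `cv2.erode` cover the same operations; which of the two morphological calls thickens the text depends on whether the image is inverted first.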

This project was built with Python, FastAPI, TensorFlow, Keras, OpenCV, Angular, Bootstrap, and other libraries.

Project Research

To understand the steps we took and what we did, you are welcome to look at
the Project Book.

Project Setup and Run

To run this project with Docker, your environment needs to support TensorFlow Docker. You can follow this link to get everything set up.

Run on local environment:

Clone this repository.
Open cmd/shell/terminal and go to the application folder: cd Hstyle/app
Run the docker-compose file: docker-compose -f docker-compose-local.yml up
Open this link.
Enjoy the application.

Run on production environment:

Clone this repository.
Open the following file: Hstyle/app/client/src/environments/environment.prod.ts
In that file, change the API_URL to 'http://PRODUCTION_IP_ADDRESS:5000', where PRODUCTION_IP_ADDRESS is your deployment server's IP address.
Open cmd/shell/terminal and go to the application folder: cd Hstyle/app
Run the docker-compose file: docker-compose -f docker-compose-prod.yml up
Open http://PRODUCTION_IP_ADDRESS:3000/, where PRODUCTION_IP_ADDRESS is your deployment server's IP address.
Enjoy the application.

Demo

Examples

Content Image	Style Image	Changes Applied To Content Image	Result
		Original
		Original
		Apply dilation
		Apply dilation
		Apply binarization
		Apply binarization
		Apply dilation and binarization
		Apply dilation and binarization
		Replace white background with style average pixel value
		Replace white background with style average pixel value
		Replace white background with style average pixel value + Apply dilation
		Replace white background with style average pixel value + Apply dilation
		Replace white background with style average pixel value + Apply binarization
		Replace white background with style average pixel value + Apply binarization
		Replace white background with style average pixel value + Apply binarization + Apply dilation
		Replace white background with style average pixel value + Apply binarization + Apply dilation
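
The "replace white background with style average pixel value" change listed in the table can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: both images are taken to be RGB arrays of type uint8, a pixel counts as background when all three channels are at least 240, and the function name and threshold are hypothetical.

```python
import numpy as np


def replace_white_background(
    content: np.ndarray, style: np.ndarray, white_threshold: int = 240
) -> np.ndarray:
    """Paint near-white content pixels with the style image's mean color."""
    # Average each channel over every pixel of the style image.
    mean_color = style.reshape(-1, style.shape[-1]).mean(axis=0)
    # A pixel is background only if all of its channels are near white.
    background = (content >= white_threshold).all(axis=-1)
    result = content.copy()
    result[background] = np.round(mean_color).astype(content.dtype)
    return result


# A 2x2 white page with one dark pixel, and a uniformly tinted style image.
content = np.full((2, 2, 3), 255, dtype=np.uint8)
content[0, 0] = (10, 10, 10)
style = np.zeros((4, 4, 3), dtype=np.uint8)
style[...] = (200, 180, 140)
blended = replace_white_background(content, style)
```

Dark ink pixels are left untouched, so the text stays legible while the page takes on the style image's overall tone before style transfer runs.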

Evaluation

To determine which of the three techniques we considered most promising (Original content image, Dilate content image, Binary content image) works best, we surveyed 50 participants and asked them to rate each image's readability and historical look on a scale from 1 (poor) to 5 (great).

Result for image readability

Historical Image Readability

As we can see, ‘dilate content image’ and ‘binary content image’ receive the most votes at ratings of three and above, meaning these results are the most readable.

Result for image historical look

Image Historical Look

As we can see, ‘dilate content image’ receives the most votes at ratings of three and above, meaning its results have the most historical look.
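
The comparison criterion used in both charts, the number of votes at rating three and above, can be computed as in the sketch below. The ratings shown are placeholders invented for illustration and are not the survey's data; the function name and the technique labels are likewise hypothetical.

```python
from collections import Counter


def votes_at_or_above(ratings, minimum=3):
    """Count how many ratings on the 1-5 scale are at least `minimum`."""
    counts = Counter(ratings)
    return sum(n for rating, n in counts.items() if rating >= minimum)


# Placeholder ratings for illustration only; NOT the survey results.
placeholder = {
    "technique_a": [5, 4, 3, 2, 1],
    "technique_b": [2, 2, 3, 1, 1],
}

# Pick the technique with the most votes at rating three and above.
best = max(placeholder, key=lambda name: votes_at_or_above(placeholder[name]))
print(best)
```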