
Neural photo histogram enhancement

Example of the network enhancing the colour of old digital photographs.

This project trains a neural network for automatically editing the style of digital photographs by learning a mapping from histograms of "bad" images to their aesthetic counterparts. Thus, both the inputs and outputs of the network are 3D RGB histograms:

$$\text{bin}_{\text{red}} \times \text{bin}_{\text{green}} \times \text{bin}_{\text{blue}} \to \text{bin}'_{\text{red}} \times \text{bin}'_{\text{green}} \times \text{bin}'_{\text{blue}}$$

By only exposing histograms to the network, we allow it to learn style transfer while eliminating the risk of changing the underlying structure of the source image in the process, which is a shortcoming of existing deep learning-based approaches 1 & 2. At the same time, non-linear transformations of the RGB colour distribution allow for much greater flexibility than predefined global adjustment tools such as Brightness or Contrast.

Training overview

The neural network has been trained on community-acclaimed photos from Unsplash 3. We take each photo and apply various edits to it, including:

  • Non-linearly transforming the brightness (including gamma correction)
  • Adding colour spill (colour temperature & tint)
  • Adjusting the saturation of different colours
  • Changing the contrast
  • Adding noise
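The edits listed above could be sketched roughly as follows. This is a minimal illustration and not the repository's actual augmentation code: the function name `random_edit`, the parameter ranges, and the order of operations are all assumptions, and saturation is adjusted globally here rather than per colour to keep the example short.

```python
import numpy as np

def random_edit(img, rng):
    """Apply one random chain of edits to a float RGB image in [0, 1], shape (H, W, 3).

    All parameter ranges below are illustrative assumptions.
    """
    out = img.astype(np.float64)

    # Non-linear brightness change via gamma correction
    gamma = rng.uniform(0.5, 2.0)
    out = out ** gamma

    # Colour spill: per-channel gains approximating temperature (red vs blue) and tint (green)
    temperature = rng.uniform(-0.15, 0.15)
    tint = rng.uniform(-0.10, 0.10)
    out = out * np.array([1.0 + temperature, 1.0 + tint, 1.0 - temperature])

    # Saturation: move each pixel towards or away from its grey value
    grey = out.mean(axis=2, keepdims=True)
    saturation = rng.uniform(0.3, 1.7)
    out = grey + saturation * (out - grey)

    # Contrast: scale around mid-grey
    contrast = rng.uniform(0.6, 1.4)
    out = 0.5 + contrast * (out - 0.5)

    # Additive Gaussian noise with a random strength
    noise_std = rng.uniform(0.0, 0.03)
    out = out + rng.normal(0.0, noise_std, size=out.shape)

    return np.clip(out, 0.0, 1.0)
```

Calling `random_edit(img, np.random.default_rng(0))` several times with the same generator yields a different edit on each call.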

An example image from the dataset together with 8 random edits applied to it is shown below:

a 3 by 3 grid of the original and 8 edited images of a puppy

Image source: https://unsplash.com/photos/long-coated-brown-and-gray-puppy-covered-by-white-jacket-on-persons-lap-PZuIash2jZU

We can simply take the 3D RGB histograms of the photos' pixels. These are the following (with 1-to-1 correspondence with the above):

a 3 by 3 grid of 3D histograms

For an interactive version of the histogram, open inference.ipynb.
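As a rough sketch of how such a histogram can be computed with NumPy: the function name `rgb_histogram` and the choice of 32 bins per channel are assumptions for illustration, and the bin count used by the project may differ.

```python
import numpy as np

def rgb_histogram(img, bins=32):
    """Normalised 3D RGB histogram of a float image in [0, 1], shape (H, W, 3)."""
    pixels = img.reshape(-1, 3)  # one row per pixel: (R, G, B)
    hist, _ = np.histogramdd(
        pixels,
        bins=(bins, bins, bins),
        range=((0.0, 1.0), (0.0, 1.0), (0.0, 1.0)),
    )
    return hist / hist.sum()  # entries sum to 1, so it is a probability distribution
```

Normalising the histogram makes it independent of the image resolution, which is convenient when it is later compared with another distribution.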

Using these, the training task of the neural network is to predict the "Original" histogram given a single "Edit" histogram. The loss function is KL divergence.
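For reference, the KL divergence between two normalised histograms P and Q is the sum over all bins of P · log(P / Q). A minimal NumPy sketch is given below. It assumes the "Original" histogram plays the role of P and the prediction the role of Q, which is the conventional direction but is not stated above; the small `eps` term is likewise an assumption, added to avoid taking the logarithm of zero in empty bins.

```python
import numpy as np

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) between two non-negative histograms of the same shape."""
    p = target.ravel() / target.sum()
    q = predicted.ravel() / predicted.sum()
    # Empty target bins contribute (almost) nothing because p is 0 there
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

The value is zero when the two histograms are identical and grows as the prediction puts too little mass where the target has a lot.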

After hyperparameter optimisation, the best performing model produces the following histogram predictions:

a 3 by 3 grid of predicted 3D histograms

As we can see, the predictions closely line up with the original histogram. The model's ability to learn the mapping of "photo enhancing" is further supported by the results we get when applying the predicted histograms to the edited images.

a 3 by 3 grid of the 9 predicted styles of an image of a puppy

Image source: https://unsplash.com/photos/long-coated-brown-and-gray-puppy-covered-by-white-jacket-on-persons-lap-PZuIash2jZU

We apply the predicted histograms to the source image using Pitie's method 4.
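To give an idea of how a histogram can be applied to an image, below is a simplified sketch of iterative distribution transfer: points are sampled from the target histogram, and the image's pixels are repeatedly projected onto random rotated axes and matched to the target along each axis. We assume this is the family of methods that reference 4 belongs to; the function names, the number of iterations, and the sampling step are illustrative assumptions, and the repository's actual implementation may differ.

```python
import numpy as np

def sample_from_histogram(hist, n_samples, rng):
    """Draw RGB points in [0, 1] from a 3D histogram, jittered uniformly inside each bin."""
    p = hist.ravel() / hist.sum()
    flat = rng.choice(hist.size, size=n_samples, p=p)
    idx = np.stack(np.unravel_index(flat, hist.shape), axis=1)
    return (idx + rng.random(idx.shape)) / np.array(hist.shape)

def match_1d(source, target):
    """Remap source values so that their empirical distribution follows the target's."""
    ranks = np.argsort(np.argsort(source))       # rank of each source value
    quantiles = (ranks + 0.5) / len(source)      # rank -> quantile in (0, 1)
    return np.quantile(target, quantiles)

def transfer_distribution(pixels, target_points, n_iter=20, seed=0):
    """Move an (N, 3) array of pixels towards the distribution of target_points."""
    rng = np.random.default_rng(seed)
    out = pixels.astype(np.float64).copy()
    for _ in range(n_iter):
        # Random orthogonal basis from the QR decomposition of a Gaussian matrix
        rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
        src = out @ rotation
        tgt = target_points @ rotation
        for axis in range(3):
            src[:, axis] = match_1d(src[:, axis], tgt[:, axis])
        out = src @ rotation.T                   # rotate back to RGB
    return np.clip(out, 0.0, 1.0)
```

For an image `img` of shape (H, W, 3), the pixels would be passed as `img.reshape(-1, 3)` and the result reshaped back to `img.shape`. Because only colour values are remapped, the spatial structure of the image is left untouched.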

Background

Histogram-based colour transfer has already been explored 5 for transferring colours between different images, but not for enhancing an image's own colours.

The "aesthetic goodness" of the input dataset's images serves as the target that the network learns to map less aesthetic images towards. Thus, the high quality of the input images is paramount to the network's success. The Unsplash 3 dataset seems ideal because of the metadata it provides about each included image, such as whether it was featured or its number of views. We could have chosen an alternative dataset such as Laion Aesthetic 6; however, this idea was quickly discarded due to the dataset's ethically questionable mode of collection.