
Sony patents a machine learning image upscaling method

kyliethicc

Member

Discusses a method of using ML to improve image quality and fix rendering flaws - called holes.


Mentions specifically this could be used for VR headsets.

"The computing device may be a virtual reality device. The virtual reality device may be a virtual reality headset."

"Virtual reality headsets require much higher computing power to display a satisfactory image to a user than a conventional computer monitor. This is because the monitors of a virtual reality headset are much closer to a user's eyes, subtend a much larger angle, and operate at a higher and sustained frame rate. Providing a virtual reality headset configured according to the method as described above provides the advantage of requiring much less computing power without sacrificing image quality where it is needed most for maintaining user comfort and immersion."



DHqh95G.jpg




Claims

"Claims

1. A computer-implemented method for completing an image, the method comprising: dividing image data of an image to be completed into a plurality of image portions; applying a first filling process to fill a first image portion comprising a first hole, the first hole associated with a first quantity and/or a first quality; applying a second filling process to fill a second image portion comprising a second hole, the second hole associated with a second quantity different to the first quantity and/or a second quality different to the first quality, the second process being different to the first process; and combining the filled first and second image portions to complete the image.

2. The computer-implemented method of claim 1, comprising generating a mask of the image to be completed, performing at least one morphological operation on the mask to generate an altered mask, and determining to apply the first filling process and the second filling process based on respective first and second quantities and/or qualities of the altered mask.

3. The computer implemented-method of claim 2, comprising determining to apply the first filling process on a first hole based on an absence of a hole corresponding to the first hole in the altered mask, and determining to apply the second filling process on a second hole based on a presence of a corresponding hole in the altered mask.

4. The computer-implemented method of claim 2 or claim 3, wherein the at least one morphological operation includes erosion and/or dilation.

5. The computer-implemented method of claim 1, comprising applying the first filling process to the first image portion in response to a determination that the first hole has a dimension smaller than a first threshold value.

6. The computer-implemented method of claim 1 or claim 2, comprising applying the second filling process to the second image portion in response to a determination that the second hole has a dimension larger than a second threshold value.

7. The computer-implemented method of any preceding claim, wherein the first filling process includes filling a pixel of a hole of the first type according to an average of surrounding pixels.

8. The computer-implemented method of claim 7, comprising determining the average using surrounding pixels having a material identification which is the same as the pixel to be filled.

9. The computer-implemented method of claim 7 or claim 8, wherein the determination of the average includes weighting values of the surrounding pixels to be averaged.

10. The computer-implemented method of any preceding claim, wherein the second filling process includes a machine learning inference process.

11. The computer-implemented method of claim 10 wherein the machine learning inference process is implemented by a data model, said data model or associated with an artificial neural network (ANN).

12. The computer-implemented method as claimed in claim 11 wherein the ANN is a convolutional neural network (CNN).

13. The computer-implemented method of any preceding claim, further comprising combining the filled image portions with image portions that were not filled by the first and second processes.

14. A computing device comprising one or more processors that are associated with a memory, the one or more processors configured with executable instructions which, when executed, cause the computing device to carry out the computer-implemented method of any preceding claim.

15. The computing device of claim 14, wherein the device is configured to receive image data of an image to be completed from a server, such as a cloud-based image rendering server, and/or an image capture device or processor associated therewith.

16. The computing device of claim 14 or 15, wherein a hole of the received image data corresponds to an occluded area.

17. The computing device of any one of claims 14 to 16, wherein the device is a virtual reality device, such as a virtual reality headset."
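
For anyone wondering what claims 1 to 4 could look like in practice, here is a rough Python sketch. It is not Sony's implementation. The 3x3 erosion kernel, the per-tile labelling, and the names `erode` and `classify_tiles` are all my own assumptions for illustration. The idea: erode the hole mask, then send tiles whose holes vanish to the cheap filler and tiles whose holes survive to the ML filler.

```python
from typing import Dict, List, Tuple

Mask = List[List[bool]]  # True marks a hole pixel (assumed representation)


def erode(mask: Mask) -> Mask:
    """3x3 binary erosion; pixels outside the image count as non-hole."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel survives only if its whole 3x3 neighbourhood is hole.
            out[y][x] = all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
    return out


def classify_tiles(mask: Mask, tile: int) -> Dict[Tuple[int, int], str]:
    """Label each tile 'none', 'simple' (cheap fill) or 'ml' (inference)."""
    eroded = erode(mask)
    h, w = len(mask), len(mask[0])
    labels = {}
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            rows = range(ty, min(ty + tile, h))
            cols = range(tx, min(tx + tile, w))
            has_hole = any(mask[y][x] for y in rows for x in cols)
            survives = any(eroded[y][x] for y in rows for x in cols)
            if survives:
                labels[(ty // tile, tx // tile)] = "ml"      # hole still present after erosion
            elif has_hole:
                labels[(ty // tile, tx // tile)] = "simple"  # hole vanished after erosion
            else:
                labels[(ty // tile, tx // tile)] = "none"    # nothing to fill
    return labels
```

Calling `classify_tiles(mask, 4)` on an 8x8 mask returns one label per 4x4 tile. A tile labelled "none" would be passed through untouched and recombined with the filled tiles, which is roughly what claim 13 describes. One limitation of labelling per tile like this: a large hole that straddles a tile border can end up "ml" in one tile and "simple" in its neighbour.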


Description

"Computer-implemented method for completing an image

Introduction

The present disclosure relates to a computer-implemented method for completing an image and a computer device configured to carry out the method. The present disclosure relates particularly to detecting, categorising, and filling holes in an image according to the categorisation.

Background

Digital images can contain regions of missing or corrupted image data. Missing or corrupted regions are referred to in the art as "holes". Holes are normally undesirable, and methods of inferring what information is missing or corrupted are employed to fill the holes. Filling holes in images is also referred to as image completion or inpainting.

A variety of processes exist for filling holes in images. Machine learning inference techniques, which rely on trained processes, can fill holes in images with high-quality results. However, machine learning techniques are performance intensive, requiring powerful computer hardware and a large amount of time.

Holes in images arise in image-based rendering systems. For example, where there are two or more images representing perspectives of the same environment, there may be no image data corresponding to an intermediate perspective that a user would like to see. Alternatively, there may be some image data missing from one of the perspectives. Machine learning processes may be used to infer the intermediate perspective and to infer the missing image data. Executing machine learning processes to obtain missing data is computationally costly and time consuming.

An example of an image-based rendering system is a virtual reality device displaying a virtual reality environment. A user wearing a virtual reality headset is presented, by two monitors in the headset, with a representation of a three-dimensional scene. As the user moves their head, a new scene is generated and displayed according to the new position and orientation of the headset. In this way, a user can look around an object in the scene. Areas of the initial scene which become visible in the new scene due to the movement are described as being previously "occluded". The displayed scenes may be generated by computer hardware in a personal computer or console connected to the headset, or by a cloud-based rendering service remote from the headset. A rate at which image data is supplied to the headset is limited by bandwidth of the connection between the headset and the computer, console, or the cloud-based rendering system. Consequently, sometimes, not all the data required at a given time to entirely construct and display a scene is available due to bandwidth limitations or interruptions. Holes in the image data making up the scene are an undesired result and have a significant negative impact on the immersion experienced by the user.

Summary and Statements of the present disclosure

According to a first aspect of the present disclosure, there is provided a computer-implemented method for completing an image, the method comprising: dividing image data of an image to be completed into a plurality of image portions; applying a first filling process to fill a first image portion comprising a first hole, the first hole associated with a first quantity and/or a first quality; applying a second filling process to fill a second image portion comprising a second hole, the second hole associated with a second quantity different to the first quantity and/or a second quality different to the first quantity and/or quality, the second process being different to first process; and combining the filled first and second image portions to complete the image.

A hole of the received image data may correspond to an occluded area. The computer-implemented method may comprise generating a mask of the image to be completed, performing at least one morphological operation on the mask to generate an altered mask, and determining to apply the first filling process and the second filling process based on respective first and second quantities and/or qualities of the altered mask.

By generating an altered mask, holes are more quickly identifiable as having particular qualities and/or quantities, which makes categorising the holes based on the qualities/quantities faster and more versatile.

The computer implemented-method may comprise determining to apply the first filling process on a first hole based on an absence of a hole corresponding to the first hole in the altered mask, and determining to apply the second filling process on a second hole based on a presence of a corresponding hole in the altered mask.

By determining to apply the first filling process based on absence of a hole in the altered mask, and the second filling process based on presence of a hole in the altered mask, a particularly simple categorisation is provided which enables yet faster categorisation of holes to be filled. Thereby, comparatively less powerful computers are made able to perform the method to obtain better quality images more quickly.

The at least one morphological operation may include erosion and/or dilation.

A (first or second) quantity associated with a hole may be the hole size or shape or one or more dimensions associated with the holes that may be numerically quantified. In some examples this may be the number of pixels that may be associated with the hole. A (first or second) quality may be one or more features such as pixel resolution, brightness etc. of the hole.

By dividing image data into image portions, and applying different filling processes to the tiles depending on a quantity and/or quality associated with holes in the tiles, the method advantageously reduces the proportion of the image that requires any processing to remove holes and is more versatile, scalable and adaptable to filling holes in a variety of different images.

The computer-implemented method may comprise applying the first filling process to the first image portion in response to a determination that the first hole has a dimension smaller than a first threshold value.

The computer-implemented method may comprise applying the second filling process to the second image portion in response to a determination that the second hole has a dimension larger than a second threshold value.

This advantageously allows holes having smaller dimensions to be processed differently to those with larger dimensions, further increasing the versatility of the method when processing images with a range of hole dimensions.

The first filling process may include filling a pixel of a hole of the first type according to an average of surrounding pixels. This advantageously provides a fast and computationally inexpensive way of filling a hole.

The computer-implemented method may comprise determining the average using surrounding pixels having a material identification which is the same as the pixel to be filled.

This advantageously enables the hole to be filled quickly and efficiently, while increasing the likelihood of achieving a high-quality result. Nearby pixels having different material identifiers to the hole pixel are more likely to look different to the missing pixel data, than those with matching material identifiers. Therefore, using pixels with the same material identifiers advantageously reduces the computational burden on a processor, while more closely achieving an appropriately filled pixel.

The determination of the average may include weighting values of the surrounding pixels to be averaged.

This enables some surrounding pixels to contribute more to the average than others, thereby advantageously increasing the versatility of the filling process according to the image being processed.

The second filling process may include a machine learning inference process.

Machine learning inference processes provide high quality image filling results. By providing a machine learning inference process as the second filling process, advantageously an improved balance between speed and quality of image processing is achieved.

The computer-implemented method may comprise combining the filled image portions with image portions that were not filled by the first and second processes.

This provides the advantage of reconstructing a complete image without needing to process image portions that do not contain holes, thereby advantageously increasing the speed of the method.

According to a second aspect of the present disclosure, there is provided a computing device comprising one or more processors that are associated with a memory, the one or more processors configured with executable instructions which, when executed, cause the computing device to carry out any computer-implemented method described above. The computing device may be configured to receive image data of an image to be completed from a server. The server may be a cloud-based image rendering server. The computer device may be configured to receive data of an image to be completed from an image capture device, or from a processor associated with the image capture device.

Advantageously, filling holes in occluded areas reduces the reliance on receiving all rendered data corresponding to the occluded areas when the occluded areas become visible, thereby reducing the load on the computer device doing the rendering.

The computing device may be a virtual reality device. The virtual reality device may be a virtual reality headset.

Virtual reality headsets require much higher computing power to display a satisfactory image to a user than a conventional computer monitor. This is because the monitors of a virtual reality headset are much closer to a user's eyes, subtend a much larger angle, and operate at a higher and sustained frame rate. Providing a virtual reality headset configured according to the method as described above provides the advantage of requiring much less computing power without sacrificing image quality where it is needed most for maintaining user comfort and immersion."
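
The cheap "first filling process" from the description (average of surrounding pixels, same material ID only, with weighting) might look something like the sketch below. This is only my reading of the text, not the patented implementation. The inverse squared distance weights, the `radius` parameter, and the function name `fill_pixel` are assumptions, and I am assuming the material ID of the hole pixel is already known (for example from a G-buffer).

```python
from typing import List, Optional


def fill_pixel(
    values: List[List[float]],
    materials: List[List[int]],
    hole: List[List[bool]],
    y: int,
    x: int,
    radius: int = 1,
) -> Optional[float]:
    """Weighted average of valid neighbours sharing the material ID of (y, x).

    Returns None when no usable neighbour exists, in which case a caller
    could fall back to the second (ML) filling process.
    """
    h, w = len(values), len(values[0])
    total = 0.0
    weight_sum = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            if (dy == 0 and dx == 0) or not (0 <= ny < h and 0 <= nx < w):
                continue  # skip the pixel itself and anything off-image
            if hole[ny][nx] or materials[ny][nx] != materials[y][x]:
                continue  # skip other holes and pixels of a different material
            # Assumed weighting: closer neighbours contribute more.
            weight = 1.0 / (dy * dy + dx * dx)
            total += weight * values[ny][nx]
            weight_sum += weight
    return total / weight_sum if weight_sum else None
```

Restricting the average to matching material IDs keeps, say, a bright sky pixel from bleeding into a hole on a dark wall, which is the point the description makes about neighbours with different material identifiers.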


Figures

fmi30IV.jpg

QyOYUGu.jpg

HwltMVP.jpg

kNP1nAu.jpg

jlUlsg2.jpg



Get ready boys, Cerny's gonna fill your holes.

YVcEFlt.png
 

MikeM

Member
If this ends up being on par with DLSS, then I could see this extending the generation length (sadly).
 

kyliethicc

Member
Based on the description this isn't like DLSS as it's not leveraging Neural Networks.
Claims 11 and 12

11. The computer-implemented method of claim 10 wherein the machine learning inference process is implemented by a data model, said data model or associated with an artificial neural network (ANN).

12. The computer-implemented method as claimed in claim 11 wherein the ANN is a convolutional neural network (CNN).
 
Oh, thanks for clarifying. Doesn’t solve the hardware issue though. RTX cards are considerably more performant when it comes to ML. There’s no reason to believe it’s at all feasible for consoles to match that. It’s a pipe dream.
 

kyliethicc

Member
Oh, thanks for clarifying. Doesn’t solve the hardware issue though. RTX cards are considerably more performant when it comes to ML. There’s no reason to believe it’s at all feasible for consoles to match that. It’s a pipe dream.
One of the names on the patent wrote this on their LinkedIn.

"Most recently I've spearheaded work in using Neural Rendering to enhance traditional rendering methods, focusing on using implicit neural representations and how to make them run efficiently. This includes creating custom high performance inference using compute shaders. The target for this work is PlayStation 5. Development done in Python, Unity, C#, compute shaders with offline training using Pytorch."

So that seems to be referring to this patent or similar tech.
 

PSYGN

Member
If this ends up being on par with DLSS, then I could see this extending the generation length (sadly).

I remember Nvidia taking AI/deep learning/neural networks far more seriously than their competitors or other tech giants. We are seeing the fruits of their labor and research. If Sony comes out with something, I don't expect it to be on par with DLSS, not for a while.

Advances in hardware have driven renewed interest in deep learning. In 2009, Nvidia was involved in what was called the “big bang” of deep learning, “as deep-learning neural networks were trained with Nvidia graphics processing units (GPUs).”[87] That year, Andrew Ng determined that GPUs could increase the speed of deep-learning systems by about 100 times.[88] In particular, GPUs are well-suited for the matrix/vector computations involved in machine learning.[89][90][91] GPUs speed up training algorithms by orders of magnitude, reducing running times from weeks to days.[92][93] Further, specialized hardware and algorithm optimizations can be used for efficient processing of deep learning models.
 
I remember Nvidia taking AI/deep learning/neural networks far more seriously than their competitors or other tech giants. We are seeing the fruits of their labor and research. If Sony comes out with something, I don't expect it to be on par with DLSS, not for a while.
The other tech giants just don't have "consumer" products that technically "benefit" from it. It's all back end, presented to the public as "it just works".
 

BennyBlanco

aka IMurRIVAL69
I remember Nvidia taking AI/deep learning/neural networks far more seriously than their competitors or other tech giants. We are seeing the fruits of their labor and research. If Sony comes out with something, I don't expect it to be on par with DLSS, not for a while.

Nvidia bet big on this stuff over 10 years ago and it’s just starting to pay off now. Everyone else is playing catch up.
 

01011001

Banned
Takes a long time to develop models, it's very computationally expensive, and then you have to integrate (hopefully) at a hardware/software level.
The only example of ML currently on XB is auto HDR, which is how it’s achieved for “free”.

yeah but they showed it off back in the day with Forza Horizon 3 if I remember correctly... so it has been a while.

the Series X|S also has ML acceleration so to speak, so it would be a missed opportunity if they don't plan on actually using it
 

PaintTinJr

Member
Nvidia bet big on this stuff over 10 years ago and it’s just starting to pay off now. Everyone else is playing catch up.
Not really, it is just about visibility for each company's efforts IMO.

For example, both Sony's hi-res audio reconstruction for MP3s and their original 4K X9 TV with the X1 processor (for picture enhancement) have used ML for their enhancement since about 2014.
 
It won’t be. PS5 doesn’t have tensor cores nor does this patent mention anything about deep learning processes which DLSS relies upon.

Proper HW acceleration of AI isn't possible due to lack of HW unfortunately. Maybe in the pro versions of these consoles.
According to one of the inventors, it's for PS5.

working on the application of machine learning and neural rendering to real-time graphics on the PlayStation 5.
 

lh032

I cry about Xbox and hate PlayStation.
upscale 4k, max ray tracing, 60fps exclusive graphic mode on PS5.

Licking Jackie Chan GIF
 