Samsung Developer Program

Gear VR - The Basics

Learn the fundamentals before you start your project.


Since the launch of Gear VR, questions have arisen about the differences between monoscopic and stereoscopic 3D videos. Let's discuss both video types, including what makes them different and the challenges that each video type presents.

But first, let's define 360° video. A 360° video is simply a flat equirectangular video that is mapped onto a sphere for playback on a VR headset. If a 360° video is monoscopic, both eyes see the same single, flat image or video file. If a 360° video is stereoscopic, there are two videos - one mapped to each eye - providing depth and a 3D appearance.

For more details on format requirements, see the Milk VR Production Guide.


Monoscopic 360° Video

Monoscopic 360° videos are the most commonly produced VR content. This type of video is usually captured with a single camera per field of view (FOV) and stitched together to form a single equirectangular video. Monoscopic videos have fewer technical challenges and are the easiest and cheapest to produce.

A common camera setup involves six cameras filming six different fields of view. The footage from all six cameras is stitched together to form a single equirectangular video. The maximum output resolution for 2D monoscopic video on Gear VR is 3840x1920 at 30 frames per second (fps) or 2880x1340 at 60 fps.


Stereoscopic 360° Video

Stereoscopic videos are usually filmed with two cameras per field of view - one camera for each eye - to give the perception of depth. While this experience is great when done correctly, it is much harder to get right. You need to stitch together the camera footage for each eye separately, and then combine the left and right videos into a single 3D video in which one video is mapped to each eye. The combined video comes in a few configurations: left and right side-by-side (SBS) or top and bottom. It is important to note that stereoscopic 3D decreases video resolution because the two videos share the resolution of the screen. The maximum output resolution for a 3D video is 1920x960 at 30 fps (half the width and height of the 3840x1920 for monoscopic video) or 1440x670 at 60 fps. The lower resolution can lead to blurring and loss of detail. This is why most content creators prefer to create higher resolution, monoscopic videos.
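The resolution cost can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes, as the 60 fps figures above suggest, that each stereoscopic dimension is half the monoscopic one.

```python
def halve_dimensions(width, height):
    """Return the resolution with both width and height halved."""
    return width // 2, height // 2


def pixel_fraction(small, large):
    """Fraction of the larger frame's pixels that the smaller frame keeps."""
    return (small[0] * small[1]) / (large[0] * large[1])


# Monoscopic maximum at 60 fps, as quoted in the text above.
mono_60 = (2880, 1340)

# Halving each dimension reproduces the quoted stereoscopic figure.
stereo_60 = halve_dimensions(*mono_60)

# Halving both dimensions leaves one quarter of the pixels.
fraction = pixel_fraction(stereo_60, mono_60)
```

Here `stereo_60` works out to 1440x670, and `fraction` is 0.25: each eye receives only a quarter of the pixels available to a monoscopic video at the same frame rate.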


VR Terminology

2K Resolution

2K video resolution is 2048 x 1080 pixels, used for digital editing and production. Milk VR does not support 2K video resolution.


4K Resolution
4K video resolution is 4096 x 2160 pixels, four times the pixel count of 2K. The term "4K" comes from the Digital Cinema Initiatives (DCI), a consortium of motion picture studios that standardized a specification for the production and digital projection of 4K content. Milk VR requires a minimum of 4096 x 2048 resolution (slightly less than 4K) for monoscopic video or 4096 x 4096 resolution (more than 4K) for stereoscopic video. For more information, visit the Digital Cinema Initiatives site.



Equirectangular Projection
The method by which a sphere is mapped to a flat plane. This transforms the meridians (lines of longitude, which run from pole to pole) of the sphere to vertical straight lines and the parallels (circles of latitude) to horizontal straight lines. This is commonly used for maps of the Earth.
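As a minimal sketch of this mapping, the hypothetical function below converts a direction on the sphere to a position in an equirectangular frame. It assumes longitude runs from -180° at the left edge to 180° at the right, and latitude from 90° at the top to -90° at the bottom; real players may use a different origin or orientation.

```python
def sphere_to_equirect(lon_deg, lat_deg, width, height):
    """Map a longitude/latitude pair to (x, y) in an equirectangular frame.

    Assumed convention (not taken from any specific player):
    longitude -180..180 maps linearly to x = 0..width,
    latitude   90..-90 maps linearly to y = 0..height.
    """
    x = (lon_deg + 180.0) / 360.0 * width
    y = (90.0 - lat_deg) / 180.0 * height
    return x, y


# The point straight ahead (longitude 0, latitude 0) lands at the frame center.
center = sphere_to_equirect(0, 0, 4096, 2048)
```

Because x depends only on longitude and y only on latitude, every meridian becomes a vertical line and every parallel becomes a horizontal line, as described above.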


Field of View (FOV)
Field of view is the extent of the observable world, seen through an instrument, like a camera or VR headset display.


Monoscopic Video
A video shot with one camera per field of view, producing a flat equirectangular video displaying the same content for each eye.


Parallax
An object seen from two different points of view appears to have moved relative to the distant background between the two views. This apparent shift is called parallax.
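The size of this shift can be estimated with basic trigonometry. The sketch below is an illustration, not part of any Gear VR API: it assumes the object sits directly in front of the midpoint between the two viewpoints, and the 0.064 m baseline is an assumed, roughly eye-spacing value chosen only for the example.

```python
import math


def parallax_angle_deg(baseline, distance):
    """Angle in degrees between the two lines of sight to an object.

    baseline: separation between the two viewpoints
    distance: distance from the baseline midpoint to the object
    Both use the same unit; the object is assumed to lie straight ahead.
    """
    return math.degrees(2.0 * math.atan(baseline / (2.0 * distance)))


# Assumed 0.064 m baseline: a near object shifts far more than a distant one.
near = parallax_angle_deg(0.064, 0.5)
far = parallax_angle_deg(0.064, 50.0)
```

The angle shrinks as the distance grows, which is why distant backgrounds appear nearly identical from both viewpoints while nearby objects appear to move.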


Point of View (POV)
Point of view is the camera lens direction, shot from the eye level of the virtual observer.


Side-by-side (SBS)
Side-by-side refers to rendering two separate camera recordings in a single video file, with one recording mapped to the left eye and the other mapped to the right eye. This is used to achieve the stereoscopic 3D effect.



Spherical Video
Video that is wrapped in 360 degrees around the content viewer in a sphere.


Stereoscopic 3D Video
A video shot with two cameras per field of view, producing a single video file that contains two separate equirectangular recordings, one mapped to each eye. Videos can be rendered in one of the following configurations: side-by-side, top and bottom, or 180° hemispheres.


Top and Bottom
Top and bottom refers to rendering two separate camera recordings - one on the top and one on the bottom - in a single video file. This is used for stereoscopic 3D effect, with each recording shown separately to each eye.
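To illustrate how a player might separate the two recordings, the sketch below splits a single frame into per-eye views for either layout. The function name and layout strings are hypothetical, the frame is modeled as a plain list of pixel rows, and the choice of the left or top half for the left eye is an assumption; check the ordering your target player expects.

```python
def split_stereo_frame(frame, layout):
    """Split a packed stereo frame into (left_eye, right_eye) views.

    frame:  list of rows, each row a list of pixel values
    layout: "side-by-side" or "top-and-bottom"
    Assumes the left (or top) half belongs to the left eye.
    """
    if layout == "side-by-side":
        half = len(frame[0]) // 2
        left_eye = [row[:half] for row in frame]
        right_eye = [row[half:] for row in frame]
    elif layout == "top-and-bottom":
        half = len(frame) // 2
        left_eye = frame[:half]
        right_eye = frame[half:]
    else:
        raise ValueError("unknown layout: " + layout)
    return left_eye, right_eye


# A tiny 4x2 frame with numbered pixels, for illustration only.
frame = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
sbs_left, sbs_right = split_stereo_frame(frame, "side-by-side")
tb_left, tb_right = split_stereo_frame(frame, "top-and-bottom")
```

In both layouts each eye receives only half of the frame's pixels, which is why stereoscopic video needs a larger source resolution to look as sharp as monoscopic video.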


Stitching
The process of combining multiple videos or still images with overlapping fields of view to produce a single panoramic photo or equirectangular video.

