A pixel is not a tiny square


I had thought of a ‘pixel’ as the smallest unit of digital space, like a tiny square, and that digital images are made up of such units. This article says it is not that simple, and that pixels are related to Fourier transforms, in which any waveform can be decomposed into a sum of sinusoidal waves of different frequencies.

Perhaps the most unexpected person in this story – at least for readers in the United States – is Vladimir Kotelnikov, the man who turned Fourier’s idea into the pixel.

Early in his career, Kotelnikov showed how to represent a picture with what we now call pixels. His beautiful and astonishing sampling theorem, published in 1933, is the foundation of the modern picture world.

A pixel exists only at a point. It’s zero-dimensional (0D), with no extent. You can’t see a pixel.

What devices do is ‘spread’ those pixels to create an image.

You carry pixels around in your cellphone, say, stored in picture files. You cannot see the pixels. To see them, you ask for a picture file to be displayed. Typically, you ‘click on it’. Because of the astounding speed of today’s computers, this seems to happen instantaneously. The digital pixels are sent to the display device, which spreads them with the little glowing spots on the display’s screen. The act of display is the process I just described and diagrammed. Those glowing spots are actual pixel-spreaders at work.

Many people call these spots pixels – a very common error. Pixels are digital, separated, spiky things, and are invisible. The little glowing spots are analogue, overlapped, smooth things, and are visible. I suggest we call each a ‘display element’ to distinguish it from a ‘picture element’ (which is what the word ‘pixel’ abbreviates). Display elements and pixels are fundamentally different kinds of things. Display elements vary from manufacturer to manufacturer, from display to display, and over time as display technologies evolve. But pixels are universal, the same everywhere – even on Mars – and across the decades.

I have not invoked a little square even once in this discussion. A pixel – an invisible 0D point – cannot be a square, and the little glowing spot of light from a display device generally isn’t either. It can be, but only if the spreader is a hard-edged box – an unnatural shape, a mesa with a square footprint. A square mesa is a jarringly crude approximation to the gentle hillock spreader supported by the sampling theorem.

So why do so many people think that pixels are little squares? The answer is simple: apps and displays have fooled us for decades with a cheap and dirty trick. To ‘zoom in’ by a factor of 20, say, they replace each pixel with a 20-by-20 square array of copies of that pixel, and display the result. It’s a picture of 400 (spread) pixels of the same colour arranged in a square. It looks like a little square – what a surprise. It’s definitely not a picture of the original pixel made 20 times larger.
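This pixel-replication trick is easy to sketch (a minimal illustration of my own, not code from the article):

```python
def replicate_zoom(image, factor):
    """'Zoom' by nearest-neighbour replication: copy each pixel value
    into a factor-by-factor square block of identical copies."""
    zoomed = []
    for row in image:
        # Repeat each value `factor` times across the row...
        wide_row = [value for value in row for _ in range(factor)]
        # ...then repeat the whole widened row `factor` times downward.
        zoomed.extend([list(wide_row) for _ in range(factor)])
    return zoomed

# A 2x2 'image' of point samples:
tiny = [[0, 255],
        [255, 0]]

big = replicate_zoom(tiny, 20)
# big is 40x40; each original pixel now appears as a 20x20 flat square --
# a picture of 400 copies of the pixel, not the pixel made 20 times larger.
```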

I cannot effectively summarize the long article because it is quite dense and contains a lot of images but I encourage interested people to follow the link.

Comments

  1. consciousness razor says

    That article doesn’t really go into it, but people have the same misconception about audio samples. They’re points, not tiny line segments.

  2. says

    Those who used the Apple II or other early home computers are familiar with this. Pixels on the Apple II screen (280x192 resolution) were individually visible. Up close, it almost looked like the ends of tightly stacked hexagonal pencils. Pixels tended to “bleed” into one another, giving the illusion of more colours by mixing them, intentionally as Steve Wozniak did, or unintentionally as you might see in text mode on composite video.

    https://paleotronic.com/2018/10/03/apple-ii-colour-computer-graphics/

  3. jenorafeuer says

    A similar thing exists with sound processing. On CDs each sample is a point, and proper decoding of the data essentially replaces every point with a normalized sinc function (sin(πx)/(πx)), which is the Fourier transform of a rectangular function. Any CD player that advertised ‘oversampling’ or the like was explicitly doing this in order to reduce some of the high-frequency noise that might otherwise be generated.

    Of course, since the sampling dimension in sound is time rather than space, and the sinc function is infinite in both directions, no device ever actually used a full sinc function, because all the times in between the sampling points rely on all samples both before and after. Instead, CD players used a truncated sinc function that only went out half a dozen samples in either direction. Delaying sound production by half a dozen samples at 44.1 kHz is still less than a millisecond, so nobody is going to notice. In sound, lag (average delay) is far less important than jitter (variation in delay).
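    A truncated-sinc reconstruction of this kind can be sketched in a few lines (my own illustration, not any actual player's filter):

```python
import math

def sinc(x):
    # Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1.
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t, half_width=6):
    """Value of the reconstructed signal at time t (in sample units),
    summing only the samples within half_width of t -- a truncated
    sinc filter rather than the infinite ideal one."""
    n0 = int(math.floor(t))
    total = 0.0
    for n in range(n0 - half_width + 1, n0 + half_width + 1):
        if 0 <= n < len(samples):
            total += samples[n] * sinc(t - n)
    return total

# Samples of a sine wave well below the Nyquist limit:
samples = [math.sin(2 * math.pi * 0.05 * n) for n in range(100)]

# Reconstruct halfway between two sample points; even with only
# +/-6 terms the result is close to the true value of the sine there.
approx = reconstruct(samples, 50.5)
exact = math.sin(2 * math.pi * 0.05 * 50.5)
```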

    There’s actually another way in which pixels are not little squares: on actual monitors, the red, green, and blue components of any given display element are not in the same physical location. ‘Subpixel rendering’ is a thing, especially with fonts; it’s called ‘ClearType’ on Windows, and essentially it uses different colours based on the actual element locations to create lines that are finer than the ‘pixel’ model allows.

  4. JM says

    The author is also mixing and matching a couple of different things. The data stored in a computer image may or may not conform to his idea of a pixel depending on format. Different types of displays must be sent different information which mostly won’t match his usage.

    The word ‘pixel’ itself is defined in a way that defies his usage: a pixel is the smallest addressable element of a display or image. The sort of mathematical-model images he is talking about don’t have a smallest point* because a change could always be made to the image that requires adding new points to the list of point elements.

    His idea of moving definitions around to make them more clear would be a futile quest at this point.

    * in a mathematical sense. Image editors have program limits.

  5. says

    Speaking as an electrical engineering professor with a specialty in digital signal processing, let me just say that there is a difference between the mathematics of A/D sampling and the realization of ADCs and DACs in the physical world. For example, a popular method of digitizing an audio signal is to “freeze” the signal at an instant in time and then turn the resulting voltage level into a binary value (i.e., quantize it) using a relatively simple (in concept) digital logic circuit. So, while we can certainly talk about the abstract mathematical concept of convolving the input function with a series of unit impulse functions, the reality is that it is implemented the way it is often explained (the idea of deriving a “vertical line” meeting the audio signal). The same is true for video. There really are “pixels” in that we have individual devices that make up the display.

    Don’t get me wrong, understanding the mathematics of a Fourier decomposition of the resulting waveform (i.e., its spectrum) is crucial for engineers working with these circuits, but generally we don’t implement them in a purely mathematical way*. The problem that I see as a teacher is that, if you just rely on the implementation and don’t understand the mathematics behind it, you can come to false conclusions about its operation (e.g., that sampling an audio signal at a rate many times higher than twice the source bandwidth will yield more information, assuming all else is equal).

    *Of course, the signals can be manipulated “in a purely mathematical way” once digitized, but I’m referring to the ADC (analog to digital converter) and the DAC (digital to analog converter), which are the physical interfaces between the human, analog world and the digital domain.

  6. says

    And I should add that, yes, I understand that a physical pixel is really an RGB triplet, but that’s what we like to call “an implementation detail”. If we could somehow put all three of them in the same exact location, we would.

    What I think is of greater interest, and something that most users don’t fully grasp, is why we use RGB in the first place. Or as I like to tease my students, does your dog see what you see on the TV? Or even better, if we eventually meet up with intelligent extraterrestrials, what will they make of our TV and computer displays?

  7. John Morales says

    davex, same guy, same content. Been at it for a while, I see.

    What I think is of greater interest, and something that most users don’t fully grasp, is why we use RGB in the first place.

    Existing technology and purpose. For printing, easier to use CMY.

  8. consciousness razor says

    The problem that I see as a teacher is that, if you just rely on the implementation and don’t understand the mathematics behind it, you can come to false conclusions about its operation (e.g., that sampling an audio signal at a rate many times higher than twice the source bandwidth will yield more information, assuming all else is equal).

    Agreed. On the other hand, the takeaway from some of the math (not so much the math itself) can lead people astray too…. As this video explains pretty well, there’s no need for higher sample rates for audio on the consumer end. We’ve had 44.1 or 48 kHz standards for a long time, and there will never be any purpose to raising it with newer formats or technologies. We’re done. It doesn’t get any better than this. That’s the sort of thing that’s taught to a broad, not-too-specialized audience, because that’s the sort of thing that’s useful for them to know.

    However, it’s a different story on the production side. Of course, there’s still a practical limit which is enough to avoid noticeable aliasing and so on. The point is just that you don’t want to get confused into thinking it can’t ever matter to have a significantly higher sample rate. It only doesn’t matter for the consumer to get more than that.
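    The aliasing being avoided here is easy to demonstrate numerically (the frequencies below are my own illustrative choices): sampled at 8 kHz, a 9 kHz tone produces exactly the same sample values as a 1 kHz tone, so nothing above half the sample rate can be represented.

```python
import math

rate = 8000  # samples per second (illustrative)
times = [n / rate for n in range(16)]

low = [math.sin(2 * math.pi * 1000 * t) for t in times]   # 1 kHz tone
high = [math.sin(2 * math.pi * 9000 * t) for t in times]  # 9 kHz tone

# The two tones are indistinguishable from their samples alone:
# 9000 Hz aliases to 9000 - 8000 = 1000 Hz at an 8 kHz sample rate.
identical = all(abs(a - b) < 1e-9 for a, b in zip(low, high))
```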

  9. Holms says

    Mano,

    I had thought of a ‘pixel’ as the smallest unit of digital space, like a tiny square, and that digitial images are made of up such units.

    You are right. Zooming in on any screen will show you its square pixels; you may even be able to do this by leaning in and squinting at your current screen to see them. This is how the word ‘pixel’ is used, so that is one of its meanings.

    ___

    #9 davex, #10 John
    Seems like this guy has a bit of a bee in his bonnet on this topic. I bet he brings it up at parties.

    Meanwhile, the world continues to use the word to refer to the little square units of a screen.

  10. davex says

    @John Morales: Alvy’s full title covers voxels too: “A Pixel Is Not A Little Square, A Pixel Is Not A Little Square, A Pixel Is Not A Little Square! (And a Voxel is Not a Little Cube)”

    I work with ocean models with discrete cells, and it works the same way — you have to consider your data as point samples of continuous properties located at various locations of interest. We put temperature, salinity, and density at our cell centers, and fluxes through the centers of faces between cells. You can interpolate the data to wherever you like in space or time, but to do math on it, you have to be careful about what the numbers you store and calculate with actually mean.

    When the 1°x1° GFS atmospheric model says the temperature is 37.0°C at 90W 45N and 36.0°C at 90W 38N, it isn’t saying that there’s a 1°x1° perfectly constant platonic pixel box surrounding each integral coordinate, it’s modeling a continuous process. At the root, imagery models are the same: point samples in space.
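    Interpolating point samples to arbitrary locations, as described above, can be sketched with simple bilinear interpolation (a generic example of my own, not the GFS model's actual scheme):

```python
def bilinear(grid, x, y):
    """Interpolate a value at fractional coordinates (x, y) from a
    grid of point samples, grid[row][col]; the samples are treated as
    values of a continuous field at points, not as constant boxes."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
    bottom = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

# Four temperature point samples on a 1-degree grid (made-up values):
temps = [[37.0, 36.0],
         [35.0, 34.0]]

mid = bilinear(temps, 0.5, 0.5)  # value midway between the four samples
```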

  11. John Morales says

    davex:

    I work with ocean models with discrete cells, and it works the same way — you have to consider your data as point samples of continuous properties located at various locations of interest.

    Sure. I myself did a stint at digitising tidal chart readings, back in the day (1979 or so).

    I mean, I get what’s being said, but again, I’ll refer to what JM wrote above.
    Point (heh) being, it’s in the name: pixel → picture element.

    They’re intended as the smallest displayable entities in imaging, whereas you’re referring to data points for modelling. Different concepts.

    (Think of a phone screen: it has an actual number of pixels. They represent discrete physical thingies, not some sort of abstractions)

  12. says

    The definition of pixels has changed considerably since bitmapped monochrome displays, which had the original bitmapped pixels. When someone says a screen is 4k, they mean it has pixel dimensions of 3840x2160 -- or something like that. Each pixel then has a number of bits assigned to it to store color information: 1 bit at first, then 4, 8, 32 and now 48. Image encodings like JPEG use Fourier-related transforms (the discrete cosine transform) to compress the image data, which is then decoded to render specific color values in individual pixels.

    Pixels are a measure of the underlying hardware of a display and “dots per inch” or DPI relates to encoding. If an image is being decoded for printing on an inkjet printer, for example, it is rendered (nowadays) as probabilities that a certain density of ink will be output -- but the printer’s underlying hardware still computes it into a something by something matrix of pixels then attaches the probabilities to those regions.

    DPI is a measure of image rendering, while pixels are a unit of hardware resolution. If someone is talking about pixels as the compression algorithm’s encoding, they simply don’t understand the relationship between the capability of the hardware medium and the encoding. There are lots of encodings, including bitmapped formats like GIF and TIFF, and run-length or even fractal encodings. Those are best thought of as an abstraction layer that sits above the underlying hardware.
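    The transform-compression idea mentioned above can be sketched with a naive 1-D DCT-II (the transform JPEG applies in 2-D; this is my own unoptimized illustration): for a smooth run of sample values, almost all the energy lands in the first few coefficients, so the rest can be coarsely quantized or discarded.

```python
import math

def dct(x):
    """Naive O(N^2) DCT-II of a list of samples (no normalization)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

smooth = [10, 11, 12, 13, 14, 15, 16, 17]  # a smooth ramp of values
coeffs = dct(smooth)
# coeffs[0] carries the sum (i.e. the average times N); the high-order
# coefficients are tiny, which is what makes transform coding effective.
```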

  13. says

    Zooming in on any screen will show you its square pixels

    No!!

    Pixels are the hardware unit and they are not necessarily square!! They can be rectangular and I believe an old TV encoder sometimes produced hexagonal ‘pixels’! Early generation color LED billboards encoded a single ‘pixel’ with 3 or 4 LEDs (rgb/w) in close proximity. In other words the hardware supports ‘pixels’ that may be composites of smaller units. The hardware is allowed to lie about the pixels, so long as they are a dot + bit depth color map.

  14. says

    Pixels on the Apple II screen (280x192 resolution) were individually visible.

    Put a drop of clear water on the screen of an iPhone and it will act as a magnifying lens, so you can see the pixels in the phone’s display. They’re still there; they just get smaller and smaller.

    The first computer I used had ‘pixels’ produced by a print head in paper. It was 80 pixels wide by 23 deep, with the entire ASCII character set as the ‘bit depth’ ;). A DEC vt52 (I still have mine!) produced the same ‘size’ output resolution but you could erase individual cells (pixels) on the screen. We didn’t even talk about pixels until the first bitmapped displays started turning up.

    I used my old Sun 3/50 as a terminal because of that gorgeous monochrome bitmap display -- 1600x1280 on a 19″ screen. And the cat loved how hot the monitor would get. Good times.

  15. davex says

    Pixels might mean a particular thing on a particular piece of hardware, or in a particular file encoding format, but when you need to translate image data across devices (or even across different scales and rotations on the same device), you need a better data model than the little box/little voxel model. Alvy’s description of Kotelnikov’s work with 0D point estimates and deconstruction/reconstruction filters is the better abstract data model for thinking about pixels across devices or transformations.

    This same abstraction is important in translating the varying mix of 50m to 5000m triangles of my models into GIS-compatible rasters for continent- to meter-scale zooms--the finite volumes aren’t homogeneous boxes, they are estimates of various properties defined at various points.

  16. John Morales says

    davex:

    when you need to translate image data across devices (or even across different scales and rotations on the same device), you need a better data model than the little box/little voxel model.

    Why?

    As long as the notional pixel/voxel is no larger than the smallest possible display element of the display device, I can’t see such a need.

  17. davex says

    The notional pixel/voxel is often larger than the smallest possible display element when zooming, when watching low resolution videos on high-resolution screens, and in my hydrodynamic models, when I’m showing a 50m-sided equilateral triangular element near a 1m resolution coastline.

    Thinking of a pixel as a point sample, rather than as a homogeneous box at the resolution of the generating device/program, ensures the “pixel” is no larger than the smallest possible display element of any possible display device.

    I don’t see the need to induce artificial discontinuities between pixels, which is what the little box/voxel model does.

  18. John Morales says

    I don’t see the need to induce artificial discontinuities between pixels

    Can’t avoid that.
    Most imaging (e.g. imagers) generates a dataset composed of a collection of pixels. If those were considered 0D as you suggest, that would introduce a requirement for interpolation and spacing, lest the entire rendering also be a point.

    But sure, there are other ways of encoding images, for example vector graphics, that don’t use pixels.

  19. davex says

    “I don’t see the need to induce artificial discontinuities between pixels”

    “Can’t avoid that.”

    Yes, you can avoid that by using Alvy’s interpretation of pixels as Kotelnikov’s samples in space.

    You still need interpolation and spacing for any zooming--whether the interpolation algorithm is nearest neighbor box filter, subpixel antialiasing, or reconstruction filters.
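    The difference between those interpolation choices shows up even in one dimension (a generic sketch of my own, with a box filter versus linear interpolation standing in for fancier reconstruction filters):

```python
def zoom_nearest(samples, factor):
    """Box-filter zoom: repeat each sample `factor` times,
    producing flat steps with discontinuities between them."""
    return [s for s in samples for _ in range(factor)]

def zoom_linear(samples, factor):
    """Linear-interpolation zoom: each new point blends its two
    neighbouring samples, giving a continuous result."""
    out = []
    for i in range(len(samples) - 1):
        for k in range(factor):
            f = k / factor
            out.append(samples[i] * (1 - f) + samples[i + 1] * f)
    out.append(samples[-1])
    return out

ramp = [0, 10]
steps = zoom_nearest(ramp, 4)  # flat runs, then a jump
slope = zoom_linear(ramp, 4)   # a smooth ramp between the samples
```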
