In just under three weeks, the (FIFA) World Cup will start in Brazil. I have to admit that I am not a big fan of soccer, so I will not talk too much about it. Actually, I wanted to talk about colors, and about variations on some colors. For instance, there are a lot of blues. To visualize the standard blues, let us consider the following figure, inspired by the well-known chart of R colors,

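```r
# All built-in color names containing "blue"
BLUES <- colors()[grep("blue", colors())]
# Convert to RGB, then to HSV, to sort by hue, then saturation, then value
RGBblues <- col2rgb(BLUES)
HSVblues <- rgb2hsv(RGBblues[1, ], RGBblues[2, ], RGBblues[3, ])
HueOrderBlue <- order(HSVblues[1, ], HSVblues[2, ], HSVblues[3, ])
# Black text on light colors, white text on dark ones
SetTextContrastColor <- function(color)
  ifelse(mean(col2rgb(color)) > 127, "black", "white")
TextContrastColor <- unlist(lapply(BLUES, SetTextContrastColor))
nr <- 11  # rows
nc <- 6   # columns (11 x 6 = 66 cells)
plot(0, type = "n", ylab = "", xlab = "", axes = FALSE,
     ylim = c(0, nr), xlim = c(0, nc))
for (j in 1:nr) {
  for (i in 1:nc) {
    k <- (j - 1) * nc + i
    rect(i - 1, j - 1, i, j, border = NA, col = BLUES[HueOrderBlue[k]])
    # label each cell with the name of the color actually drawn in it
    text(i - .5, j - .5, BLUES[HueOrderBlue[k]], cex = 0.75,
         col = TextContrastColor[HueOrderBlue[k]])
  }
}
```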

All the color names containing “*blue*” are shown here.
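Each label's color in the chart comes from a simple rule: black text when the average of the red, green and blue channels exceeds 127, white otherwise. Below is a minimal sketch of that rule, vectorized over several colors. Next to it is a common alternative that the chart does *not* use (my own addition): weighting the channels with the ITU-R BT.601 luma coefficients, since the eye is more sensitive to green than to blue. The function names `mean_rule` and `luma_rule` are hypothetical.

```r
# Rule used in the chart, vectorized: mean of the R, G, B channels
mean_rule <- function(color)
  ifelse(colMeans(col2rgb(color)) > 127, "black", "white")

# Alternative (not used in the post): BT.601 luma weighting of the channels
luma_rule <- function(color) {
  m <- col2rgb(color)  # 3 x n matrix with rows "red", "green", "blue"
  luma <- 0.299 * m["red", ] + 0.587 * m["green", ] + 0.114 * m["blue", ]
  ifelse(luma > 127, "black", "white")
}

mean_rule(c("white", "navy", "green"))
luma_rule(c("white", "navy", "green"))
```

With pure "green" (0, 255, 0) the two rules disagree. The plain mean (85) asks for white text, while the weighted luma (about 150) asks for black.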

Having the choice between several possible colors is nice, but it can also be useful to get a *palette* of blue colors, that is, a sequence of shades going gradually from light to dark blue. Here is one way to get it:

```r
library(RColorBrewer)
# Interpolate the 9-color Brewer "Blues" scheme into 100 shades, light to dark
blues <- colorRampPalette(brewer.pal(9, "Blues"))(100)
```
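Note that `colorRampPalette()` returns a *function*. Calling it with `n` linearly interpolates the given colors into `n` hex codes. The sketch below avoids the RColorBrewer dependency by assuming only the two endpoints of Brewer's "Blues" scheme (`#F7FBFF` and `#08306B`), so its middle shades differ slightly from `blues` above. The helper `shade_of`, which maps a value in [0, 1] to a shade, is hypothetical.

```r
# Two-color ramp (assumed endpoints: lightest and darkest Brewer "Blues")
ramp <- colorRampPalette(c("#F7FBFF", "#08306B"))
blues_sketch <- ramp(100)

length(blues_sketch)      # 100 hex strings
blues_sketch[c(1, 100)]   # lightest and darkest shades

# Hypothetical helper: value in [0, 1] -> one of the 100 shades
shade_of <- function(x, pal = blues_sketch)
  pal[pmin(pmax(ceiling(x * length(pal)), 1), length(pal))]
shade_of(c(0, 0.5, 1))
```

Because each RGB channel decreases linearly between the two endpoints, the shades get steadily darker along the palette.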

To illustrate the use of such a palette, consider some data on (officially registered) soccer players. The dataset, lic-2012-v1.csv, can be downloaded from http://data.gouv.fr/fr/dataset/… (I will also use a dataset we have on the location of all towns in France, with their latitudes and longitudes.)

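```r
# Town-level population, with longitude and latitude
base1 <- read.csv("http://freakonometrics.free.fr/popfr19752010.csv", header = TRUE)
# Numeric commune code: departement number * 1000 + commune number
base1$cp <- base1$dep * 1000 + base1$com
# Licenses by town; keep only federation code 111 (the one used here for soccer)
base2 <- read.csv("lic-2012-v1.csv", header = TRUE)
base2 <- base2[base2$fed_2012 == 111, ]
names(base2)[1] <- "cp"
base2$cp <- as.numeric(as.character(base2$cp))
```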
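The commune code `cp = dep * 1000 + com` can be reversed: `trunc(cp / 1000)` recovers the département and `cp %% 1000` the commune. The `as.numeric(as.character(...))` step also matters. If `read.csv` reads the code column as a factor, `as.numeric()` alone would return the factor's internal level numbers, not the codes. The values and the second federation code (119) below are made up for illustration.

```r
# Illustrative codes only (not taken from the datasets)
dep <- c(75, 13, 69)
com <- c(56, 55, 123)
cp  <- dep * 1000 + com   # 75056 13055 69123
trunc(cp / 1000)          # departements: 75 13 69
cp %% 1000                # communes: 56 55 123

# Toy license table; 119 is a hypothetical second federation code
lic <- data.frame(cp = c("75056", "13055", "75056"),
                  fed_2012 = c(111, 111, 119),
                  l_2012 = c(120, 80, 40))
lic <- lic[lic$fed_2012 == 111, ]                  # keep one federation
lic$cp <- as.numeric(as.character(lic$cp))         # safe for factor or character
lic
```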

The problem with France (I should probably say *one of the many problems*) is that regions and départements are not well coded in the standard functions. To describe *where* départements are, let us use the dept.rda file, which gives a matching between the names used in R and the standard (administrative) ones,

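```r
base21 <- base2[, c("cp", "l_2012", "pop_2010")]
# Departement number = leading digits of the commune code
base21$dpt <- trunc(base21$cp / 1000)
library(maps)
# dept.rda provides `dept`, matching departement numbers (CP) to map names
load("dept.rda")
base21$nomdpt <- dept$dept[match(as.numeric(base21$dpt), dept$CP)]
# Total licenses and total population per departement
L <- aggregate(base21$l_2012, by = list(Category = base21$nomdpt), FUN = sum)
P <- aggregate(base21$pop_2010, by = list(Category = base21$nomdpt), FUN = sum)
rate <- L$x / P$x
# Palette index, one shade per 0.06 percentage point, kept within 1..100
base <- data.frame(D = P$Category, Y = rate,
                   C = pmin(pmax(trunc(rate / .0006), 1), 100))
france <- map(database = "france", plot = FALSE)
matche <- match.map(france, base$D, exact = TRUE)
map(database = "france", fill = TRUE, col = blues[base$C[matche]], resolution = 0)
```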
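The département rate is computed as total licenses divided by total population, so it is a population-weighted rate and not the average of town-level rates. The toy sketch below makes the difference visible. It uses two hypothetical départements "A" and "B" and a stand-in lookup table in place of `dept.rda`.

```r
# Stand-in for dept.rda: departement number -> name (hypothetical)
lookup <- data.frame(CP = c(1, 2), dept = c("A", "B"), stringsAsFactors = FALSE)
towns <- data.frame(dpt = c(1, 1, 2, 2),
                    l_2012 = c(10, 1, 30, 20),
                    pop_2010 = c(1000, 10, 2000, 1000))
towns$nomdpt <- lookup$dept[match(towns$dpt, lookup$CP)]

L <- aggregate(towns$l_2012, by = list(Category = towns$nomdpt), FUN = sum)
P <- aggregate(towns$pop_2010, by = list(Category = towns$nomdpt), FUN = sum)
rate <- L$x / P$x               # 11/1010 for A, 50/3000 for B
names(rate) <- P$Category
rate

# Unweighted mean of town ratios for A: much larger, driven by the tiny town
mean(c(10 / 1000, 1 / 10))      # 0.055

# Palette index as in the post: one shade per 0.06 percentage point
trunc(rate / .0006)             # 18 for A, 27 for B
```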

Here are the rates of soccer players (relative to the total population) by *département*. It is also possible to look at the rate not by *département*, but by *town*,

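```r
base10 <- base1[, c("cp", "long", "lat", "pop_2010")]
base20 <- base2[, c("cp", "l_2012")]
base <- merge(base10, base20)        # join on the common column cp
Y <- base$l_2012 / base$pop_2010     # share of licensed players in each town
# Percentile class (1 to 100) of each town, used as an index in the palette;
# include.lowest = TRUE keeps towns with a zero rate in class 1
QY <- as.numeric(cut(Y, c(0, quantile(Y, (1:99) / 100), 10),
                     labels = 1:100, include.lowest = TRUE))
library(maps)
map("france", xlim = c(-1, 1), ylim = c(46, 48))
points(base$long, base$lat, cex = .4, pch = 19, col = blues[QY])
```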
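Using percentiles as breaks puts roughly the same number of towns in each of the 100 classes, so the whole palette is used. Equal-width classes would not do this: with skewed rates, most towns would end up in the lightest shades. The sketch below assumes simulated exponential "rates", not the real data.

```r
set.seed(1)
Y <- rexp(1000, rate = 50)   # simulated skewed rates (assumption)

# Percentile classes, as in the post
QY <- as.numeric(cut(Y, c(0, quantile(Y, (1:99) / 100), 10),
                     labels = 1:100, include.lowest = TRUE))
range(table(QY))             # about 10 towns per class

# Equal-width classes for comparison
QY_width <- as.numeric(cut(Y, 100))
sum(QY_width <= 20)          # most towns fall in the 20 lightest classes
```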

The darker the dot, the higher the share of players. We can also zoom in to get a better understanding, for instance on the northern part of France, or on the southern part.
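Zooming only changes the `xlim` and `ylim` windows passed to `map()`. The sketch below wraps the town-level plot in a hypothetical `zoom_map()` function. It expects the `base`, `blues` and `QY` objects built above and the maps package. It keeps only the towns inside the window; points outside it would be clipped anyway, so the filtering just avoids drawing them. The example windows are assumed coordinates, not the ones used for the post's figures.

```r
# Hypothetical helper: which towns fall inside a longitude/latitude box
in_box <- function(long, lat, xlim, ylim)
  long >= xlim[1] & long <= xlim[2] & lat >= ylim[1] & lat <= ylim[2]

# Hypothetical helper: redraw the town-level map on another window
zoom_map <- function(xlim, ylim, base, cols) {
  maps::map("france", xlim = xlim, ylim = ylim)
  keep <- in_box(base$long, base$lat, xlim, ylim)
  points(base$long[keep], base$lat[keep], cex = .4, pch = 19, col = cols[keep])
}

# Illustrative windows (assumed coordinates):
# zoom_map(c(1.5, 4.5), c(49.5, 51.2), base, blues[QY])   # north
# zoom_map(c(-1.5, 7.5), c(42.5, 44.5), base, blues[QY])  # south
```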

We obtain a map that is not (too) far from the one mentioned a few months ago on http://slate.fr/france/78502/.