Post by Anja Mirenska
Hi Brian,
library(ggplot2)
library(scales)  # provides trans_new(), used below
dat <- data.frame(x=seq(-100, 1000, by=10), y=seq(-100, 1000, by=10))
ggplot(dat, aes(x,y)) + geom_point()
Here I've got two equally spaced linear scales. Now I try to define a
transformation function (actually, I don't want to transform the data, but
biexp_trans <- function(lim = 100){
  trans <- function(x){
    vec <- vector(mode = "numeric", length = length(x))
    for (i in seq_along(x)){
      if (x[i] <= lim){vec[i] <- x[i]} else {vec[i] <- log(x[i], 10)}
    }
    return(vec)
  }
  inv <- function(x) {
    vec <- vector(mode = "numeric", length = length(x))
    for (i in seq_along(x)){
      if (x[i] <= lim){vec[i] <- x[i]} else {vec[i] <- 10 ^ x[i]}
    }
    return(vec)
  }
  trans_new("biexp-", trans, inv)
}
ggplot(dat, aes(x,y)) + geom_point() + scale_y_continuous(trans="biexp")
The data transformation obviously works, but I don't want to plot the
logarithm of the data, I still want a 1000 to be a 1000. I only want to
change the way the data are displayed: low values (including negative
values) should be displayed on a linear scale, while values above a
particular limit should be plotted on a logarithmic scale. Do you have any
idea how I could achieve this?
Best wishes
Anja
Thanks for the example; it gives me something to work from.
Your transformation is not quite right yet. In particular, you map x to
x when less than the limit, but to log(x) when greater than the limit.
> biexp_trans()$trans(dat$x)
[1] -100.000000 -90.000000 -80.000000 -70.000000 -60.000000
[6] -50.000000 -40.000000 -30.000000 -20.000000 -10.000000
[11] 0.000000 10.000000 20.000000 30.000000 40.000000
[16] 50.000000 60.000000 70.000000 80.000000 90.000000
[21] 100.000000 2.041393 2.079181 2.113943 2.146128
[26] 2.176091 2.204120 2.230449 2.255273 2.278754
[31] 2.301030 2.322219 2.342423 2.361728 2.380211
[36] 2.397940 2.414973 2.431364 2.447158 2.462398
[41] 2.477121 2.491362 2.505150 2.518514 2.531479
[46] 2.544068 2.556303 2.568202 2.579784 2.591065
[51] 2.602060 2.612784 2.623249 2.633468 2.643453
[56] 2.653213 2.662758 2.672098 2.681241 2.690196
[61] 2.698970 2.707570 2.716003 2.724276 2.732394
[66] 2.740363 2.748188 2.755875 2.763428 2.770852
[71] 2.778151 2.785330 2.792392 2.799341 2.806180
[76] 2.812913 2.819544 2.826075 2.832509 2.838849
[81] 2.845098 2.851258 2.857332 2.863323 2.869232
[86] 2.875061 2.880814 2.886491 2.892095 2.897627
[91] 2.903090 2.908485 2.913814 2.919078 2.924279
[96] 2.929419 2.934498 2.939519 2.944483 2.949390
[101] 2.954243 2.959041 2.963788 2.968483 2.973128
[106] 2.977724 2.982271 2.986772 2.991226 2.995635
[111] 3.000000
If the transformed value is 2, was the original value 2 or 100? So what
you want is a scale that increases logarithmically above the limit, but
has unique values. This also points out another issue: the relative size
of the two parts of the scales. Numerically, right now, each decade on
the logarithmic scale is the same size as a single unit on the linear
scale. Looking at the example graphs you gave, this isn't the case. Each
decade is around the same size as the original limit (or bigger) [that
is, the space from 0 to 100 in your examples is about the same as the
space between 100 and 1000, 1000 and 10000, 10000 and 100000, etc.]
Adding in this scaling, putting in an offset, and making sure that the
transition around the limit is continuous gives the following
transformation and inverse functions:
trans <- function(x){
  ifelse(x <= lim, x, lim + decade.size *
    (suppressWarnings(log(x, 10)) - log(lim, 10)))
}
inv <- function(x) {
  ifelse(x <= lim, x, 10^(((x-lim)/decade.size) + log(lim,10)))
}
Note that I've also vectorized the functions rather than have an
explicit loop.
This trans on your data (with lim=100 and decade.size=100) gives
[1] -100.0000 -90.0000 -80.0000 -70.0000 -60.0000 -50.0000
[7] -40.0000 -30.0000 -20.0000 -10.0000 0.0000 10.0000
[13] 20.0000 30.0000 40.0000 50.0000 60.0000 70.0000
[19] 80.0000 90.0000 100.0000 104.1393 107.9181 111.3943
[25] 114.6128 117.6091 120.4120 123.0449 125.5273 127.8754
[31] 130.1030 132.2219 134.2423 136.1728 138.0211 139.7940
[37] 141.4973 143.1364 144.7158 146.2398 147.7121 149.1362
[43] 150.5150 151.8514 153.1479 154.4068 155.6303 156.8202
[49] 157.9784 159.1065 160.2060 161.2784 162.3249 163.3468
[55] 164.3453 165.3213 166.2758 167.2098 168.1241 169.0196
[61] 169.8970 170.7570 171.6003 172.4276 173.2394 174.0363
[67] 174.8188 175.5875 176.3428 177.0852 177.8151 178.5330
[73] 179.2392 179.9341 180.6180 181.2913 181.9544 182.6075
[79] 183.2509 183.8849 184.5098 185.1258 185.7332 186.3323
[85] 186.9232 187.5061 188.0814 188.6491 189.2095 189.7627
[91] 190.3090 190.8485 191.3814 191.9078 192.4279 192.9419
[97] 193.4498 193.9519 194.4483 194.9390 195.4243 195.9041
[103] 196.3788 196.8483 197.3128 197.7724 198.2271 198.6772
[109] 199.1226 199.5635 200.0000
and applying inv to that result recovers the original values:
[1] -100 -90 -80 -70 -60 -50 -40 -30 -20 -10 0 10 20
[14] 30 40 50 60 70 80 90 100 110 120 130 140 150
[27] 160 170 180 190 200 210 220 230 240 250 260 270 280
[40] 290 300 310 320 330 340 350 360 370 380 390 400 410
[53] 420 430 440 450 460 470 480 490 500 510 520 530 540
[66] 550 560 570 580 590 600 610 620 630 640 650 660 670
[79] 680 690 700 710 720 730 740 750 760 770 780 790 800
[92] 810 820 830 840 850 860 870 880 890 900 910 920 930
[105] 940 950 960 970 980 990 1000
inv is really the inverse of trans.
> trans(c(99.99, 100, 100.01))
[1] 99.9900 100.0000 100.0043
> inv(trans(c(99.99, 100, 100.01)))
[1] 99.99 100.00 100.01
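The ambiguity in the first version, and its absence in the revised one, can be checked directly. The following is only a sketch in base R: the names trans_first and trans_revised are made up for this illustration, and lim = 100 and decade.size = 100 are assumed as in the output above.

```r
lim <- 100
decade.size <- 100

# First attempt: identity up to lim, plain log10 above it.
trans_first <- function(x) {
  ifelse(x <= lim, x, suppressWarnings(log10(x)))
}

# Revised version: the log part is scaled and offset so it continues from lim.
trans_revised <- function(x) {
  ifelse(x <= lim, x,
         lim + decade.size * (suppressWarnings(log10(x)) - log10(lim)))
}

x <- c(-100, 0, 3, 100, 1000, 10000)

# In the first attempt, 3 and 1000 land on the same transformed value,
# so no inverse can tell them apart.
collides <- isTRUE(all.equal(trans_first(3), trans_first(1000)))

# In the revised version, increasing inputs give increasing outputs.
increasing <- all(diff(trans_revised(x)) > 0)

print(collides)
print(increasing)
```

A strictly increasing transformation is what makes a well-defined inverse possible, which is why the offset matters and not just the scaling.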
For breaks, I just created a function which called pretty_breaks and/or
log_breaks as appropriate given the range of the data. Putting this all
together (I named it biexp2_trans so that I could have both versions at
once; you would drop the "2" part):
biexp2_trans <- function(lim = 100, decade.size = lim){
  # Forward transform: identity up to lim, then a scaled and offset log10
  # so that the two pieces join continuously at lim.
  trans <- function(x){
    ifelse(x <= lim,
           x,
           lim + decade.size * (suppressWarnings(log(x, 10)) -
                                log(lim, 10)))
  }
  # Inverse of trans.
  inv <- function(x) {
    ifelse(x <= lim,
           x,
           10^(((x-lim)/decade.size) + log(lim,10)))
  }
  # x is the range of the data: c(min, max).
  breaks <- function(x) {
    if (all(x <= lim)) {
      pretty_breaks()(x)
    } else if (all(x > lim)) {
      log_breaks(10)(x)
    } else {
      unique(c(pretty_breaks()(c(x[1],lim)),
               log_breaks(10)(c(lim, x[2]))))
    }
  }
  trans_new(paste0("biexp-",format(lim)), trans, inv, breaks)
}
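To see the idea behind the breaks function without plotting anything, here is a rough base R analogue. It is not the code above and not part of scales: biexp_breaks_sketch is a hypothetical helper that swaps pretty_breaks() for pretty() and log_breaks() for whole powers of ten, and it assumes lim > 0.

```r
# Hypothetical helper for illustration only; rng is c(min, max), lim must be > 0.
biexp_breaks_sketch <- function(rng, lim = 100) {
  lo <- rng[1]
  hi <- rng[2]
  # Linear part: ordinary "pretty" breaks from the lower end up to lim.
  lin <- if (lo < lim) pretty(c(lo, min(hi, lim))) else numeric(0)
  # Logarithmic part: whole powers of ten between lim and the upper end.
  logb <- numeric(0)
  if (hi > lim) {
    p_lo <- ceiling(log10(max(lo, lim)))
    p_hi <- floor(log10(hi))
    if (p_hi >= p_lo) logb <- 10^(p_lo:p_hi)
  }
  # Drop linear breaks that overshoot lim, then merge the two sets.
  sort(unique(c(lin[lin <= lim], logb)))
}

brk <- biexp_breaks_sketch(c(-100, 100000))
print(brk)
```

This is only for inspecting how the two sets of breaks are stitched together at lim; the plots that follow use biexp2_trans as defined above.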
And here are examples of use, including some with a larger range of data
and showing the effect of decade.size.
ggplot(dat, aes(x,y)) + geom_point() + scale_y_continuous(trans="biexp2")
ggplot(dat, aes(x,y)) + geom_point() +
  scale_y_continuous(trans=biexp2_trans(lim=100, decade.size=200))
ggplot(dat, aes(x,y)) + geom_point() +
  scale_y_continuous(trans=biexp2_trans(lim=100, decade.size=200)) +
  scale_x_continuous(trans=biexp2_trans(lim=100, decade.size=200))
dat2 <- data.frame(x=c(seq(-100, 1000, by=10), seq(1000, 100000, by=1000)),
                   y=c(seq(-100, 1000, by=10), seq(1000, 100000, by=1000)))
ggplot(dat2, aes(x,y)) + geom_point() + scale_y_continuous(trans="biexp2")
ggplot(dat2, aes(x,y)) + geom_point() +
  scale_y_continuous(trans=biexp2_trans(lim=100, decade.size=200))
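The effect of decade.size can also be read off numerically, without a plot. A small sketch in base R, where biexp_position is a name invented here and simply repeats the forward transformation from above:

```r
# Illustrative copy of the forward transformation; the name is hypothetical.
biexp_position <- function(x, lim = 100, decade.size = lim) {
  ifelse(x <= lim, x,
         lim + decade.size * (suppressWarnings(log10(x)) - log10(lim)))
}

x <- c(0, 100, 1000, 10000, 100000)

# With decade.size equal to lim, each decade takes as much room as 0 to 100.
pos_default <- biexp_position(x)

# With decade.size = 200, each decade takes twice that room.
pos_wide <- biexp_position(x, decade.size = 200)

print(rbind(x, pos_default, pos_wide))
```

The differences between successive positions above 100 are the width of one decade on the transformed axis, so a larger decade.size gives the logarithmic part more of the plot relative to the linear part.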
Post by Anja Mirenska
Hi Brian,
Thanks, the first blog post is very clear and helpful! I've tried to create
a biexponential transformation building upon your example. A conditional
transformation of the values did work, but the scale itself remained the
same, so that e.g. 1000 became 3 (log-transformed) and was plotted at the
break "3". The problem is that I want the original data values to remain
the same (so 1000 should still be 1000 rather than 3) while the scale
format should change from linear to logarithmic halfway through. So I guess
it's more a matter of the breaks and format functions rather than
transformation, right? However, I still don't have a clue how to define
this kind of breaks. Maybe someone could give me another hint?
Post by Brian Diggs
Can you show us what you have so far, and the mathematical definition of
the transformation? My guess is that the problem is in the breaks and labels
part, but it is hard to say without a (minimal) reproducible example.
Best wishes
Anja
Post by Anja Mirenska
Hi Brandon,
Thanks for the pointer! I'll have a closer look at the scales package; it
hadn't occurred to me.
Of course I still would be happy if a "scales-expert" or simply a more
experienced R-user than me could offer additional help.
http://blog.ggplot2.org/post/25938265813/defining-a-new-transformation-for-ggplot2-scales
http://blog.ggplot2.org/post/29433173749/defining-a-new-transformation-for-ggplot2-scales-part
The first is probably more closely related to what you want to do, and is
hopefully laid out with an easy enough to follow example.
Best wishes
Anja
?trans_new
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University