ggplot2 - How can I color a line graph by grouping the variables in R?


I have produced a line graph that looks like this:

(generated using ggplot2)

I have a data set of 50 countries and their GDP for the last 10 years.
Sample data:

    country variable value
    china   y2007    3.55218e+12
    usa     y2007    1.45000e+13
    japan   y2007    4.51526e+12
    uk      y2007    3.06301e+12
    russia  y2007    1.29971e+12
    canada  y2007    1.46498e+12
    germany y2007    3.43995e+12
    india   y2007    1.20107e+12
    france  y2007    2.66311e+12
    skorea  y2007    1.12268e+12

I generated the line graph using this code:

    gdp_lineplot = ggplot(data = gdp_linechart, aes(x = variable, y = value)) +
      geom_line() +
      scale_y_continuous(name = "gdp(usd in trillions)",
                         breaks = c(0.0e+00, 5.0e+12, 1.0e+13, 1.5e+13),
                         labels = c(0, 5, 10, 15)) +
      scale_x_discrete(name = "years",
                       labels = c(2007, "", 2009, "", 2011, "", 2013, "", 2015))

The idea is to make a graph like this. How can I plot the colors?

I tried adding

group=country, color = country 

but the output colors every country.

How can I color only the top 4 countries and give the rest a single color?

PS: I am still new to R.

When plotting subsets, the other groups aren't included in the colour legend on the right. The alternative approach below manipulates factor levels and uses a customized colour scale to overcome this.

Preparing data

It is assumed that gdp_long contains the data in long format. This is in line with the data shown by the OP (gdp_linechart; see the Data section below for differences). To manipulate the factor levels, the forcats package is used (and data.table).

    library(data.table)
    library(forcats)
    # coerce to data.table, reorder factor levels by the value in the last (= most recent) year
    setDT(gdp_long)[, country := fct_reorder(country, -value, last)]
    # create a new factor which collapses all countries to "Other" except the top 4 countries
    gdp_long[, top_country := fct_other(country, keep = head(levels(country), 4))]
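To make the collapsing step easier to follow, here is a minimal sketch of the same idea in base R, without forcats or data.table. This is not the answer's code: the data frame gdp_toy and its values are made up for illustration, and "top 4" is taken to mean the four largest values in the most recent year.

```r
# Toy data in long format (made-up values, for illustration only)
gdp_toy <- data.frame(
  country = rep(c("a", "b", "c", "d", "e", "f"), times = 2),
  year    = rep(c(2007L, 2008L), each = 6),
  value   = c(5, 9, 3, 7, 1, 2,    # 2007
              6, 8, 4, 10, 1, 3),  # 2008
  stringsAsFactors = FALSE
)

# Rows of the most recent year, one per country
latest <- gdp_toy[gdp_toy$year == max(gdp_toy$year), ]

# Names of the 4 countries with the largest value in that year
top4 <- head(latest$country[order(-latest$value)], 4)

# New factor: the top 4 keep their name, everything else becomes "Other"
gdp_toy$top_country <- factor(
  ifelse(gdp_toy$country %in% top4, gdp_toy$country, "Other"),
  levels = c(top4, "Other")
)

levels(gdp_toy$top_country)
# "d" "b" "a" "c" "Other"
```

Putting "Other" last in the levels is what lets a five-value manual colour scale assign grey to the collapsed group.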

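Create plot

    library(ggplot2)
    ggplot(gdp_long, aes(year, value/1e12, group = country, colour = top_country)) +
      geom_point() +
      # in ggplot2 >= 3.4.0, linewidth = 1 is preferred over size = 1 for lines
      geom_line(size = 1) +
      theme_bw() +
      ylab("gdp(usd in trillions)") +
      # one colour per level of top_country: the top 4 countries, then "Other"
      scale_colour_manual(name = "country",
                          values = c("green3", "orange", "blue", "red", "grey"))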


The chart is quite similar to the expected result. The lines of the top 4 countries are displayed in different colours, while the other countries are displayed in grey and still appear in the colour legend on the right.

Note that the group aesthetic is still needed so that a single line is plotted for each country, while the colour is controlled by the levels of top_country.

Data

The data set is too large to be reproduced here (even with dput()). Its structure
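The effect of the group aesthetic can be checked on a small example. The sketch below uses a made-up data frame toy (not the answer's data) in which two countries share the colour level "Other", and counts the groups ggplot2 builds with and without group = country.

```r
library(ggplot2)

# Toy data (made-up values): countries "b" and "c" share the colour level "Other"
toy <- data.frame(
  country     = rep(c("a", "b", "c"), each = 2),
  year        = rep(c(2007L, 2008L), times = 3),
  value       = c(5, 6, 2, 3, 1, 4),
  top_country = factor(rep(c("a", "Other", "Other"), each = 2),
                       levels = c("a", "Other"))
)

# With group = country: one line per country
p_grouped <- ggplot(toy, aes(year, value, group = country, colour = top_country)) +
  geom_line()

# Without it: grouping falls back to the discrete variables in the plot
# (here only top_country), so "b" and "c" are joined into a single line
p_ungrouped <- ggplot(toy, aes(year, value, colour = top_country)) +
  geom_line()

n_grouped   <- length(unique(layer_data(p_grouped)$group))    # 3
n_ungrouped <- length(unique(layer_data(p_ungrouped)$group))  # 2
```

In the ungrouped plot the "Other" line zigzags between the points of "b" and "c", which is the artefact the group aesthetic avoids.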

    str(gdp_long)
    'data.frame':   1763 obs. of  3 variables:
     $ country: chr  "afghanistan" "albania" "algeria" "andorra" ...
     $ year   : int  2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
     $ value  : num  9.84e+09 1.07e+10 1.35e+11 4.01e+09 6.04e+10 ...

is similar to the OP's data, with the exception that the variable column has been converted to an integer column year. This gives a nicely formatted x-axis without additional effort.
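The answer does not show how that conversion was done. One possible way, as a sketch in base R, is to strip the leading "y" from the OP's variable column and coerce the remainder to integer. The three sample rows are taken from the question; that gdp_long is derived this way is an assumption.

```r
# A few rows in the OP's layout (values copied from the question)
gdp_linechart <- data.frame(
  country  = c("china", "usa", "japan"),
  variable = c("y2007", "y2007", "y2007"),
  value    = c(3.55218e+12, 1.45000e+13, 4.51526e+12),
  stringsAsFactors = FALSE
)

# Assumed conversion: drop the leading "y" and store the year as an integer
gdp_long <- gdp_linechart
gdp_long$year <- as.integer(sub("^y", "", as.character(gdp_long$variable)))
gdp_long$variable <- NULL

str(gdp_long)
```

With an integer year, ggplot2 uses a continuous x-axis, so no scale_x_discrete() labels have to be supplied by hand.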

