CSV File 1 "employees.csv"
# <employee id>;<name>;<location> 1;Anton;37;Geneva 2;Marie;28;Paris 3;Tom;25;London
CSV File 2 "task.csv"
# <task id>;<employee id>;<task description> 1000;2;Setup new Project 1001;3;Call customer X 1002;3;Get package from post officeAnd now some action:
A naive try...
The following command
join employee.csv tasks.csv
... doesn't produce any output. This is because it expects the shared key to reside at the first column in both file, which is not the case. Also the default separator for 'join' is a whitespace.
Full Join
join -t ";" -1 1 -2 2 employee.csv tasks.csv
We need to run join with '-t ";"' to get tell it that we have CSV format. Then to avoid the pitfall of not having the common key in the first column we need to tell join where the join key is in each file. The switch "-1" gets the key index for the first file and "-2" for the second file.
2;Marie;28;Paris;1000;Setup new Project 3;Tom;25;London;1001;Call customer X 3;Tom;25;London;1002;Get package from post office
Print only name and task
join -o1.2,2.3 -t ";" -1 1 -2 2 employee.csv tasks.csv
We use "-o" to limit the fields to be printed. "-o" takes a comma separated list of "<file nr>.<field nr>" definitions. So we only want the second file of the first file (1.2) and the third field of the second file (2.3)...
Marie;Setup new Project Tom;Call customer X Tom;Get package from post office
Summary
While the syntax of join is not that straight forward, it still allows doing things quite quickly that one is often tempted to implement in a script. It is quite easy to convert batch input data to CSV format. Using join it can be easily grouped and reduced according to the your task.
If this got you interested you can find more and non-CSV examples on this site.