15 Linux Split and Join Command Examples to Manage Large Files

by Himanshu Arora on October 16, 2012

The Linux split and join commands are very helpful when you are manipulating large files: split breaks one file into smaller pieces, and join merges the lines of two files that share a common field. This article explains how to use both commands with descriptive examples.

Join and split command syntax:

join [OPTION]… FILE1 FILE2
split [OPTION]… [INPUT [PREFIX]]

Linux Split Command Examples

1. Basic Split Example

Here is a basic example of split command.

$ split split.zip 

$ ls
split.zip  xab  xad  xaf  xah  xaj  xal  xan  xap  xar  xat  xav  xax  xaz  xbb  xbd  xbf  xbh  xbj  xbl  xbn
xaa        xac  xae  xag  xai  xak  xam  xao  xaq  xas  xau  xaw  xay  xba  xbc  xbe  xbg  xbi  xbk  xbm  xbo

So we see that the file split.zip was split into smaller files named x**, where ** is the two-character suffix that is added by default. Also by default, each x** file contains 1000 lines (the last one may contain fewer).

$ wc -l *
   40947 split.zip
    1000 xaa
    1000 xab
    1000 xac
    1000 xad
    1000 xae
    1000 xaf
    1000 xag
    1000 xah
    1000 xai
...
...
...

So the output above confirms that by default each x** file contains 1000 lines.
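If you do not have a large file at hand, the default behaviour can be reproduced with a generated one. The following is only a sketch: the file name sample.txt and the 2500-line input are assumptions made for illustration, and it runs in a scratch directory.

```shell
# Work in a scratch directory so the x* pieces do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample input: 2500 numbered lines.
seq 1 2500 > sample.txt

# With no options, split writes 1000 lines per piece: xaa, xab, xac.
split sample.txt

# Expect 1000, 1000 and 500 lines.
wc -l x*
```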

2. Change the Suffix Length using -a option

As discussed in example 1 above, the default suffix length is 2. This can be changed using the -a option.

The following example uses a suffix of length 5 for the split files.

$ split -a5 split.zip
$ ls
split.zip  xaaaac  xaaaaf  xaaaai  xaaaal  xaaaao  xaaaar  xaaaau  xaaaax  xaaaba  xaaabd  xaaabg  xaaabj  xaaabm
xaaaaa     xaaaad  xaaaag  xaaaaj  xaaaam  xaaaap  xaaaas  xaaaav  xaaaay  xaaabb  xaaabe  xaaabh  xaaabk  xaaabn
xaaaab     xaaaae  xaaaah  xaaaak  xaaaan  xaaaaq  xaaaat  xaaaaw  xaaaaz  xaaabc  xaaabf  xaaabi  xaaabl  xaaabo
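The same idea on a small generated file, as a sketch (sample.txt and the suffix length of 3 are assumptions chosen for illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample input: 2500 numbered lines, so three pieces by default.
seq 1 2500 > sample.txt

# -a 3 asks for three-letter suffixes: xaaa, xaab, xaac.
split -a 3 sample.txt

ls
```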

Note: Earlier we also discussed other file manipulation utilities: tac, rev, and paste.

3. Customize Split File Size using -b option

Size of each output split file can be controlled using -b option.

In this example, the split files were created with a size of 200000 bytes.

$ split -b200000 split.zip 

$ ls -lart
total 21084
drwxrwxr-x 3 himanshu himanshu     4096 Sep 26 21:20 ..
-rw-rw-r-- 1 himanshu himanshu 10767315 Sep 26 21:21 split.zip
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xad
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xac
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xab
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xaa
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xah
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xag
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xaf
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xae
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xar
...
...
...
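A byte-based split can be checked end to end on a generated file. This is a sketch under assumptions: sample.bin, its size and the 300000-byte chunk size are made up for illustration. The pieces are glued back together with cat, not join, and compared with the original.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input: exactly 1,000,000 zero bytes.
head -c 1000000 /dev/zero > sample.bin

# Pieces of 300000 bytes each; the last piece holds the remaining 100000.
split -b 300000 sample.bin

ls -l x*

# Reassemble with cat (the shell expands x* in alphabetical order)
# and confirm the copy matches the original byte for byte.
cat x* > rejoined.bin
cmp sample.bin rejoined.bin && echo "identical"
```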

4. Create Split Files with Numeric Suffix using -d option

As seen in the examples above, the output files are named x**, where ** are letters. You can change the suffix to digits using the -d option.

Here is an example with numeric suffixes on the split files.

$ split -d split.zip
$ ls
split.zip  x01  x03  x05  x07  x09  x11  x13  x15  x17  x19  x21  x23  x25  x27  x29  x31  x33  x35  x37  x39
x00        x02  x04  x06  x08  x10  x12  x14  x16  x18  x20  x22  x24  x26  x28  x30  x32  x34  x36  x38  x40
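Numeric suffixes read better when combined with the PREFIX argument from the syntax line at the top. A small sketch, where sample.txt and the prefix part- are assumed names:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample input: 2500 numbered lines.
seq 1 2500 > sample.txt

# -d switches to numeric suffixes; "part-" replaces the default prefix "x".
split -d sample.txt part-

# Expect part-00, part-01 and part-02.
ls part-*
```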

5. Customize the Number of Split Chunks using -n option

To get control over the number of chunks, use the -n option.

This example splits the file into 50 chunks.

$ split -n50 split.zip
$ ls
split.zip  xac  xaf  xai  xal  xao  xar  xau  xax  xba  xbd  xbg  xbj  xbm  xbp  xbs  xbv
xaa        xad  xag  xaj  xam  xap  xas  xav  xay  xbb  xbe  xbh  xbk  xbn  xbq  xbt  xbw
xab        xae  xah  xak  xan  xaq  xat  xaw  xaz  xbc  xbf  xbi  xbl  xbo  xbr  xbu  xbx
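Note that a plain -n N divides the file by bytes, so a chunk can end in the middle of a line. GNU split also accepts the form -n l/N, which produces N chunks without breaking lines. A sketch, assuming GNU coreutils and using made-up names (sample.txt, bytes-, lines-):

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample input: 2500 numbered lines.
seq 1 2500 > sample.txt

# Four chunks of roughly equal byte size; lines may be cut in two.
split -n 4 sample.txt bytes-

# Four chunks that always end at a line boundary (GNU split).
split -n l/4 sample.txt lines-

ls bytes-* lines-*

# Either set of chunks reassembles to the original file.
cat lines-* | cmp - sample.txt && echo "identical"
```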

6. Avoid Zero Sized Chunks using -e option

When splitting a relatively small file into a large number of chunks, it's good to avoid zero-sized chunks, as they do not add any value. This can be done using the -e option.

Here is an example:

$ split -n50 testfile

$ ls -lart x*
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xag
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaf
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xae
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xad
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xac
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xab
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaa
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbx
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbw
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbv
...
...
...

So we see that lots of zero-sized chunks were produced in the above output. Now, let's use the -e option and see the results:

$ split -n50 -e testfile
$ ls
split.zip  testfile  xaa  xab  xac  xad  xae  xaf

$ ls -lart x*
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaf
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xae
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xad
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xac
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xab
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaa

So we see that no zero sized chunk was produced in the above output.
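The effect can be reproduced with a tiny file. This is a sketch with assumed names (tiny.txt, with-empty-, no-empty-); which of the five chunks end up empty may differ between coreutils versions, but -e should suppress them either way.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical 3-byte input, smaller than the number of chunks requested.
printf 'abc' > tiny.txt

# Five chunks from three bytes: some output files are necessarily empty.
split -n 5 tiny.txt with-empty-

# Same request with -e: the empty files are not created.
split -n 5 -e tiny.txt no-empty-

ls -l with-empty-* no-empty-*
```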

7. Customize Number of Lines using -l option

Number of lines per output split file can be customized using the -l option.

As seen in the example below, split files are created with 20000 lines.

$ split -l20000 split.zip

$ ls
split.zip  testfile  xaa  xab  xac

$ wc -l x*
   20000 xaa
   20000 xab
     947 xac
   40947 total
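A reproducible version of the same check, as a sketch (sample.txt and the 1200-line limit are assumptions for illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample input: 2500 numbered lines.
seq 1 2500 > sample.txt

# At most 1200 lines per piece.
split -l 1200 sample.txt

# Expect 1200, 1200 and 100 lines.
wc -l x*
```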

Get Detailed Information using --verbose option

To get a diagnostic message each time a new split file is opened, use the --verbose option as shown below.

$ split -l20000 --verbose split.zip
creating file `xaa'
creating file `xab'
creating file `xac'
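The messages can also be captured, for instance to count how many pieces were written. A sketch with assumed names (sample.txt, split.log); the exact quoting of the file names in the messages varies between versions.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample input: 2500 numbered lines.
seq 1 2500 > sample.txt

# Save the diagnostics; 2>&1 is used so they are caught on either stream.
split -l 1000 --verbose sample.txt > split.log 2>&1

cat split.log

# One message per output file, so the line count is the number of pieces.
wc -l < split.log
```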

Linux Join Command Examples

8. Basic Join Example

By default, the join command matches lines from the two input files on their first field.

Here is an example:

$ cat testfile1
1 India
2 US
3 Ireland
4 UK
5 Canada

$ cat testfile2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
4 UK London
5 Canada Toronto

So we see that a file containing countries was joined with another file containing capitals on the basis of first field.

9. Join works on Sorted List

If either of the two files supplied to the join command is not sorted on the join field, join prints a warning and the out-of-order entry is not joined.

In this example, testfile1 is not sorted, so join displays an error message for the out-of-order line.

$ cat testfile1
1 India
2 US
3 Ireland
5 Canada
4 UK

$ cat testfile2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
join: testfile1:5: is not sorted: 4 UK
5 Canada Toronto
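The usual remedy is to sort both inputs on the join field first. The sketch below reuses the data from this example; the .sorted file names are assumptions, and LC_ALL=C is set so that sort and join agree on the ordering. Keep in mind that join expects plain (lexical) sort order rather than numeric order.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Use the same collation for sort and join.
export LC_ALL=C

# testfile1 has its last two lines out of order, as in the example above.
printf '%s\n' '1 India' '2 US' '3 Ireland' '5 Canada' '4 UK' > testfile1
printf '%s\n' '1 NewDelhi' '2 Washington' '3 Dublin' '4 London' '5 Toronto' > testfile2

# Sort each file on its first field before joining.
sort -k1,1 testfile1 > testfile1.sorted
sort -k1,1 testfile2 > testfile2.sorted

# All five lines now pair up and no warning is printed.
join testfile1.sorted testfile2.sorted
```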

10. Ignore Case using -i option

When comparing fields, the difference in case can be ignored using -i option as shown below.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
a NewDelhi
B Washington
c Dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
c Ireland Dublin
d UK London
e Canada Toronto

$ join -i testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
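Whether a mixed-case file counts as sorted depends on the locale, so the sketch below uses slightly different data from the example above (an upper-case A on the first line of testfile2), chosen so that both files are in order even under LC_ALL=C.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Byte-wise collation, so the result does not depend on the user's locale.
export LC_ALL=C

printf '%s\n' 'a India' 'b US' 'c Ireland' > testfile1
printf '%s\n' 'A NewDelhi' 'b Washington' 'c Dublin' > testfile2

# Case-sensitive: "a" and "A" do not match, so only b and c are joined.
join testfile1 testfile2

# Case-insensitive: all three lines are joined.
join -i testfile1 testfile2
```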

11. Verify that Input is Sorted using --check-order option

Here is an example. Since testfile1 is unsorted towards the end, an error is produced in the output.

$ cat testfile1
a India
b US
c Ireland
d UK
f Australia
e Canada

$ cat testfile2
a NewDelhi
b Washington
c Dublin
d London
e Toronto

$ join --check-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
join: testfile1:6: is not sorted: e Canada
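Because join exits with a non-zero status when --check-order finds a problem, the option is handy as a guard in scripts. A sketch with shortened sample data and assumed file names (joined.txt, join.err):

```shell
workdir=$(mktemp -d)
cd "$workdir"

export LC_ALL=C

# testfile1 has "d" before "c", so it is not sorted.
printf '%s\n' 'a India' 'b US' 'd UK' 'c Ireland' > testfile1
printf '%s\n' 'a NewDelhi' 'b Washington' 'c Dublin' 'd London' > testfile2

# Branch on the exit status instead of reading the messages by eye.
if join --check-order testfile1 testfile2 > joined.txt 2> join.err; then
    echo "inputs are sorted"
else
    echo "inputs are not sorted:"
    cat join.err
fi
```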

12. Do not Check the Sort Order using --nocheck-order option

This is the opposite of the previous example. The sort order of the input is not checked, so no error message is displayed.

$ join --nocheck-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London

13. Print Unpairable Lines using -a option

If the lines of the two input files cannot all be paired one to one, the -a FILENUM option also prints the lines that have no pair. FILENUM is the file number (1 or 2).

In the following example, using -a1 also prints the last line of testfile1 (f Australia), which has no pair in testfile2.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada
f Australia

$ cat testfile2
a NewDelhi
b Washington
c Dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto

$ join -a1 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
f Australia
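The option can be given for both files at once to keep every line from either side. In the sketch below the data is shortened and the line "g Tokyo" is a made-up entry that exists only in testfile2.

```shell
workdir=$(mktemp -d)
cd "$workdir"

export LC_ALL=C

printf '%s\n' 'a India' 'b US' 'f Australia' > testfile1
# "g Tokyo" is a hypothetical line with no partner in testfile1.
printf '%s\n' 'a NewDelhi' 'b Washington' 'g Tokyo' > testfile2

# Paired lines first, then the unpaired line from each file:
# a India NewDelhi
# b US Washington
# f Australia
# g Tokyo
join -a 1 -a 2 testfile1 testfile2
```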

14. Print Only Unpaired Lines using -v option

In the above example both paired and unpaired lines were produced in the output. But, if only unpaired output is desired then use -v option as shown below.

$ join -v1 testfile1 testfile2
f Australia
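Used once per file, -v gives a quick report of the keys that are missing on the other side. A sketch with shortened data, where "g Tokyo" is again a made-up line present only in testfile2:

```shell
workdir=$(mktemp -d)
cd "$workdir"

export LC_ALL=C

printf '%s\n' 'a India' 'b US' 'f Australia' > testfile1
# "g Tokyo" is a hypothetical line with no partner in testfile1.
printf '%s\n' 'a NewDelhi' 'b Washington' 'g Tokyo' > testfile2

# Lines of testfile1 with no match in testfile2.
join -v 1 testfile1 testfile2

# Lines of testfile2 with no match in testfile1.
join -v 2 testfile1 testfile2
```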

15. Join Based on Different Columns from Both Files using -1 and -2 options

By default, the first column of each file is used for comparison before joining. You can change this behavior using the -1 and -2 options.

In the following example, the first column of testfile1 is compared with the second column of testfile2 to produce the join command output.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
NewDelhi a
Washington b
Dublin c
London d
Toronto e

$ join -1 1 -2 2 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
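Each file has to be sorted on the field that is actually used for joining. In the example above testfile2 happens to be in order on its second column already; when it is not, it can be sorted on that column first. A sketch with shortened data and an assumed output name (testfile2.by-key):

```shell
workdir=$(mktemp -d)
cd "$workdir"

export LC_ALL=C

printf '%s\n' 'a India' 'b US' 'c Ireland' > testfile1
# testfile2 is deliberately not in order on its second column.
printf '%s\n' 'Washington b' 'Dublin c' 'NewDelhi a' > testfile2

# Sort testfile2 on its second field, the one used for joining.
sort -k2,2 testfile2 > testfile2.by-key

# The join field is printed first, then the remaining fields of each file.
join -1 1 -2 2 testfile1 testfile2.by-key
```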


9 comments

1 Jalal Hajigholamali October 16, 2012 at 9:26 am

Hi,
very nice and useful article..

2 François October 16, 2012 at 11:53 am

Hi,
Just a little reminder: Ottawa is the capital of Canada. Toronto is both Canada’s largest city and the capital of Ontario, Canada’s most populated province.

3 Mike October 16, 2012 at 7:36 pm

A (tech) reminder: you put your split files back together again using cat, not join ;-)

cat x* > split.zip

Putting these two utilities together in the same article doesn’t imply that they actually are used together to (for instance) split a large file, transfer the chunks, then join back together.

Also, I typically use the PREFIX to make it clear to people at the other end:

split -db200000 some-file.zip some-file-part-

4 Sundar October 18, 2012 at 2:02 am

Hope join & paste command more or less same :)

TGS, Nice work over the years – please keep going on

5 sugatangitlog October 18, 2012 at 2:11 am

Very nice indeed. Thanks!

6 Parth Shah October 19, 2012 at 6:52 am

Suppose that I split a large pdf files using split. How can I re-join all the parts?

7 Mike November 8, 2012 at 12:04 am

Parth: with cat, like I hinted at a few days before.

For example, below I’ve split your large pdf file (I called it “large.pdf”, but you can use whatever name) into 20kB sized chunks called “part-00”, “part-01” and so on. Then I join them back together using cat (and rely upon my shell to expand the “part-*” into “part-00 part-01 …” in numerical/alphabetical order, which is the right order), redirecting cat’s output (with the “>” character) to a new file “large-copy.pdf”.

split -db20000 large.pdf part-
cat part-* > large-copy.pdf

After doing this, then I usually use the diff utility to assure myself that the copy is the same as the original, because that’s simplest. If transferring the parts over some narrow-band medium, I’d make a hash with md5sum and send that with the parts instead, but that’s more tricky.

diff large.pdf large-copy.pdf

(there should be no output from diff if the copy is the same after re-joining the parts)

8 Bernard Wiid February 27, 2014 at 1:49 pm

Good day, please help. I have a file that I want to change to a text file and split into smaller files. Could someone help with the Linux command to do so?

9 Anon March 18, 2014 at 10:13 pm

Dear jackass author, why did you not use the same zip file example in the join? Let me guess, you have no clue as to what you were writing about! Somebody told you, you need to write an article on ‘split’ & ‘join’ and you wrote up some jackshit!
