Examples

More examples will be added in the future.

Introduction

Most slk and slk_helpers commands are available as functions pyslk.pyslk.slk_COMMAND(...). All command-line arguments of slk version 3.3.10 are available as arguments of these functions. The functions are simple wrappers that return the text which the slk/slk_helpers commands normally print to the command line. Slightly more advanced wrappers are available for a few commands via pyslk.parsers.slk_COMMAND_....

Basic examples

List files and directories in a specific path

> from pyslk import pyslk as pslk
> from pyslk import parsers as psr
> print(pslk.slk_list('/arch'))
drwxrwxr-x- a270003     aa0049             0   14 Oct 2021 11:02 aa0049
drwxrwxr-x- a270003     aa0238             0   14 Oct 2021 11:02 aa0238
drwxrwxr-x- a270003     1079               0   14 Oct 2021 11:02 ab0036
drwxrwxr-x- a270003     1007               0   14 Oct 2021 11:02 ab0051
drwxrwxr-x- a270003     ab0246             0   14 Oct 2021 11:02 ab0246
drwxrwxr-x- a270003     1327               0   14 Oct 2021 11:02 ab0642
drwxrwxr-x- a270003     ab0916             0   14 Oct 2021 11:02 ab0916
drwxrwxr-x- a270003     ab0995             0   14 Oct 2021 11:02 ab0995
drwxrwxr-x- a270003     ab1011             0   14 Oct 2021 11:02 ab1011
drwxrwxr-x- a270003     ab1012             0   14 Oct 2021 11:02 ab1012
...

> psr.slk_list_formatted('/arch')
     permissions    owner   group  filesize       date   time      filename
0    drwxrwxr-x-  a270003  aa0049       0.0 2021-10-14  11:02  /arch/aa0049
1    drwxrwxr-x-  a270003  aa0238       0.0 2021-10-14  11:02  /arch/aa0238
2    drwxrwxr-x-  a270003    1079       0.0 2021-10-14  11:02  /arch/ab0036
3    drwxrwxr-x-  a270003    1007       0.0 2021-10-14  11:02  /arch/ab0051
4    drwxrwxr-x-  a270003  ab0246       0.0 2021-10-14  11:02  /arch/ab0246
..           ...      ...     ...       ...        ...    ...           ...
302  drwxrwxr-x-     4169    1082       0.0 2021-10-14  11:04  /arch/uo0123
303  drwxrwxr-x-    21118  uo0780       0.0 2021-10-14  11:04  /arch/uo0780
304  drwxrwxr-x-    20613  uo1075       0.0 2021-10-14  11:04  /arch/uo1075
305  drwxrwxr-x-   100662  uo1227       0.0 2021-10-14  11:04  /arch/uo1227
306  drwxrwxr-x-    21534    1374       0.0 2021-10-14  11:04  /arch/uu0808

[307 rows x 7 columns]

Calculate the size of one folder recursively

Solution 1

> from pyslk import pyslk as pslk
> from pyslk import parsers as psr

# folder size in bytes
> pslk.slk_arch_size('/arch/bm0146/k204221/iow')
534305200000.0

# or with another unit
> psr.slk_arch_size_format('/arch/bm0146/k204221/iow', return_format="G")
'534.31G'
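
A unit letter such as "G" can refer to a decimal prefix (1000-based) or a binary prefix (1024-based). The following is a minimal sketch of such a conversion in plain Python. The helper format_size and its base argument are hypothetical and not part of pyslk; which convention psr.slk_arch_size_format applies is not stated here.

```python
def format_size(num_bytes: float, base: int = 1024) -> str:
    """Format a byte count with a one-letter unit suffix.

    Hypothetical helper for illustration only (not part of pyslk).
    base=1024 uses binary prefixes, base=1000 uses decimal prefixes.
    """
    units = ["B", "K", "M", "G", "T", "P"]
    size = float(num_bytes)
    for unit in units:
        # stop as soon as the value fits into the current unit
        if abs(size) < base or unit == units[-1]:
            return f"{size:.2f}{unit}"
        size /= base
    return f"{size:.2f}{units[-1]}"  # not reached; keeps type checkers happy


if __name__ == "__main__":
    value = 534305200000.0
    print(format_size(value, base=1000))
    print(format_size(value, base=1024))
```

The two calls give different numbers for the same byte count, so it is worth stating the convention whenever a formatted size is reported.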

Solution 2

> from pyslk import pyslk as pslk
> from pyslk import parsers as psr

# do a recursive search in the folder of interest:
> pslk.slk_search_limited('{"path": {"$gte": "/arch/bm0146/k204221/iow"}}')
'Search continuing. .\nSearch ID: 112031'

# use search ID in psr.slk_list_search_formatted
> df = psr.slk_list_search_formatted(112031)
> df
    permissions      filesize                                    filename
0   -rw-r--r--t   20349714432   /arch/bm0146/k204221/iow/iow_data_002.tar
1   -rw-r--r--t   20942159872   /arch/bm0146/k204221/iow/iow_data_001.tar
2   -rw-r--r--t    8364490752  /arch/bm0146/k204221/iow/iow_data5_006.tar
3   -rw-r--r--t   20478689280  /arch/bm0146/k204221/iow/iow_data5_005.tar
4   -rw-r--r--t    8364490752   /arch/bm0146/k204221/iow/iow_data_006.tar
5   -rw-r--r--t   20478689280   /arch/bm0146/k204221/iow/iow_data_005.tar
6   -rw-r--r--t   20715667456   /arch/bm0146/k204221/iow/iow_data_004.tar
7   -rw-r--r--t   20883439616   /arch/bm0146/k204221/iow/iow_data_003.tar
8   -rw-r--r--t   11284774912  /arch/bm0146/k204221/iow/iow_data4_002.tar
9   -rw-r--r--t       4194304  /arch/bm0146/k204221/iow/iow_data4_001.tar
10  -rw-r--r--t   26466058240  /arch/bm0146/k204221/iow/iow_data3_002.tar
11  -rw-r--r--t  200701640704  /arch/bm0146/k204221/iow/iow_data3_001.tar
12  -rw-r--r--t   20715667456  /arch/bm0146/k204221/iow/iow_data5_004.tar
13  -rw-r--r--t   20883439616  /arch/bm0146/k204221/iow/iow_data5_003.tar
14  -rw-r--r--t   20349714432  /arch/bm0146/k204221/iow/iow_data5_002.tar
15  drwxr-xr-x-             0                /arch/bm0146/k204221/iow/doc
16  -rw-r--r--t   20942159872  /arch/bm0146/k204221/iow/iow_data5_001.tar
17  -rw-r--r--t   20349714432  /arch/bm0146/k204221/iow/iow_data2_002.tar
18  -rw-r--r--t   20942159872  /arch/bm0146/k204221/iow/iow_data2_001.tar
19  -rw-r--r---       1268945          /arch/bm0146/k204221/iow/INDEX.txt
20  -rw-r--r--t    8364490752  /arch/bm0146/k204221/iow/iow_data2_006.tar
21  -rw-r--r--t   20478689280  /arch/bm0146/k204221/iow/iow_data2_005.tar
22  -rw-r--r--t          3256      /arch/bm0146/k204221/iow/doc/Readme.md
23  -rw-r--r--t   20715667456  /arch/bm0146/k204221/iow/iow_data2_004.tar
24  -rw-r--r--t   20883439616  /arch/bm0146/k204221/iow/iow_data2_003.tar

> df.filesize.sum()
573660424585

> df.filesize.sum()/1024/1024/1024
534.2629035795107
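
In the session above, the search ID is copied by hand from the string returned by pslk.slk_search_limited. A minimal sketch of extracting it programmatically is shown below. The helper extract_search_id is hypothetical (not part of pyslk) and assumes that the returned text contains "Search ID: <number>" as in the output shown above.

```python
import re
from typing import Optional


def extract_search_id(output: str) -> Optional[int]:
    """Return the search ID found in the search output text, or None.

    Hypothetical helper; assumes the text contains "Search ID: <digits>".
    """
    match = re.search(r"Search ID:\s*(\d+)", output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    text = "Search continuing. .\nSearch ID: 112031"
    print(extract_search_id(text))
```

The returned integer could then be passed on, for example to psr.slk_list_search_formatted, instead of typing the ID manually.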

Optimize retrieval of several files in different locations

Use Case

Files might be read from several tapes in parallel within one slk retrieve call. In addition, if several files requested in one call of slk retrieve are located on the same tape, they are read one after another without removing the tape from the tape drive. If we retrieve 100 files with 100 separate calls of slk retrieve and several of these files are on one tape, this tape might be loaded into and removed from the tape drive several times, which slows down the retrieval. The copying process to the Lustre filesystem on Levante might also be faster if several files are retrieved at once. Therefore, we suggest performing a search that finds all files to be retrieved and using the search ID of this search as input for slk retrieve.

Search queries are written in JSON, which is not intuitive for everyone. Therefore, we provide a command (slk_helpers gen_file_query) that generates such a query for use in a search (slk_helpers search_limited or slk search). Here we show an example of how to handle this workflow in Python.
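
Since the queries are plain JSON, they can also be assembled from Python dictionaries with the standard json module. The sketch below only builds query strings of the shape shown in the examples on this page ("path" with "$gte" and optional "$max_depth", combined with a "resources.name" regex). The function names are hypothetical and not part of pyslk, and the sketch does not cover the full query language.

```python
import json


def path_condition(path: str, recursive: bool) -> dict:
    """Build the "path" part of a query (hypothetical helper)."""
    condition = {"$gte": path}
    if not recursive:
        # "$max_depth": 1 marks a non-recursive search in the examples
        condition["$max_depth"] = 1
    return condition


def name_in_path_query(path: str, name_regex: str, recursive: bool = False) -> str:
    """Return a JSON query for resources in `path` whose name matches a regex."""
    query = {
        "$and": [
            {"path": path_condition(path, recursive)},
            {"resources.name": {"$regex": name_regex}},
        ]
    }
    # compact separators give a string without extra spaces
    return json.dumps(query, separators=(",", ":"))


if __name__ == "__main__":
    print(name_in_path_query("/arch/bm0146/k204221", "INDEX.txt", recursive=True))
```

Building the query from dictionaries avoids quoting mistakes that are easy to make when writing the JSON string by hand.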

Example 1

# import modules
> from pyslk import pyslk as pslk
> from pyslk import parsers as psr

# Here we want to retrieve:
#  * the file /arch/bm0146/k204221/INDEX.txt
#  * all "*.txt" files in /arch/bm0146/k204221/iow
#      Note: we need to use regular expressions; the shell wildcard "*"
#            corresponds to the regex ".*" ("." = any character;
#            "*" = zero or more repetitions)
#  * all files from /arch/bm0146/k204221/iow2
> pslk.slk_gen_file_query("/arch/bm0146/k204221/INDEX.txt /arch/bm0146/k204221/iow/.*.txt /arch/bm0146/k204221/iow2")
'{"$or":[{"$and":[{"path":{"$gte":"/arch/bm0146/k204221","$max_depth":1}},{"resources.name":{"$regex":"INDEX.txt|iow2"}}]},{"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":".*.txt"}}]}]}'

# perform a search
> pslk.slk_search_limited(pslk.slk_gen_file_query("/arch/bm0146/k204221/INDEX.txt /arch/bm0146/k204221/iow/.*.txt /arch/bm0146/k204221/iow2"))
'Search continuing. ....\nSearch ID: 111990'

# have a look which files we got
> print(pslk.slk_list_search(111990, only_files=True))
-rw-r--r---         1268945 /arch/bm0146/k204221/iow/INDEX2.txt
-rw-r--r---            7156 /arch/bm0146/k204221/INDEX.txt
Resources: 2

# or the same output in a pandas.DataFrame
> psr.slk_list_search_formatted(111990, only_files=True)
   permissions filesize                            filename
0  -rw-r--r--t  1268945  /arch/bm0146/k204221/iow/INDEX2.txt
1  -rw-r--r--t     7156      /arch/bm0146/k204221/INDEX.txt

# do the retrieval
> pslk.slk_retrieve(111990, '/scratch/k/k204221/tmp')
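
The note in Example 1 says that file name patterns have to be given as regular expressions rather than shell wildcards. The sketch below converts a simple wildcard pattern into a regular expression and shows why an unescaped "." is looser than intended. The helper wildcard_to_regex is hypothetical and not part of pyslk. It is checked here with Python's re module only; how slk itself evaluates the expression (for example whether escaped dots are accepted) is an assumption that should be verified.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate shell wildcards "*" and "?" into a regular expression.

    Hypothetical helper; all other characters are escaped so that,
    for example, "." only matches a literal dot.
    """
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")   # any number of arbitrary characters
        elif char == "?":
            parts.append(".")    # exactly one arbitrary character
        else:
            parts.append(re.escape(char))
    return "".join(parts)


if __name__ == "__main__":
    strict = wildcard_to_regex("*.txt")
    loose = ".*.txt"  # pattern as written in Example 1
    for name in ["INDEX.txt", "INDEX_txt"]:
        print(name, bool(re.fullmatch(strict, name)), bool(re.fullmatch(loose, name)))
```

With Python's re module, the unescaped pattern also matches a name such as INDEX_txt, because "." stands for any character.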

Example 2

# import modules
> from pyslk import pyslk as pslk
> from pyslk import parsers as psr

# Here we want to retrieve:
#  * all INDEX.txt files in /arch/bm0146/k204221 and its sub-directories
#      Note 1: we use the argument "recursive"
#      Note 2: there is no '"$max_depth":1' as in the previous example,
#              which indicates non-recursive search
> pslk.slk_gen_file_query("/arch/bm0146/k204221/INDEX.txt", recursive=True)
'{"$and":[{"path":{"$gte":"/arch/bm0146/k204221"}},{"resources.name":{"$regex":"INDEX.txt"}}]}'

# do search
> pslk.slk_search_limited(pslk.slk_gen_file_query("/arch/bm0146/k204221/INDEX.txt", recursive=True))
'Search continuing. ....\nSearch ID: 111994'

# list results, incl. their path
> print(pslk.slk_list_search(111994, only_files=True))
-rw-r--r--t            5082 /arch/bm0146/k204221/tmp/INDEX.txt
-rw-r--r--t            7156 /arch/bm0146/k204221/INDEX.txt
-rw-r--r--t           10442 /arch/bm0146/k204221/unpackData/INDEX.txt
-rw-r--r--t         1208734 /arch/bm0146/k204221/iow2_test/INDEX.txt
-rw-r--r--t          947084 /arch/bm0146/k204221/exp/jsbach/INDEX.txt
-rw-r--r--t         1347992 /arch/bm0146/k204221/exp/hamocc/INDEX.txt
-rw-r--r--t          947084 /arch/bm0146/k204221/exp/echam/INDEX.txt
Resources: 7

# "slk retrieve SEARCH_ID TARGET" retrieves all files into the same target
# directory. Therefore, it is a bad idea to run a slk retrieve on this
# search. Instead, we split the copy task "Tape => HSM-Cache" and
# "HSM-Cache => Lustre" into two parts. First we do one "slk recall" which
# copies all files from tape to the HSM-Cache. Then we do several single
# "slk retrieve"s.
> pslk.slk_recall(111994)
> pslk.slk_retrieve('/arch/bm0146/k204221/tmp/INDEX.txt', '/scratch/k/k204221/tmp/arch/bm0146/k204221/tmp')
> pslk.slk_retrieve('/arch/bm0146/k204221/INDEX.txt', '/scratch/k/k204221/tmp/arch/bm0146/k204221')
> pslk.slk_retrieve('/arch/bm0146/k204221/unpackData/INDEX.txt', '/scratch/k/k204221/tmp/arch/bm0146/k204221/unpackData')
> pslk.slk_retrieve('/arch/bm0146/k204221/iow2_test/INDEX.txt', '/scratch/k/k204221/tmp/arch/bm0146/k204221/iow2_test')
> pslk.slk_retrieve('/arch/bm0146/k204221/exp/jsbach/INDEX.txt', '/scratch/k/k204221/tmp/arch/bm0146/k204221/exp/jsbach')
> pslk.slk_retrieve('/arch/bm0146/k204221/exp/hamocc/INDEX.txt', '/scratch/k/k204221/tmp/arch/bm0146/k204221/exp/hamocc')
> pslk.slk_retrieve('/arch/bm0146/k204221/exp/echam/INDEX.txt', '/scratch/k/k204221/tmp/arch/bm0146/k204221/exp/echam')
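
The per-file target directories above follow one rule: the directory of the archived file is appended to a common target root. A minimal sketch of deriving these pairs from the listing text is given below. The helper plan_retrievals is hypothetical and not part of pyslk. It assumes that each listing line ends with an absolute path without spaces, as in the output shown above; the retrieval call itself is only indicated in a comment.

```python
import posixpath
from typing import List, Tuple


def plan_retrievals(listing: str, target_root: str) -> List[Tuple[str, str]]:
    """Return (source_file, target_directory) pairs for a listing.

    Hypothetical helper; expects lines like
    "-rw-r--r--t   5082 /arch/.../INDEX.txt" and skips everything else
    (for example the trailing "Resources: 7" line).
    """
    plans = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[-1].startswith("/"):
            continue
        source = fields[-1]
        # mirror the archive directory structure below the target root
        sub_dir = posixpath.dirname(source).lstrip("/")
        plans.append((source, posixpath.join(target_root, sub_dir)))
    return plans


if __name__ == "__main__":
    listing = (
        "-rw-r--r--t            5082 /arch/bm0146/k204221/tmp/INDEX.txt\n"
        "-rw-r--r--t            7156 /arch/bm0146/k204221/INDEX.txt\n"
        "Resources: 2"
    )
    for source, target in plan_retrievals(listing, "/scratch/k/k204221/tmp"):
        # assumption: the target directory exists; then, for each pair,
        # the retrieval would be e.g. pslk.slk_retrieve(source, target)
        print(source, "->", target)
```

Keeping the planning step separate from the retrieval makes it possible to inspect the pairs before any data is copied.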