PyNetConvert - Network (Graph, Dataset) Converter

Network (graph, dataset) converter from the Pajek, Metis and .nsl formats (including .ncol, Stanford SNAP and Edge/Arcs Graph) to the .nsl format (.nse/.nsa, which are more common than .snap and .ncol) and the .rcg format (Readable Compact Graph, formerly .hig; used by the DAOC / HiReCS libraries). Additionally, a dedicated script (matToNsl) converts an adjacency matrix from the MATLAB (.mat) format to .nsl.

\author: Artem Lutov artem@exascale.info
(c) RCG (Readable Compact Graph)

Content

Input formats

Conversion of a MATLAB (.mat) adjacency matrix is performed by the matToNsl.py script and targets the nsl (nse/nsa) format only.

Output formats

Requirements

The converter is written for Python 3 with backward compatibility with Python 2 and PyPy in mind. It is tested on Python 3, but should also run on Python 2 and PyPy.
There are no external dependencies.

The converter is implemented as a serial parser, i.e. it can process files of any size with a very small memory footprint (unless the --remdup option is specified to remove duplicated links).
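The trade-off between serial parsing and duplicate removal can be illustrated with a minimal sketch. This is not the converter's actual code: the function name `stream_links` and the choice to treat a duplicate as a repeated (source, destination) pair are assumptions made for illustration only.

```python
import io

def stream_links(lines, remdup=False):
    """Yield (src, dst, weight) one link at a time from NSE/NSA-like lines.

    Hypothetical helper: without remdup nothing is accumulated, so memory use
    stays constant; with remdup a set of already seen links has to be kept.
    """
    seen = set() if remdup else None  # grows with the number of unique links
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments (including the header)
        parts = line.split()
        src, dst = int(parts[0]), int(parts[1])
        weight = float(parts[2]) if len(parts) > 2 else None
        if seen is not None:
            if (src, dst) in seen:
                continue  # drop the repeated link
            seen.add((src, dst))
        yield src, dst, weight

sample = "# Nodes: 3\tArcs: 3\n1 2\n2 3 0.5\n1 2\n"
all_links = list(stream_links(io.StringIO(sample)))
unique_links = list(stream_links(io.StringIO(sample), remdup=True))
```

Because the function is a generator, the input can be a file object of any size; only the `remdup` branch makes memory use depend on the number of links.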

Usage

Just run the converter with specified input and output formats. Some formats are automatically recognized by the file extension.

Example

$ ./convert.py tmp/karate.graph -i mts
File "tmp/karate.graph" is opened, converting...
	unweight: False
	remdub: False
	frcedg: False
	inpfmt: mts
	resolve: o
	outfmt: rcg
	commented: True
File tmp/karate.rcg is created, filling...
Metis format  weighted: False, selfweights: 0
Parsed weighted: False, newsection: True
tmp/karate.graph -> tmp/karate.rcg conversion is completed

Options

$ ./convert.py -h
usage: convert.py [-h] [-f] [-i {pjk,nsa,nse,mts}] [-d] [-e] [-u] [-c]
                  [-o {nsa,nse,rcg}] [-r {o,r,s}]
                  [network]

Convert format of the specified network (graph).

positional arguments:
  network               the network (graph) to be converted

optional arguments:
  -h, --help            show this help message and exit
  -f, --showfmt         show supporting I/O formats description and exit

Input Format:
  -i {pjk,nsa,nse,mts}, --inpfmt {pjk,nsa,nse,mts}
                        input network (graph) format

Additional Modifiers:
  -d, --remdup          remove duplicated links to have unique ones
  -e, --frcedg          force edges output even in case of ars input: the
                        output edge is created by the first occurrence of the
                        input link (edge/arc) and has weight of this link
                        omitting the subsequent back link (if exists)
  -u, --unweight        force links to be unweighted instead of having the
                        input weights
  -c, --nocoms          clear (avoid) comments in the output file (conversion
                        provenance is not added, headers for .nsX are omitted,
                        etc.). Can be useful when .ncol file should be
                        produces instead of the Stanford SNAP-like format

Output Format:
  -o {nsa,nse,rcg}, --outfmt {nsa,nse,rcg}
                        output format for the network (graph)
  -r {o,r,s}, --resolve {o,r,s}
                        resolution strategy in case the output file is already
                        exists: o - overwrite the output file, r - rename the
                        existing output file and create the new one, s - skip
                        processing if such output file already exists

matToNsl Options

$ ./matToNsl.py -h
usage: matToNsl.py [-h] [-d] MatNet [MatNet ...]

Network converter from mathlab format to .nsl (nse/nsa).

positional arguments:
  MatNet          unsigned input network in the .mat format

optional arguments:
  -h, --help      show this help message and exit
  -d, --directed  form directed output network from possibly directed input
                  network
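The idea behind the adjacency matrix conversion can be sketched as follows. This is a simplified illustration and not the matToNsl.py implementation: it takes a plain nested list instead of reading a .mat file, the function name `matrix_to_nsl` is hypothetical, and it assumes that an undirected matrix is symmetric, so only its upper triangle is emitted.

```python
def matrix_to_nsl(matrix, directed=False):
    """Return NSL link lines for a square adjacency matrix (hypothetical helper).

    Ids start from 1. With directed=False each edge is emitted once
    (upper triangle, the matrix is assumed to be symmetric);
    with directed=True every non-zero cell becomes an arc.
    """
    lines = []
    for i, row in enumerate(matrix):
        for j, weight in enumerate(row):
            if not weight:
                continue  # zero means there is no link
            if not directed and j < i:
                continue  # the back link of an already emitted edge
            lines.append('{} {} {}'.format(i + 1, j + 1, weight))
    return lines

adj = [[0, 2, 0],
       [2, 0, 1],
       [0, 1, 0]]
edges = matrix_to_nsl(adj)
arcs = matrix_to_nsl(adj, directed=True)
```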

Datasets

Format Specification

RCG

Rcg (OUTP) - Readable Compact Graph format (formerly hig), the native input format of DAOC. This format is similar to Pajek, but ids can start from any non-negative number and need not form a solid range. RCG is a readable and compact network format suitable for evolving networks. File extensions: rcg, hig

MTS

Mts (INP) - Metis Graph (Network) format. File extensions: graph, mtg, met. Specification:

  % Comments start with '%' symbol
  % Header:
  <vertices_num> <edges_num> [<format_bin> [vwnum]]
  % Body, vertices_num lines without the comments:
  [vsize] [vweight_1 .. vweight_vwnum] vid1 [eweight1]  vid2 [eweight2] ...
  ...
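A minimal reading sketch for the layout above. It is not the converter's parser: the function name `parse_metis` is hypothetical, and the interpretation of `<format_bin>` (three binary digits, from left to right: vertex sizes, vertex weights, edge weights) follows the Metis manual as I understand it rather than this document.

```python
def parse_metis(text):
    """Return (vertices_num, edges_num, links) from Metis-formatted text.

    Hypothetical helper. Each body line describes the links of one vertex,
    vertex ids start from 1, so every undirected edge is listed twice.
    """
    # Only comment lines are dropped: an empty body line is a vertex without links
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith('%')]
    header = lines[0].split()
    nverts, nedges = int(header[0]), int(header[1])
    fmt = header[2].zfill(3) if len(header) > 2 else '000'
    has_vsize, has_vweight, has_eweight = (c == '1' for c in fmt)
    vwnum = int(header[3]) if len(header) > 3 else 1
    # Leading fields of a body line that do not describe links
    skip = (1 if has_vsize else 0) + (vwnum if has_vweight else 0)
    step = 2 if has_eweight else 1
    links = []
    for vid, line in enumerate(lines[1:nverts + 1], start=1):
        fields = line.split()[skip:]
        for i in range(0, len(fields), step):
            weight = float(fields[i + 1]) if has_eweight else None
            links.append((vid, int(fields[i]), weight))
    return nverts, nedges, links

sample = """% A triangle with edge weights
3 3 001
2 5 3 2
1 5 3 4
1 2 2 4
"""
nverts, nedges, links = parse_metis(sample)
```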

Notations:
Header:

Body:

PJK

Pjk (INP) - Pajek Network format. Node ids start from 1; both [weighted] arcs and edges might be present. File extensions: pjk, pajek, net, pjn

NSL

Nsl - a graph of nodes specified by newline separated links (edges/arcs), which are optionally weighted.

NSE

Nse (INP, OUTP) - nodes are specified in lines, each consisting of a single space/tab separated, possibly weighted edge (undirected link). It is similar to the ncol format and [Weighted] Edge Graph, but self-edges are allowed to represent node weights and line comments are allowed using the # symbol. File extensions: nse, snap, ncol. Specification:

  # Comments start with '#', the header is optional:
  # Nodes: <nodes_num>	Edges: <edges_num>
  <from_id> <to_id> [<weight>]
  ...

Notations:
The header is optional. The edges (undirected links) are unique, i.e. either AB or BA is specified.
Id is a positive integer number (>= 1), id range is solid.
Weight is a non-negative floating point number.
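The NSE rules above can be checked with a short sketch. The function name `parse_nse` is hypothetical and the code is an illustration of the specification, not the converter's parser; a missing weight is reported as `None` rather than guessing a default.

```python
import re

def parse_nse(text):
    """Return (header, edges) for NSE text (hypothetical helper).

    header is (nodes_num, edges_num) or None when the optional header is absent;
    edges is a list of (from_id, to_id, weight) tuples.
    Raises ValueError if an edge is given twice (as AB and BA, or AB and AB).
    """
    header = None
    edges = []
    seen = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            match = re.match(r'#\s*Nodes:\s*(\d+)\s+Edges:\s*(\d+)', line)
            if match:
                header = (int(match.group(1)), int(match.group(2)))
            continue
        parts = line.split()
        src, dst = int(parts[0]), int(parts[1])
        weight = float(parts[2]) if len(parts) > 2 else None
        key = (min(src, dst), max(src, dst))  # AB and BA are the same edge
        if key in seen:
            raise ValueError('duplicated edge: {} {}'.format(src, dst))
        seen.add(key)
        edges.append((src, dst, weight))
    return header, edges

sample = "# Comments start with '#'\n# Nodes: 3\tEdges: 3\n1 2\n2 3 2.5\n3 3 1\n"
header, edges = parse_nse(sample)
```

The self-edge `3 3 1` in the sample represents the weight of node 3.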

NSA

Nsa (INP, OUTP) - nodes are specified in lines, each consisting of a single space/tab separated, possibly weighted arc (directed link). A self-arc can be used to represent the node weight, and line comments are allowed using the # symbol. File extensions: nsa. Specification:

  # Comments start with '#', the header is optional:
  # Nodes: <nodes_num>	Arcs: <arcs_num>
  <from_id> <to_id> [<weight>]
  ...

Notations:
The header is optional. The arcs (directed links) are unique and always come in pairs, i.e. if AB is specified, then BA should also be specified unless its weight is zero.
Id is a positive integer number (>= 1), id range is solid.
Weight is a non-negative floating point number.
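Since arcs come in pairs, producing edges from arcs requires choosing one link of each pair. The sketch below follows the behaviour described for the --frcedg option (the first occurrence of a link defines the edge and its weight, the subsequent back link is omitted). The function name `arcs_to_edges` is hypothetical and this is not the converter's code.

```python
def arcs_to_edges(arcs):
    """Collapse (from_id, to_id, weight) arcs into edges (hypothetical helper).

    The first occurrence of a link defines the edge and its weight;
    a later back link (BA after AB) is skipped.
    """
    seen = set()
    edges = []
    for src, dst, weight in arcs:
        key = (src, dst) if src <= dst else (dst, src)
        if key in seen:
            continue  # back link of an already emitted edge
        seen.add(key)
        edges.append((src, dst, weight))
    return edges

arcs = [(1, 2, 1.0), (2, 1, 3.0), (2, 3, 0.5), (3, 3, 2.0)]
edges = arcs_to_edges(arcs)
```

Note that the weight of the back link (3.0 for the arc 2 -> 1 here) is discarded, which matters when the two directions carry different weights.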

Related Projects

Note: Please star this project if you use it.