Home

Awesome

DISCO API

Java API for word embeddings

This is the source code repository for the linguatools DISCO API. For more information on DISCO visit http://www.linguatools.de/disco/disco_en.html.

Quickstart

Install DISCO API

Download the source code by cloning this repository:

git clone git@github.com:linguatools/disco.git

Go into the repository folder and build the executable jar with dependencies:

cd disco/
./gradlew shadowJar

For instructions on command line usage call DISCO API without any parameters:

java -jar build/libs/disco-3.0.0-all.jar

or consult the web page.

Import a vector file from fastText

Download a fastText vector file in text format and unpack it:

wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.de.300.vec.gz
gunzip cc.de.300.vec.gz

Download DISCO Builder:

wget http://www.linguatools.de/disco/DISCOBuilder-1.1.1.tar.bz2
tar jxf DISCOBuilder-1.1.1.tar.bz2

Convert the vector file into a DISCO DenseMatrix:

java -Xmx8g -cp DISCOBuilder-1.1.1/DISCOBuilder-1.1.1-all.jar de.linguatools.disco.builder.Import -in cc.de.300.vec -out cc.de.300.col.denseMatrix -wsType COL 

Query the new DISCO word space from the command line with the DISCO API:

java -Xmx4g -jar ~/repos-linguatools/disco/build/libs/disco-3.0.0-all.jar cc.de.300.col.denseMatrix/cc.de.300-COL.denseMatrix -s Haus Wohnung COSINE
0.64413786

Java API

To include DISCO in your Maven or Gradle project see below or visit the DISCO page on JitPack.

Gradle

Add this to your build.gradle file:

repositories {
    maven { url 'https://jitpack.io' }
}
dependencies {
    compile 'com.github.linguatools:disco:v3.0.0'
}

Maven

Add this to your pom.xml file:

<repositories>
	<repository>
	    <id>jitpack.io</id>
	    <url>https://jitpack.io</url>
	</repository>
</repositories>

<dependency>
    <groupId>com.github.linguatools</groupId>
    <artifactId>disco</artifactId>
    <version>v3.0.0</version>
</dependency>

Example Java code

DISCO disco = DISCO.load("cc.de.300-COL.denseMatrix");
float sim = disco.semanticSimilarity("Haus", "Häuschen", 
      	    	DISCO.getVectorSimilarity(SimilarityMeasure.COSINE));
System.out.println("similarity between 'Haus' and 'Häuschen': "+sim);
// get word vector for "Haus" as map
Map<String,Float> wordVectorHaus = disco.getWordvector("Haus");
// get word embedding for "Haus" as float array
float[] wordEmbeddingHaus = ((DenseMatrix) disco).getWordEmbedding("Haus");
// solve analogy x is to "Frau" as "König" is to "Mann"
List<ReturnDataCol> result = Compositionality.solveAnalogy("Frau", "König", "Mann", disco); 

Documentation

How to get word spaces for DISCO?

Features