cca-core

cca-core (Civic CrowdAnalytics Core) offers machine learning and natural language processing utilities for analyzing civic text input.

Requirements

Classes and Usage

SentimentAnalyzer

Analyzes the sentiment polarity of a collection of documents, determining whether the feeling expressed in each document is positive, negative or neutral.

Parameters:

Methods

Attributes

Examples

# import the Sentiment Analyzer class
from cca_core import SentimentAnalyzer

# create an instance of the analyzer
sa = SentimentAnalyzer(neu_inf_lim=-0.05,
                       neu_sup_lim=0.05,
                       language='spanish')
# sample docs
docs = [
        'Reciclar me parece buena idea. Reutilizar desechos es muy provechoso.',
        'Mala gestión. Lamentable y pobre manjeo de los encargados.'
        ]

# analyze docs with the 'analyze_docs' method
sa.analyze_docs(docs)

# results are accessible through the 'tagged_docs' attribute
print(sa.tagged_docs[0])
# ('Reciclar me parece buena idea. Reutilizar desechos es muy provechoso.', 'pos', 1.0)

print(sa.tagged_docs[1])
# ('Mala gestión. Lamentable y pobre manjeo de los encargados.', 'neg', -0.15)
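The 'neu_inf_lim' and 'neu_sup_lim' parameters appear to bound a neutral band around zero. The sketch below shows one way a polarity score could be turned into a tag. It assumes that scores below 'neu_inf_lim' count as negative, scores above 'neu_sup_lim' count as positive, and scores inside the band (inclusive) count as neutral. The helper 'tag_polarity' and the 'neu' label are hypothetical, and cca-core's own boundary handling may differ.

```python
def tag_polarity(score: float, neu_inf_lim: float = -0.05, neu_sup_lim: float = 0.05) -> str:
    """Map a polarity score to 'pos', 'neg' or 'neu' (hypothetical helper).

    Assumption: scores strictly below neu_inf_lim are negative, scores strictly
    above neu_sup_lim are positive, and the closed band in between is neutral.
    """
    if neu_inf_lim > neu_sup_lim:
        raise ValueError("neu_inf_lim must not exceed neu_sup_lim")
    if score < neu_inf_lim:
        return "neg"
    if score > neu_sup_lim:
        return "pos"
    return "neu"


# The two scores from the example above, plus one inside the neutral band
for score in (1.0, -0.15, 0.02):
    print(score, tag_polarity(score))
```

In this sketch, widening the band (for example to -0.2 and 0.2) tags more weakly opinionated documents as neutral.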

ConceptExtractor

Extract the most common concepts from a collection of documents.

Parameters:

Methods

Attributes

Examples

# import the Concept Extractor class
from cca_core import ConceptExtractor

# create an instance of the extractor
ce = ConceptExtractor(
    num_concepts=4,
    language='english',
    pos_vec=['NN', 'NNP', 'NNS', 'NNPS']
)

# sample docs
docs = [
    'Make new bikes lanes in the park',
    'Clean the campus and add more trash cans',
    'Use bikes instead of cars during weekends',
    'Clean up the streets',
    'Create a bike renting service for employees',
    'Too much garbage. Cleaning needed',
    'Use bikes or another alternative transportation',
    'Keep streets clean',
]

# extract the most common concepts with the 'extract_concepts' method
ce.extract_concepts(docs)

# the 'common_concepts' attribute holds the extracted concepts and their number of occurrences
print(ce.common_concepts)
# [('bikes', 2), ('use', 2), ('streets', 2), ('lanes', 1)]
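The 'pos_vec' parameter suggests that concepts are the words whose part-of-speech tag appears in the given list. The example passes the Penn Treebank noun tags. Below is a minimal sketch of the counting step. It assumes the documents have already been tokenized and tagged into (word, tag) pairs and that counting is case-insensitive. 'count_concepts' is a hypothetical helper, not cca-core's implementation, and the tags below were written by hand rather than produced by a tagger.

```python
from collections import Counter


def count_concepts(tagged_docs, pos_vec, num_concepts):
    """Return the num_concepts most frequent words whose POS tag is in pos_vec."""
    allowed = set(pos_vec)
    counts = Counter(
        word.lower()
        for doc in tagged_docs
        for word, tag in doc
        if tag in allowed
    )
    # most_common orders equal counts by first occurrence
    return counts.most_common(num_concepts)


# Hand-tagged tokens for three of the sample docs (tags are illustrative)
tagged_docs = [
    [("Make", "VB"), ("new", "JJ"), ("bikes", "NNS"), ("lanes", "NNS"),
     ("in", "IN"), ("the", "DT"), ("park", "NN")],
    [("Clean", "VB"), ("up", "RP"), ("the", "DT"), ("streets", "NNS")],
    [("Keep", "VB"), ("streets", "NNS"), ("clean", "JJ")],
]
print(count_concepts(tagged_docs, ["NN", "NNP", "NNS", "NNPS"], 2))
```

Because ties keep first-occurrence order, 'bikes' wins the tie with 'lanes' and 'park' for the second slot.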
    

DocumentClustering

Cluster documents by similarity using the k-means algorithm.

Parameters:

Methods

Attributes

Examples

# import the Document Clustering class
from cca_core import DocumentClustering

# create an instance of the class
clu = DocumentClustering(num_clusters=2,
                        language='english',
                        max_features=5)

# sample docs
docs = [
    'Make new bikes lanes in the park',
    'Clean the campus and add more trash cans',
    'Use bikes instead of cars during weekends',
    'Clean up the streets',
    'Create a bike renting service for employees',
    'Too much garbage. Cleaning needed',
    'Use bikes or another alternative transportation',
    'Keep streets clean',
]

# start the clustering process with the 'clustering' method 
clu.clustering(docs)

# the 'clusters' attribute has the cluster label assigned to each doc
print(clu.clusters)
# [0, 1, 0, 1, 0, 1, 0, 1]

# the 'num_docs_per_cluster' attribute is a dict that shows how many docs were assigned to each cluster
print(clu.num_docs_per_cluster)
# {'0': 4, '1': 4}
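k-means alternates between two steps until the assignments stop changing. First, it assigns each document vector to its nearest centroid. Then it moves each centroid to the mean of the vectors assigned to it. The dependency-free sketch below runs that loop on bag-of-words count vectors. It assumes a small fixed vocabulary (playing roughly the role of 'max_features'), squared Euclidean distance, and initialization from the first k vectors so that the result is reproducible. 'vectorize', 'kmeans' and 'VOCAB' are hypothetical names, and cca-core may use a different vectorizer (such as TF-IDF), initialization or stopping rule.

```python
import re
from collections import Counter


def vectorize(doc, vocab):
    """Bag-of-words count vector of doc, restricted to the terms in vocab."""
    counts = Counter(re.findall(r"[a-z]+", doc.lower()))
    return [float(counts[term]) for term in vocab]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per input vector."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]  # deterministic initialization
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical vocabulary and four of the sample docs
VOCAB = ["bikes", "use", "clean", "streets"]
docs = [
    'Make new bikes lanes in the park',
    'Clean up the streets',
    'Use bikes instead of cars during weekends',
    'Keep streets clean',
]
labels = kmeans([vectorize(d, VOCAB) for d in docs], k=2)
sizes = {str(c): n for c, n in sorted(Counter(labels).items())}
print(labels)  # [0, 1, 0, 1]
print(sizes)   # {'0': 2, '1': 2}
```

Deterministic initialization makes the sketch easy to test but sensitive to document order. Library implementations such as scikit-learn's KMeans default to k-means++ seeding with several restarts instead.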

DocumentClassifier

Train a classifier on labeled documents and classify new, unlabeled documents into one of the labeled classes.

Parameters:

Methods

Attributes

Examples

# import the Document Classifier class
from cca_core import DocumentClassifier

# create an instance of the classifier
cla = DocumentClassifier(
    language="english",
    t_classifier="SVM",
    vocab_size=5
)

# sample docs as (text, label) tuples. The last two have an empty label and are the docs to classify
docs = [
    ('Make new bikes lanes in the park', 'transportation'),
    ('Clean the campus and add more trash cans', 'cleaning'),
    ('Use bikes instead of cars during weekends', 'transportation'),
    ('Clean up the streets', 'cleaning'),
    ('Create a bike renting service for employees', 'transportation'),
    ('Too much garbage. Cleaning needed', 'cleaning'),
    ('Use bikes or another alternative transportation', ''),
    ('Keep streets clean', ''),
]

# classify docs with the 'classify_docs' method
cla.classify_docs(docs)

# all previously unlabeled docs are now classified
print(cla.classified_docs[0])
# ('Use bikes or another alternative transportation', 'transportation')
print(cla.classified_docs[1])
# ('Keep streets clean', 'cleaning')
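The example relies on a simple convention. Documents that have a label are training data, and documents with an empty label are the ones to classify. The sketch below follows that convention. To stay dependency-free, it swaps the SVM for a multinomial Naive Bayes classifier with add-one smoothing over word counts. It does not reproduce cca-core's model or its 'vocab_size' feature selection. 'classify_unlabeled' and 'tokenize' are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def classify_unlabeled(docs):
    """Train on (text, label) pairs with a non-empty label, then label the rest.

    Stand-in model (not cca-core's SVM): multinomial Naive Bayes, add-one smoothing.
    """
    labeled = [(text, lab) for text, lab in docs if lab]
    unlabeled = [text for text, lab in docs if not lab]
    if not labeled:
        raise ValueError("at least one labeled document is required")

    doc_counts = Counter(lab for _, lab in labeled)  # for class priors
    word_counts = defaultdict(Counter)               # per-class word counts
    for text, lab in labeled:
        word_counts[lab].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}

    def log_posterior(tokens, lab):
        total = sum(word_counts[lab].values())
        score = math.log(doc_counts[lab] / len(labeled))
        for w in tokens:
            if w in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[lab][w] + 1) / (total + len(vocab)))
        return score

    results = []
    for text in unlabeled:
        tokens = tokenize(text)
        best = max(doc_counts, key=lambda lab: log_posterior(tokens, lab))
        results.append((text, best))
    return results


docs = [
    ('Make new bikes lanes in the park', 'transportation'),
    ('Clean the campus and add more trash cans', 'cleaning'),
    ('Use bikes instead of cars during weekends', 'transportation'),
    ('Clean up the streets', 'cleaning'),
    ('Create a bike renting service for employees', 'transportation'),
    ('Too much garbage. Cleaning needed', 'cleaning'),
    ('Use bikes or another alternative transportation', ''),
    ('Keep streets clean', ''),
]
for text, label in classify_unlabeled(docs):
    print(label, "<-", text)
```

Words never seen during training, such as 'keep' or 'transportation', are ignored when scoring. Each prediction here therefore rests only on the overlapping words: 'use' and 'bikes' for the first doc, 'streets' and 'clean' for the second.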