Spotify Subset

The 'Spotify Subset' includes file names from the Spotify Dataset (Tanaka et al., 2022) for classifying language variations in Brazilian Portuguese. The file names were selected by filtering the original dataset's metadata for idiomatic expressions and for names or acronyms of locations.
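The metadata filter described above can be sketched as a keyword match over episode descriptions. This is only an illustration under stated assumptions: the field names (`file_name`, `description`), the term lists, and the sample rows are hypothetical, since this page does not give the actual metadata schema or the terms used to build the subset.

```python
import re

# Hypothetical term lists: the actual expressions and place names used
# to build the subset are not listed on this page.
LOCATION_TERMS = ["Recife", "São Paulo", "Minas Gerais"]
IDIOM_TERMS = ["oxente", "uai"]


def build_pattern(terms):
    """Compile a case-insensitive regex matching any term as a whole word."""
    # Longer terms first, so multi-word names are tried before shorter ones.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def select_file_names(rows, terms, text_field="description", name_field="file_name"):
    """Return the file names of rows whose text field mentions any term."""
    pattern = build_pattern(terms)
    return [row[name_field] for row in rows if pattern.search(row.get(text_field, ""))]


# Hypothetical metadata rows standing in for the original dataset metadata.
metadata = [
    {"file_name": "show_a/ep_001.ogg", "description": "Um podcast gravado em Recife, oxente!"},
    {"file_name": "show_b/ep_002.ogg", "description": "Notícias de tecnologia da semana."},
    {"file_name": "show_c/ep_003.ogg", "description": "Histórias de Minas Gerais, uai."},
]

selected = select_file_names(metadata, LOCATION_TERMS + IDIOM_TERMS)
print(selected)
```

Matching on whole words keeps short terms and acronyms from firing inside longer words, which matters when the term list includes location acronyms.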

<h2>Spotify A subset</h2>
<h3>General Table</h3>
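<table>
  <tr><th>Speakers</th><th>Duration</th><th>Episodes</th><th>Female</th><th>Male</th></tr>
  <tr><td>92</td><td>~15 hrs 24 min</td><td>52</td><td>43</td><td>38</td></tr>
</table>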
<h3>Subset A Information</h3>
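<table>
  <tr><th>Accent</th><th>Speakers</th><th>Duration</th><th>Female</th><th>Male</th></tr>
  <tr><td>Rio de Janeiro</td><td>5</td><td>49 min</td><td>2</td><td>3</td></tr>
  <tr><td>Bahia</td><td>4</td><td>1 hr 27 min</td><td>4</td><td></td></tr>
  <tr><td>Mato Grosso do Sul</td><td>4</td><td>18 min</td><td>3</td><td>1</td></tr>
  <tr><td>Maranhão</td><td>7</td><td>1 hr 18 min</td><td>2</td><td>3</td></tr>
  <tr><td>Minas Gerais</td><td>~35</td><td>5 hrs 23 min</td><td>~13</td><td>~22</td></tr>
  <tr><td>Recife</td><td>10</td><td>3 hrs 45 min</td><td></td><td></td></tr>
  <tr><td>São Paulo</td><td>~25</td><td>1 hr 18 min</td><td>~19</td><td>~7</td></tr>
  <tr><td>Rio Grande do Sul</td><td>2</td><td>~53 min</td><td></td><td>2</td></tr>
</table>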
<h2>Spotify B subset</h2>
<hr>
<h3>General Table</h3>
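<table>
  <tr><th>Accent</th><th>Train_speakers</th><th>Dev_speakers</th><th>Test_speakers</th><th>Podcasts</th><th>Episodes</th><th>Duration</th><th>Segments</th></tr>
  <tr><td>RE</td><td>69</td><td>23</td><td>11</td><td>15</td><td>57</td><td>~48.23</td><td>14,008</td></tr>
  <tr><td>SP</td><td>52</td><td>18</td><td>15</td><td>11</td><td>78</td><td>~30.88</td><td>11,906</td></tr>
</table>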