Spotify Subset
The 'Spotify Subset' lists file names from the Spotify Dataset (Tanaka et al., 2022) for classifying language variations in Brazilian Portuguese. The file names were selected by filtering the original dataset metadata for idiomatic expressions and for names or acronyms of locations.
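As a rough illustration of this kind of metadata filter, the sketch below keeps the file names whose description mentions a location name, an acronym, or an idiomatic expression. The field names (`file_name`, `description`), the keyword list, and the sample rows are all hypothetical; they are not the terms or the schema actually used to build this subset.

```python
import re

# Hypothetical keyword list: location names/acronyms and idiomatic expressions.
# These terms are illustrative only, not the ones used to build the subset.
KEYWORDS = ["Recife", "São Paulo", "SP", "mineiro"]

# Whole-word, case-insensitive match, so "SP" does not match inside "resposta".
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def select_file_names(rows):
    """Return file names whose description mentions any keyword."""
    return [
        row["file_name"]
        for row in rows
        if PATTERN.search(row.get("description", ""))
    ]

# Made-up metadata rows (assumed schema: file_name, description).
rows = [
    {"file_name": "ep_001.ogg", "description": "Podcast gravado em Recife"},
    {"file_name": "ep_002.ogg", "description": "Uma resposta sobre tecnologia"},
    {"file_name": "ep_003.ogg", "description": "Notícias de SP e região"},
]

selected = select_file_names(rows)
print(selected)
```

Matching on whole words matters for short acronyms such as "SP", which would otherwise match inside unrelated words.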
<h2>Spotify A subset</h2>
<h3>General Table</h3>

| Speakers | Duration | Episodes | Female | Male |
|---|---|---|---|---|
| 92 | ~15 h 24 min | 52 | 43 | 38 |
<h3>Subset A Information</h3>

| Accent | Speakers | Duration | Female | Male |
|---|---|---|---|---|
| Rio de Janeiro | 5 | 49 min | 2 | 3 |
| Bahia | 4 | 1 h 27 min | 4 | |
| Mato Grosso do Sul | 4 | 18 min | 3 | 1 |
| Maranhão | 7 | 1 h 18 min | 2 | 3 |
| Minas Gerais | ~35 | 5 h 23 min | ~13 | ~22 |
| Recife | 10 | 3 h 45 min | | |
| São Paulo | ~25 | 1 h 18 min | ~19 | ~7 |
| Rio Grande do Sul | 2 | ~53 min | | 2 |
<h2>Spotify B subset</h2>
<hr>
<h3>General Table</h3>

| Accent | Train speakers | Dev speakers | Test speakers | Podcasts | Episodes | Duration | Segments |
|---|---|---|---|---|---|---|---|
| RE | 69 | 23 | 11 | 15 | 57 | ~48.23 | 14,008 |
| SP | 52 | 18 | 15 | 11 | 78 | ~30.88 | 11,906 |