Georg Zhelev
The data format is the following:
Decompress files and return a list containing each line as a list item.
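The loading step described above could look roughly like the following. This is a minimal sketch that assumes the files are bz2-compressed; the function name `read_lines` and the file path are hypothetical and do not come from the notebook.

```python
import bz2


def read_lines(path):
    """Decompress a .bz2 file and return its lines as a list of bytes objects."""
    # Assumption: the archive is bz2-compressed with one review per line.
    with bz2.open(path, "rb") as handle:
        # Each item keeps its trailing newline, e.g. b'__label__2 ...\n'.
        return handle.readlines()
```

Calling `read_lines("train.ft.txt.bz2")` (hypothetical path) would return one bytes item per review.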
b'__label__2 Stuning even for the non-gamer: This sound track was beautiful! It paints the senery in your mind so well I would recomend it even to people who hate vid. game music! I have played the game Chrono Cross but out of all of the games I have ever played it has the best music! It backs away from crude keyboarding and takes a fresher step with grate guitars and soulful orchestras. It would impress anyone who cares to listen! ^_^\n'
[1 1 0 ... 1 0 1]
'Stuning even for the non-gamer: This sound track was beautiful! It paints the senery in your mind so well I would recomend it even to people who hate vid. game music! I have played the game Chrono Cross but out of all of the games I have ever played it has the best music! It backs away from crude keyboarding and takes a fresher step with grate guitars and soulful orchestras. It would impress anyone who cares to listen! ^_^'
Texts are stored in a list, while labels are stored in an array.
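Separating the label prefix from the review text might be done as sketched below. The function name is hypothetical, and the mapping of `__label__1` to 0 is an assumption; only the `__label__2` to 1 case is visible in the sample above. The sketch returns plain lists; wrapping the labels in `numpy.array(labels)` would give the array form.

```python
def split_labels_and_texts(lines):
    """Split raw byte lines of the form b'__label__N text\\n' into (texts, labels)."""
    texts, labels = [], []
    for raw in lines:
        line = raw.decode("utf-8")
        # Everything before the first space is the label token.
        label, _, text = line.partition(" ")
        # Assumed mapping: __label__2 -> 1 (positive), anything else -> 0 (negative).
        labels.append(1 if label == "__label__2" else 0)
        texts.append(text.rstrip("\n"))
    return texts, labels
```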
Length train: 49473 Length test: 10007
['Stuning even for the non-gamer: This sound track was beautiful! It paints the senery in your mind so well I would recomend it even to people who hate vid. game music! I have played the game Chrono Cross but out of all of the games I have ever played it has the best music! It backs away from crude keyboarding and takes a fresher step with grate guitars and soulful orchestras. It would impress anyone who cares to listen! ^_^']
['stuning even for the non gamer this sound track was beautiful it paints the senery in your mind so well i would recomend it even to people who hate vid game music i have played the game chrono cross but out of all of the games i have ever played it has the best music it backs away from crude keyboarding and takes a fresher step with grate guitars and soulful orchestras it would impress anyone who cares to listen ']
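The before/after pair above shows lowercasing and punctuation removal. One way to get a similar result is sketched here; the regular expression and the function name are assumptions, and this version keeps all digits, so it may not match the notebook's own normalizer exactly.

```python
import re

# Assumption: any run of characters other than a-z and 0-9 becomes one space.
NON_ALPHANUMERIC = re.compile(r"[^a-z0-9]+")


def normalize_texts(texts):
    """Lowercase each text and replace punctuation/symbol runs with a space."""
    return [NON_ALPHANUMERIC.sub(" ", text.lower()) for text in texts]
```

A trailing run of symbols such as `! ^_^` collapses to a single trailing space.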
1    25217
0    24256
dtype: int64
The classes are roughly balanced (1 is a positive review, 0 a negative one).
Length train texts: 39578
Length validation texts: 9895
Length test texts: 10007
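The counts above are consistent with holding out 20% of the 49473 training reviews for validation. A minimal pure-Python sketch of such a split follows; the 0.2 fraction, the seed and the function name are assumptions, and scikit-learn's `train_test_split` is a common alternative.

```python
import math
import random


def train_validation_split(texts, labels, validation_fraction=0.2, seed=0):
    """Shuffle indices once, then carve off a validation share."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    # Round the validation size up, so 49473 items give 9895 validation rows.
    n_val = math.ceil(len(indices) * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(items, idx):
        return [items[i] for i in idx]

    return (pick(texts, train_idx), pick(texts, val_idx),
            pick(labels, train_idx), pick(labels, val_idx))
```

Texts and labels are indexed with the same shuffled positions, so each review stays paired with its label.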
CPU times: user 4.36 s, sys: 11.1 ms, total: 4.37 s
Wall time: 4.37 s
First review: wonderful inspiring music so many artists struggle to put 10 songs on an album of which maybe half could be considered decent joseph arthur manages to create 1 for this album and there s not a loser in the bunch his songs are pure poetry surrounded by swirling layers of gorgeous music sometimes simplistic folk other times upbeat rock but his lyrics carry each one with often times devastating results in a good way tales of love lost and struggles to love are the most common but they never get tiring due to the diversity of the tracks for those who do love this album as much as i do check out gavin degraw as well his album chariot is arguably the best of 00 ebhp
First encoded review: [235, 1992, 123, 29, 106, 1404, 2140, 5, 162, 240, 154, 20, 43, 104, 7, 91, 290, 374, 96, 27, 1598, 719, 2603, 3312, 2260, 5, 1275, 77, 12, 8, 104, 3, 52, 17, 16, 4, 4605, 10, 1, 1098, 54, 154, 25, 982, 2180, 7582, 53, 4939, 7, 2261, 123, 568, 3313, 3251, 79, 185, 3651, 447, 18, 54, 677, 1528, 272, 26, 19, 519, 185, 9023, 1222, 10, 4, 34, 99, 1844, 7, 78, 466, 3, 3217, 5, 78, 25, 1, 113, 1201, 18, 36, 118, 61, 8065, 771, 5, 1, 8066, 7, 1, 571, 12, 171, 65, 69, 78, 8, 104, 24, 73, 24, 2, 69, 589, 47, 24, 70, 54, 104, 9, 7347, 1, 82, 7, 310]
Length before encoding: 684
Length after encoding: 121
wonderful 235
inspiring 1992
Found 64191 unique words.
Documents: 39578
Word Index
[('the', 1), ('i', 2), ('and', 3), ('a', 4), ('to', 5)]
[('revengeful', 64187), ('dices', 64188), ('laryngitis', 64189), ('guitarrist', 64190), ('punchless', 64191)]
[('mr', 501), ('working', 502), ('entire', 503), ('name', 504), ('totally', 505)]
Word Counts
[('wonderful', 1724), ('inspiring', 132), ('music', 3601), ('so', 13166), ('many', 3935)]
[('revengeful', 1), ('dices', 1), ('laryngitis', 1), ('guitarrist', 1), ('punchless', 1)]
[('0', 3342), ('just', 10498), ('over', 3880), ('month', 620), ('now', 3644)]
164470
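The word index above ranks words by frequency (1 is the most frequent), and each review is then replaced by the ranks of its words. The sketch below mimics that idea in pure Python; it is not the Keras `Tokenizer` itself, and the function names are hypothetical. Dropping words whose rank is at or above `max_features` follows the behaviour of the Keras tokenizer's `num_words` argument.

```python
from collections import Counter


def build_word_index(texts):
    """Rank words by descending frequency, starting at 1 (0 is left for padding)."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common() sorts by count; ties keep first-seen order.
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(texts, word_index, max_features):
    """Replace each known word by its rank, dropping ranks >= max_features."""
    return [[word_index[w] for w in text.split()
             if w in word_index and word_index[w] < max_features]
            for text in texts]
```

Rare words are dropped rather than mapped to a placeholder, which is why an encoded review can be shorter than its word count.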
241 3
[4, 260, 1143, 159, 63, 7, 1, 1842, 3, 25, 295, 106, 25, 85, 1143, 52, 25, 4, 177, 7, 2069, 60, 831, 621, 91, 9, 356, 18, 1, 108, 66, 21, 247, 111, 2, 102, 1, 465, 7, 8, 15, 9, 35, 1728, 89, 2111, 66, 21, 196, 117, 38, 28, 163, 30, 94, 25, 536, 265, 18, 12, 1] 61
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 260 1143 159 63 7 1 1842 3 25 295 106 25 85 1143 52 25 4 177 7 2069 60 831 621 91 9 356 18 1 108 66 21 247 111 2 102 1 465 7 8 15 9 35 1728 89 2111 66 21 196 117 38 28 163 30 94 25 536 265 18 12 1] 241 241
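The output above shows a 61-id review left-padded with zeros to the common length of 241. A minimal sketch of that padding step follows. It mirrors the default "pre" padding and truncation of Keras `pad_sequences`, but is written in pure Python, and the function name is hypothetical.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad (or left-truncate) every sequence to exactly maxlen ids."""
    padded = []
    for seq in sequences:
        # Keep only the last maxlen ids when the sequence is too long.
        seq = list(seq)[-maxlen:]
        # Fill the front with the padding value so real ids sit at the end.
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded
```

Padding at the front keeps the actual words at the end of each row, as in the array printed above.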
64192 12000 241 (39578, 241)
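# Assumed imports (the notebook's import cell is not shown here).
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

# MAX_FEATURES, embedding_dim and maxlen are assumed to be defined earlier.
model = Sequential()
# Map each word id to a dense vector of size embedding_dim.
model.add(layers.Embedding(MAX_FEATURES, embedding_dim, input_length=maxlen))
# 128 filters, each sliding over windows of 5 consecutive words.
model.add(layers.Conv1D(128, 5, activation='relu'))
# Keep the strongest response of each filter across the sequence.
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(10, activation='relu'))
# Single sigmoid unit: probability that the review is positive.
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()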
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_3 (Embedding)      (None, 241, 100)          1200000
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 237, 128)          64128
_________________________________________________________________
global_max_pooling1d_3 (Glob (None, 128)               0
_________________________________________________________________
dense_5 (Dense)              (None, 10)                1290
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 11
=================================================================
Total params: 1,265,429
Trainable params: 1,265,429
Non-trainable params: 0
_________________________________________________________________
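The parameter counts in the summary can be checked by hand. The sketch below redoes the arithmetic with the standard formulas for embedding, 1D convolution and dense layers; the helper names are hypothetical, and the sizes (12000 features, 100-dimensional embeddings, length 241) are read off the printed shapes.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per word id.
    return vocab_size * dim


def conv1d_params(kernel_size, in_channels, filters):
    # One kernel_size x in_channels weight block plus one bias per filter.
    return kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    # Full weight matrix plus one bias per unit.
    return inputs * units + units


MAX_FEATURES, EMBEDDING_DIM, MAXLEN = 12000, 100, 241

counts = {
    "embedding": embedding_params(MAX_FEATURES, EMBEDDING_DIM),
    "conv1d": conv1d_params(5, EMBEDDING_DIM, 128),
    "dense_10": dense_params(128, 10),
    "dense_1": dense_params(10, 1),
}
total = sum(counts.values())

# A width-5 convolution without padding shortens the sequence by 4 steps.
conv_output_length = MAXLEN - 5 + 1
```

Here `total` evaluates to 1265429 and `conv_output_length` to 237, matching the summary.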
%%time
history = model.fit(train_texts, train_labels,
                    epochs=3,
                    verbose=True,
                    validation_data=(val_texts, val_labels),
                    batch_size=512)
Training Accuracy: 0.9063
Testing Accuracy: 0.9056
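For reference, the accuracy of a sigmoid classifier is the share of reviews whose thresholded probability matches the label. The sketch below shows that computation in pure Python; the function name and the 0.5 threshold are assumptions, and the figures above come from the notebook, not from this code.

```python
def binary_accuracy(probabilities, labels, threshold=0.5):
    """Fraction of predictions that match the 0/1 labels after thresholding."""
    # A probability above the threshold counts as a positive review (1).
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)
```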