Serialization, classification in pyBrain, machine learning, prediction

Here is an example of my training data (I have 1000 films for training). I need to predict the 'budget' of each film:

film_1 = {
    'title': 'The Hobbit: An Unexpected Journey',
    'article_size': 25000,
    'producer': ['Peter Jackson', 'Fran Walsh', 'Zane Weiner'],
    'release_date': some_date(2013, 11, 28),
    'running_time': 169,
    'country': ['New Zealand', 'UK', 'USA'],
    'budget': dec('200000000')
}

Keys such as 'title', 'producer' and 'country' can be viewed as features in machine learning, while values such as 'The Hobbit: An Unexpected Journey', 25000, etc. can be viewed as the values used in the learning process. However, training inputs are mostly expected to be real numbers rather than strings. Do I need to convert the string fields ('title', 'producer', 'country') to int (through something like classification or serialization?), or apply some other transformation, so that I can use this data as the training set for my network?

Best answer

I was wondering whether this is what you need:

film_list = ['title', 'article_size', 'producer', 'release_date', 'running_time', 'country', 'budget']
# Pair each field name with an integer index
flist = [(i, j) for i, j in enumerate(film_list)]
label = [seq[0] for seq in flist]  # integer labels
name = [seq[1] for seq in flist]   # field names
print(label)
print(name)
# Output:
# [0, 1, 2, 3, 4, 5, 6]
# ['title', 'article_size', 'producer', 'release_date', 'running_time', 'country', 'budget']

Or you can use your dictionary directly:

labels = list(film_1.keys())
print(labels)
# Note: on Python versions before 3.7 a dict does not keep insertion order,
# so labels[0] may give you 'producer' instead of 'title', for example:
# ['producer', 'title', 'country', 'release_date', 'budget', 'article_size', 'running_time']
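
Numbering the field names as above does not yet turn the string values into network inputs. One common option for list-valued string fields such as 'producer' and 'country' is multi-hot encoding, where each distinct string gets its own 0/1 column. The following is only a minimal sketch in plain Python, not pyBrain code: the second film, the helper names (build_vocab, multi_hot, encode) and the choice to reduce the release date to its year are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical sample records; the first loosely mirrors film_1 from the question,
# the second is invented so the vocabularies have more than one entry.
films = [
    {'article_size': 25000, 'running_time': 169,
     'release_date': date(2013, 11, 28),
     'producer': ['Peter Jackson', 'Fran Walsh'],
     'country': ['New Zealand', 'USA']},
    {'article_size': 9000, 'running_time': 120,
     'release_date': date(2010, 5, 1),
     'producer': ['Fran Walsh'],
     'country': ['UK']},
]

def build_vocab(films, field):
    """Map every distinct string seen in a list-valued field to a column index."""
    values = sorted({v for film in films for v in film[field]})
    return {v: i for i, v in enumerate(values)}

def multi_hot(values, vocab):
    """Return a 0/1 vector with a 1.0 in the column of each value present."""
    vec = [0.0] * len(vocab)
    for v in values:
        if v in vocab:  # strings unseen during vocabulary building are ignored
            vec[vocab[v]] = 1.0
    return vec

def encode(film, producer_vocab, country_vocab):
    """Concatenate the numeric fields with the encoded string fields."""
    numeric = [float(film['article_size']),
               float(film['running_time']),
               float(film['release_date'].year)]  # assumption: keep only the year
    return (numeric
            + multi_hot(film['producer'], producer_vocab)
            + multi_hot(film['country'], country_vocab))

producer_vocab = build_vocab(films, 'producer')
country_vocab = build_vocab(films, 'country')
rows = [encode(f, producer_vocab, country_vocab) for f in films]
print(rows[0])
```

Each row is then a fixed-length list of floats that could be fed to a network as one input sample, with 'budget' as the target. In practice the numeric columns would probably need scaling, and a free-text field like 'title' is often dropped or handled separately, since nearly every title is unique.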
