我的场景如下:我有一个数据表(少数字段,不到一百行),我在我的程序中广泛使用了它.我还需要这些数据是持久的,所以我将它保存为 CSV 并在启动时加载它.我选择不使用数据库,因为每个选项(甚至 SQLite)对于我卑微的要求来说都是一种矫枉过正(而且 - 我希望能够以简单的方式离线编辑值,没有什么比记事本更简单了).
假设我的数据如下所示(在文件中它以逗号分隔,没有标题,这只是一个说明):
行 |姓名 |年 |优先事项---------------------1 |猫 |1998 |12 |鱼 |1998 |23 |狗 |1999 |14 |土豚 |2000 |15 |小袋鼠 |2000 |16 |斑马 |2001 |3注意事项:
我用数据做的事情:
我知道这是 SQL 的哭声"...
我正在尝试找出数据结构的最佳选择.以下是我看到的几个选项:
行列表列表:
a = []a.append( [1, "Cat", 1998, 1] )a.append( [2, "鱼", 1998, 2] )a.append( [3, "Dog", 1999, 1] )...列列表列表(显然会有用于 add_row 等的 API):
a = []a.append( [1, 2, 3, 4, 5, 6] )a.append([猫"、鱼"、狗"、土豚"、小袋鼠"、斑马"])a.append( [1998, 1998, 1999, 2000, 2000, 2001] )a.append( [1, 2, 1, 1, 1, 3] )列列表字典(可以创建常量来替换字符串键):
a = {}a['ID'] = [1, 2, 3, 4, 5, 6]a['Name'] = [猫"、鱼"、狗"、土豚"、小袋鼠"、斑马"]a['年'] = [1998, 1998, 1999, 2000, 2000, 2001]a['优先级'] = [1, 2, 1, 1, 1, 3]键是 (Row, Field) 元组的字典:
创建常量以避免字符串搜索姓名=1年=2优先级=3a={}a[(1, NAME)] = "猫"a[(1, YEAR)] = 1998[(1, 优先级)] = 1a[(2, NAME)] = "鱼"a[(2, YEAR)] = 1998[(2, 优先级)] = 2...而且我确定还有其他方法......但是,当涉及到我的要求(复杂的排序和计数)时,每种方法都有缺点.
推荐的方法是什么?
澄清一下,性能对我来说不是主要问题.因为表太小,我相信几乎每一次操作都会在毫秒范围内,这对我的应用来说不是问题.
解决方案在内存中拥有一个需要查找、排序和任意聚合的表"确实需要 SQL.您说您尝试过 SQLite,但您是否意识到 SQLite 可以使用仅内存中的数据库?
connection = sqlite3.connect(':memory:')然后,您可以使用 SQLite 的所有功能在内存中创建/删除/查询/更新表,并且完成后不会留下任何文件.从 Python 2.5 开始,sqlite3 位于标准库中,因此 IMO 并不是真的矫枉过正".
以下是创建和填充数据库的示例:
导入csv导入 sqlite3db = sqlite3.connect(':memory:')定义 init_db(cur):cur.execute('''创建表 foo (行整数,名称文本,整数年,优先级整数)''')def populate_db(cur, csv_fp):rdr = csv.reader(csv_fp)cur.executemany('''INSERT INTO foo(行、名称、年份、优先级)值 (?,?,?,?)''', rdr)Cur = db.cursor()init_db(cur)populate_db(cur, open('my_csv_input_file.csv'))数据库提交()如果您真的不想使用 SQL,您可能应该使用字典列表:
lod = [ ] # "字典列表"def populate_lod(lod, csv_fp):rdr = csv.DictReader(csv_fp, ['Row', 'Name', 'Year', 'Priority'])lod.extend(rdr)def query_lod(lod, filter=None, sort_keys=None):如果过滤器不是无:lod = (r for r in lod if filter(r))如果 sort_keys 不是 None:lod = sorted(lod, key=lambda r:[r[k] for k in sort_keys])别的:lod = 列表(lod)返回 loddef lookup_lod(lod, **kw):对于 lod 行:对于 kw.iteritems() 中的 k,v:如果 row[k] != str(v): 中断别的:返回行返回无测试结果:
>>>lod = []>>>populate_lod(lod,csv_fp)>>>>>>pprint(lookup_lod(lod, Row=1)){'姓名':'猫','优先级':'1','行':'1','年份':'1998'}>>>pprint(lookup_lod(lod, Name='Aardvark')){'名称':'土豚','优先级':'1','行':'4','年份':'2000'}>>>pprint(query_lod(lod, sort_keys=('Priority', 'Year')))[{'姓名':'猫','优先级':'1','行':'1','年份':'1998'},{'姓名':'狗','优先级':'1','行':'3','年份':'1999'},{'名称':'土豚','优先级':'1','行':'4','年份':'2000'},{'姓名':'小袋鼠','优先级':'1','行':'5','年份':'2000'},{'名称':'鱼','优先级':'2','行':'2','年份':'1998'},{'名称':'斑马','优先级':'3','行':'6','年份':'2001'}]>>>pprint(query_lod(lod, sort_keys=('Year', 'Priority')))[{'姓名':'猫','优先级':'1','行':'1','年份':'1998'},{'名称':'鱼','优先级':'2','行':'2','年份':'1998'},{'姓名':'狗','优先级':'1','行':'3','年份':'1999'},{'名称':'土豚','优先级':'1','行':'4','年份':'2000'},{'姓名':'小袋鼠','优先级':'1','行':'5','年份':'2000'},{'名称':'斑马','优先级':'3','行':'6','年份':'2001'}]>>>打印 len(query_lod(lod, lambda r:1997 <= int(r['Year']) <= 2002))6>>>打印 len(query_lod(lod, lambda r:int(r['Year'])==1998 and int(r['Priority']) > 2))0我个人更喜欢 SQLite 版本,因为它可以更好地保留您的类型(无需在 Python 中使用额外的转换代码)并且可以轻松扩展以适应未来的需求.但话说回来,我对 SQL 很满意,所以 YMMV.
My scenario is as follows: I have a table of data (handful of fields, less than a hundred rows) that I use extensively in my program. I also need this data to be persistent, so I save it as a CSV and load it on start-up. I choose not to use a database because every option (even SQLite) is an overkill for my humble requirement (also - I would like to be able to edit the values offline in a simple way, and nothing is simpler than notepad).
Assume my data looks as follows (in the file it's comma separated without titles, this is just an illustration):
Row | Name | Year | Priority ------------------------------------ 1 | Cat | 1998 | 1 2 | Fish | 1998 | 2 3 | Dog | 1999 | 1 4 | Aardvark | 2000 | 1 5 | Wallaby | 2000 | 1 6 | Zebra | 2001 | 3Notes:
Things I do with the data:
I know this "cries" for SQL...
I'm trying to figure out what's the best choice for data structure. Following are several choices I see:
List of row lists:
a = [] a.append( [1, "Cat", 1998, 1] ) a.append( [2, "Fish", 1998, 2] ) a.append( [3, "Dog", 1999, 1] ) ...List of column lists (there will obviously be an API for add_row etc):
a = [] a.append( [1, 2, 3, 4, 5, 6] ) a.append( ["Cat", "Fish", "Dog", "Aardvark", "Wallaby", "Zebra"] ) a.append( [1998, 1998, 1999, 2000, 2000, 2001] ) a.append( [1, 2, 1, 1, 1, 3] )Dictionary of columns lists (constants can be created to replace the string keys):
a = {} a['ID'] = [1, 2, 3, 4, 5, 6] a['Name'] = ["Cat", "Fish", "Dog", "Aardvark", "Wallaby", "Zebra"] a['Year'] = [1998, 1998, 1999, 2000, 2000, 2001] a['Priority'] = [1, 2, 1, 1, 1, 3]Dictionary with keys being tuples of (Row, Field):
Create constants to avoid string searching NAME=1 YEAR=2 PRIORITY=3 a={} a[(1, NAME)] = "Cat" a[(1, YEAR)] = 1998 a[(1, PRIORITY)] = 1 a[(2, NAME)] = "Fish" a[(2, YEAR)] = 1998 a[(2, PRIORITY)] = 2 ...And I'm sure there are other ways... However each way has disadvantages when it comes to my requirements (complex ordering and counting).
What's the recommended approach?
EDIT:
To clarify, performance is not a major issue for me. Because the table is so small, I believe almost every operation will be in the range of milliseconds, which is not a concern for my application.
解决方案Having a "table" in memory that needs lookups, sorting, and arbitrary aggregation really does call out for SQL. You said you tried SQLite, but did you realize that SQLite can use an in-memory-only database?
connection = sqlite3.connect(':memory:')Then you can create/drop/query/update tables in memory with all the functionality of SQLite and no files left over when you're done. And as of Python 2.5, sqlite3 is in the standard library, so it's not really "overkill" IMO.
Here is a sample of how one might create and populate the database:
import csv import sqlite3 db = sqlite3.connect(':memory:') def init_db(cur): cur.execute('''CREATE TABLE foo ( Row INTEGER, Name TEXT, Year INTEGER, Priority INTEGER)''') def populate_db(cur, csv_fp): rdr = csv.reader(csv_fp) cur.executemany(''' INSERT INTO foo (Row, Name, Year, Priority) VALUES (?,?,?,?)''', rdr) cur = db.cursor() init_db(cur) populate_db(cur, open('my_csv_input_file.csv')) dbmit()If you'd really prefer not to use SQL, you should probably use a list of dictionaries:
lod = [ ] # "list of dicts" def populate_lod(lod, csv_fp): rdr = csv.DictReader(csv_fp, ['Row', 'Name', 'Year', 'Priority']) lod.extend(rdr) def query_lod(lod, filter=None, sort_keys=None): if filter is not None: lod = (r for r in lod if filter(r)) if sort_keys is not None: lod = sorted(lod, key=lambda r:[r[k] for k in sort_keys]) else: lod = list(lod) return lod def lookup_lod(lod, **kw): for row in lod: for k,v in kw.iteritems(): if row[k] != str(v): break else: return row return NoneTesting then yields:
>>> lod = [] >>> populate_lod(lod, csv_fp) >>> >>> pprint(lookup_lod(lod, Row=1)) {'Name': 'Cat', 'Priority': '1', 'Row': '1', 'Year': '1998'} >>> pprint(lookup_lod(lod, Name='Aardvark')) {'Name': 'Aardvark', 'Priority': '1', 'Row': '4', 'Year': '2000'} >>> pprint(query_lod(lod, sort_keys=('Priority', 'Year'))) [{'Name': 'Cat', 'Priority': '1', 'Row': '1', 'Year': '1998'}, {'Name': 'Dog', 'Priority': '1', 'Row': '3', 'Year': '1999'}, {'Name': 'Aardvark', 'Priority': '1', 'Row': '4', 'Year': '2000'}, {'Name': 'Wallaby', 'Priority': '1', 'Row': '5', 'Year': '2000'}, {'Name': 'Fish', 'Priority': '2', 'Row': '2', 'Year': '1998'}, {'Name': 'Zebra', 'Priority': '3', 'Row': '6', 'Year': '2001'}] >>> pprint(query_lod(lod, sort_keys=('Year', 'Priority'))) [{'Name': 'Cat', 'Priority': '1', 'Row': '1', 'Year': '1998'}, {'Name': 'Fish', 'Priority': '2', 'Row': '2', 'Year': '1998'}, {'Name': 'Dog', 'Priority': '1', 'Row': '3', 'Year': '1999'}, {'Name': 'Aardvark', 'Priority': '1', 'Row': '4', 'Year': '2000'}, {'Name': 'Wallaby', 'Priority': '1', 'Row': '5', 'Year': '2000'}, {'Name': 'Zebra', 'Priority': '3', 'Row': '6', 'Year': '2001'}] >>> print len(query_lod(lod, lambda r:1997 <= int(r['Year']) <= 2002)) 6 >>> print len(query_lod(lod, lambda r:int(r['Year'])==1998 and int(r['Priority']) > 2)) 0Personally I like the SQLite version better since it preserves your types better (without extra conversion code in Python) and easily grows to accommodate future requirements. But then again, I'm quite comfortable with SQL, so YMMV.
更多推荐
用于在内存中维护表格数据的数据结构?
发布评论