Hand History Parser for obtaining Player Stats
-
- Posts: 4
- Joined: Mon Apr 20, 2020 3:24 pm
Hand History Parser for obtaining Player Stats
Hi Kent and Team,
Is there an existing script or utility that parses hand histories?
I wanted to know if there exists a parsing tool for the hand history text in order to obtain ANY kind of data. I want to be able to obtain stats about myself and opponents: HUD stats like VPIP, 3b, RFI, etc.
Any kind of script or program that interacts with the hand history files is of interest to me and could be a handy starting point.
Thanks,
8s
-
- Site Admin
- Posts: 5880
- Joined: Wed Mar 19, 2008 8:47 pm
Re: Hand History Parser for obtaining Player Stats
None that I know of.
Re: Hand History Parser for obtaining Player Stats
Here is a Python script that I wrote for parsing logs and working out how much each player bought in for (including add-ons) and what they left with.
This was built for ring games only, but it can track activity across multiple tables.
There are loops for doing things like emailing players their activity and writing CSV contents to files. That is all about tracking session numbers, and you will want to ditch that most likely.
It does not attempt to capture play stats at the moment, but the code around line 241 is where it runs through hand by hand and looks for player actions. That is where you would want to extend it for stats.
Code: Select all
#!/usr/bin/python
# processLog.py
# Steve Grantz [email protected]
# 2020-04-26
# Usage:
# python processLog.py logfile.txt [logfile2.txt ...]
############################################################################################################
# WHAT THIS DOES
#
# goal of this program is to process Poker Mavens logs to track player activity
# - initial appearance (initial buy-in)
# - addition of chips
# - last known amount of chips
#
# To do this we will take the logs and first break it up into hand by hand, indexed by time
# for a chronology
#
# Then we can loop through each hand, and process for player activity
# the first time we see a player, we add them and their first known chip count
# then in each hand we note additional chips, as well as resolution of the pot
#
# When processing the next hand, everything SHOULD align
# if it does not, throw an error
# otherwise keep processing
#
# look for wins, add ons, and pot contributions
# log cash in and cash out for narrative at end of night
#
# KEY ASSUMPTIONS
#
# assume unique hand number (that is the hand number can NOT repeat across tables)
# assume that the hand number structure NNN-M can be reduced to NNN and that the M local part is not needed
#
#
# CHANGE LOG
# 2020-04-26 v0.1 first version
# 2020-04-28 v0.2 email results
#
import argparse
import csv
import datetime
import getpass
import os
import re
import sys
from os import path
from smtplib import SMTP
# constants
VERSION = "0.2"
CSVTRANS = "gamelog.csv"
CSVBALANCE = "balances.csv"
LOCAL = "local"
INDEX = "index"
TEXT = "text"
FIRST = "first"
LATEST = "latest"
LAST = "last"
IN = "cash in"
OUT = "cash out"
WAITING = "sitting out"
LEFT = "left table"
NOTES = "notes"
TABLE="table"
COUNT="count"
DATETIME="datetime"
NAME="name"
UNIT="unit"
RUNNERS="runners"
REBUYS="rebuys"
EMAIL="email"
WINNERS="winnerShares"
# constants around email options
EMAIL_SUBJ_PREFIX = "Game info from "
FROMADDRESS = '[email protected]'
CCADDRESS = '[email protected]'
SMTPSERVER = ''
SMTPPORT = 26
DEBUGLEVEL = 0
##################################################################################################################
#
# DATA STRUCTURES
#
hands = {} # the hands dictionary
# structure
# KEY - string - hand number
# LOCAL - string - the "dash" portion of the hand number, may recombine, but so far unique without it
# DATETIME - datetime - timestamp for the hand
# TABLE - string - table where the hand happened
# TEXT - string - full text of hand, with newlines
players = {} # the players dictionary
# structure
# KEY - string - player name as found in log
# IN - float - total money in
# OUT - float - total money out
# NOTES - string log of activity with newlines
# sub-dictionary by TABLE ******
# KEY - string for the table - will only exist if player was seen at table in logs
# FIRST - float - initial buy in for table - not really used much, could be deprecated
# IN - float - money in at this table
# OUT - float - money out at this table
# WAITING - Boolean - whether player is seated but not in play
# LEFT - Boolean - player has been at table but is no longer seated
# LATEST - float - running tally of player holding at the table - IMPORTANT for checking consistency
tables = {} # the tables dictionary
# structure
# KEY - string - table name as found in log
# COUNT - integer - number of hands processed for table
# LATEST - string - hand number for the latest hand processed for this table
# LAST - datetime - the latest time stamp for a hand processed for this table
# LAST and LATEST are used to mark the "end" activity of players standing up
# they represent the last seen hand at the table from the processed logs
csvRows = [] # list of lists for the csv transaction content
# see CSV Header for list of fields
# CSV - log of activity in CSV format - with newlines
csvHeader = ["Time",
"Table",
"Hand Number",
"Player",
"Action",
"Amount In",
"Amount Out"
]
csvBalances = [] # list of lists for the csv balance content
csvBalanceHeader = ["Date",
"Disposition",
"Player",
"Amount",
"Note"
]
# resolvedScreenNames dictionary by Screen Name, has info needed for processing
# Structure
# KEY - screen name
# NAME - short name used in player ledger
# EMAIL - email address for the player for sending player notes for session
resolvedScreenNames = {
}
# end of data structures
#
#######################################################################################################################
lineCount = 0
sessionDate = datetime.datetime.now().strftime("%m/%d/%Y")
# get and parse command line arguments
# then process some key ones straight away
# namely, if roster option is used, dump the player roster and go
# if email option is activated, check for presence of password command line argument
# if not there prompt for it
parser = argparse.ArgumentParser(description='Process Poker Maven log files and deliver transaction info and player balances.')
parser.add_argument('-c','--csv', action="store_true",dest="doCsv",default=False,help="Output CSV content.")
parser.add_argument('-e','--email', action="store_true",dest="doEmail",default=False,help="Email player results.")
parser.add_argument('-p','--password', action="store",dest="password",
help=("Password for email account (" + FROMADDRESS + ")"))
parser.add_argument('-q','--quiet', action="store_true",dest="quiet",default=False,help="Run in quiet mode with minimal output.")
parser.add_argument('-r','--roster', action="store_true",dest="roster",default=False,
help="Show roster of players known to the script and exit.")
parser.add_argument('file', type=argparse.FileType('r'), nargs='*',help="plain text files of Poker Mavens hand histories to process.")
args = parser.parse_args()
if (args.roster):
    if (args.doCsv):
        print(" Screen Name,Nickname,EMail")
    else:
        print("Roster of Players: " + str(len(resolvedScreenNames)))
        print("")
    for player in sorted(resolvedScreenNames.keys(), key=str.casefold):
        if (args.doCsv):
            text = (player + "," + resolvedScreenNames[player][NAME] + ",")
            if (EMAIL in resolvedScreenNames[player]):
                text = text + resolvedScreenNames[player][EMAIL]
        else:
            text = (player + " (" + resolvedScreenNames[player][NAME] + ")")
            if (EMAIL in resolvedScreenNames[player]):
                text = text + " - " + resolvedScreenNames[player][EMAIL]
        print (text)
    sys.exit(0)
emailPassword = ''
if(args.doEmail):
    if (args.password is None):
        emailPassword = getpass.getpass("Enter the password for the email account (" + FROMADDRESS +"): ")
    else:
        emailPassword = args.password
lastHandTime = datetime.datetime.now()
numArg = len(args.file)
if (numArg == 0):
    print("Must provide a name of a log file to process.")
else:
    # process each file listed on the command line
    # first loop through is just to parse and get each hand separated, and get basic hand
    # info into the hands dictionary
    # basic hand info is hand number, local hand number, hand time, and table
    # everything else goes into TEXT
    for f in args.file:
        line = f.readline()
        while (len(line) != 0):
            matches = re.search(r"Hand #(\d*)-(\d*) - (.*)$",line)
            if (matches != None):
                handNumber = matches.group(1)
                handTime = datetime.datetime.strptime(matches.group(3),"%Y-%m-%d %H:%M:%S")
                # LOCAL is the part after the dash, so it is group 2, not group 1
                hands[handNumber] = {LOCAL: matches.group(2),
                                     DATETIME: handTime,
                                     TEXT: ''}
                line = f.readline()
                while (not (line.strip() == '')):
                    table = re.search("Table: (.*)$",line)
                    if (table != None):
                        tableName = table.group(1)
                        if (not tableName in tables):
                            tables[tableName] = {COUNT: 0, LATEST: ""}
                        hands[handNumber][TABLE] = tableName
                    hands[handNumber][TEXT] = hands[handNumber][TEXT] + line
                    line = f.readline()
            else:
                line = f.readline()
        f.close()
    handNumber = ""
    handTime = datetime.datetime.now()
    # now that we have all hands from all the files,
    # use the timestamps of the imported hands to process them in chronological order
    # this is the place for processing the text of each hand and look for player actions
    for handNumber in sorted(hands.keys(), key=lambda hand: hands[hand][DATETIME] ):
        # print(handNumber) #DEBUG
        handTime = hands[handNumber][DATETIME]
        table = hands[handNumber][TABLE]
        tables[table][COUNT] += 1
        tables[table][LATEST] = handNumber
        tables[table][LAST] = handTime
        lastHandTime = handTime
        # print(handTime) # DEBUG
        for line in hands[handNumber][TEXT].splitlines():
            # the text match to look for a seated player and see their chip amount
            seat = re.search(r"Seat \d+: (\w+) \(([\d.]+)\)",line)
            if (seat != None):
                player = seat.group(1)
                stack = float(seat.group(2))
                # print("Player found " + seat.group(1) + " with chip count " + seat.group(2))
                if (not player in players):
                    players[player] = {IN: stack, OUT: 0, NOTES: ""}
                    players[player][table] = {FIRST: stack, IN: stack, LATEST: stack, OUT: 0, LEFT: False}
                    players[player][NOTES] = ("Player Notes for " + player + os.linesep + str(handTime) +
                                              " table " + table +
                                              " hand (" + handNumber + ") " +
                                              "initial buy in " + str(stack) + os.linesep)
                    csvRows.append([handTime,table,handNumber,player,"initial buy in",stack,""])
                elif (not table in players[player]):
                    players[player][IN] += stack
                    players[player][table] = {FIRST: stack, IN: stack, LATEST: stack, OUT: 0, LEFT: False}
                    players[player][NOTES] = players[player][NOTES] + (str(handTime) +
                                              " table " + table +
                                              " hand (" + handNumber + ") " +
                                              "initial buy in " + str(stack) + os.linesep)
                    csvRows.append([handTime,table,handNumber,player,"initial buy in",stack,""])
                else:
                    # check for consistent state of chips from last hand
                    # this is where we find corner cases and so on
                    # found split pot issue, side pot issue by virtue of having this consistency check
                    # NOTE - if player was waiting the stack may have changed,
                    # so adjust accordingly and log it
                    if (players[player][table][LATEST] != stack):
                        if (players[player][table][WAITING] or players[player][table][LEFT]):
                            if (stack > players[player][table][LATEST]):
                                adjustment = stack - players[player][table][LATEST]
                                players[player][table][LATEST] = stack
                                players[player][table][IN] += adjustment
                                players[player][IN] += adjustment
                                action = "player returned with " if (players[player][table][LEFT]) else "while waiting added on by "
                                players[player][NOTES] = (players[player][NOTES] + str(handTime) + " table " + table +
                                                          " hand (" + handNumber + ") " + action + str(adjustment) + os.linesep)
                                csvRows.append([handTime,table,handNumber,player,"add on while waiting",adjustment,""])
                            else:
                                adjustment = players[player][table][LATEST] - stack
                                players[player][table][LATEST] = stack
                                players[player][table][OUT] += adjustment
                                players[player][OUT] += adjustment
                                players[player][NOTES] = (players[player][NOTES] + str(handTime) + " " + table + " hand (" + handNumber + ") " +
                                                          "while waiting reduced by " + str(adjustment) + os.linesep)
                                csvRows.append([handTime,table,handNumber,player,"reduction while waiting","",adjustment])
                        else:
                            print("Inconsistent state for " + player + " in table " + table + " hand " + handNumber + " has " + str(stack) +
                                  " expected " + str(players[player][table][LATEST]))
                    # player is active at this table, so mark the LEFT attribute for the table as False
                    players[player][table][LEFT] = False
                # change state on sitting or waiting
                if (re.search(r'sitting',line) or re.search(r'waiting',line)):
                    players[player][table][WAITING] = True
                else:
                    players[player][table][WAITING] = False
            # the text to match for an add on
            addOn = re.search(r"(\w+) adds ([\d.]+) chip",line)
            if (addOn != None):
                player = addOn.group(1)
                additional = float(addOn.group(2))
                players[player][IN] += additional
                players[player][table][IN] += additional
                players[player][table][LATEST] += additional
                players[player][NOTES] = (players[player][NOTES] + str(handTime) + " table " + table + " hand (" + handNumber + ") " +
                                          "added on " + str(additional) + os.linesep)
                csvRows.append([handTime,table,handNumber,player,"add on",additional,""])
            # the text to check for a win
            winner = re.search(r"(\w+) (wins|splits).*Pot *\d? *\(([\d.]+)\)",line)
            if (winner != None):
                player = winner.group(1)
                win = float(winner.group(3))
                players[player][table][LATEST] += win
            # find contributions to the pot
            # this is a series of contributions of the form "PlayerName: Amount" separated by commas
            # needed for updating the LATEST amount on this table for each player, for consistency check next hand
            pot = re.search(r"Rake.*Pot.*Players \((.*)\)", line)
            if (pot != None):
                potString = pot.group(1)
                for contribution in potString.split(","):
                    (player,amount) = contribution.split(":")
                    player = player.strip()
                    players[player][table][LATEST] -= float(amount)
        # end of for loop, loop through active players and see if anyone has left the table -
        # if so, register a cash out and also mark the player as having LEFT the table
        for player in players.keys():
            # \d+ (not \d) so that seats numbered 10 and above are matched too
            seatSearch = r"Seat \d+: " + re.escape(player)
            if (not re.search(seatSearch, hands[handNumber][TEXT])):
                if (table in players[player] and not players[player][table][LEFT]):
                    amount = players[player][table][LATEST]
                    players[player][OUT] += amount
                    players[player][table][OUT] += amount
                    players[player][table][LATEST] = 0
                    players[player][table][WAITING] = True
                    players[player][NOTES] = (players[player][NOTES] + str(handTime) + " table " + table + " hand (" + handNumber + ") " +
                                              "stood up with " + str(amount) + os.linesep)
                    csvRows.append([handTime,table,handNumber,player,"stood up with","",amount])
                    players[player][table][LEFT] = True
    # SUMMARIZE
    # note how many unique players
    # note how many hands processed for each table
    # then for each table, and each player, find out who was still listed as not left and mark them
    # as left and what they stood up with
    print("Players: " + str(len(players)))
    for table in tables:
        print("Table " + table + ": Processed hands: " + str(tables[table][COUNT]))
        for player in players.keys():
            # done processing the hands, so get players up from the table
            if (table in players[player] and not players[player][table][LEFT]):
                amount = players[player][table][LATEST]
                players[player][OUT] += amount
                players[player][table][OUT] += amount
                players[player][table][LATEST] = 0
                players[player][table][LEFT] = True
                players[player][NOTES] = (players[player][NOTES] + str(tables[table][LAST]) + " table " + table +
                                          " hand (" + tables[table][LATEST] + ") " +
                                          "ended table with " + str(amount) + os.linesep)
                csvRows.append([tables[table][LAST],table,tables[table][LATEST],player,"ended table with","",amount])
    netBalance = 0
    # separator
    print("")
    if (lastHandTime is not None):
        sessionDate = lastHandTime.strftime("%m/%d/%Y")
    note = 'Python calculation of session'
    for player in players.keys():
        # final tally
        cashIn = players[player][IN]
        cashOut = players[player][OUT]
        disposition=''
        diff = 0
        alias = player
        if (player in resolvedScreenNames):
            alias = resolvedScreenNames[player][NAME]
        players[player][NOTES] = (players[player][NOTES] + "Total IN " + str(cashIn) + os.linesep)
        players[player][NOTES] = (players[player][NOTES] + "Total OUT " + str(cashOut) + os.linesep)
        if (cashIn == cashOut):
            players[player][NOTES] = (players[player][NOTES] + player + ' breaks even.' + os.linesep)
            disposition = "due"
        elif (cashIn > cashOut):
            diff = cashIn - cashOut
            netBalance += diff
            players[player][NOTES] = (players[player][NOTES] + player + ' owes ' +str(diff) + os.linesep)
            disposition = "owes"
        elif (cashIn < cashOut):
            diff = cashOut - cashIn
            netBalance -= diff
            players[player][NOTES] = (players[player][NOTES] + player + ' is due ' +str(diff) + os.linesep)
            disposition = "due"
        csvBalances.append([sessionDate,disposition,alias,diff,note])
        if(not args.quiet):
            print(players[player][NOTES])
            print("")
    print("Net balance: " + str(netBalance))
    if (args.doCsv):
        # Output CSV file of transactions
        # (the with block closes the file, so no explicit close() is needed)
        with open(CSVTRANS, 'w', newline='') as csvfile:
            logwriter = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
            logwriter.writerow(csvHeader)
            for row in csvRows:
                logwriter.writerow(row)
        print("CSV content written to " + CSVTRANS)
        # Output CSV file of balances
        with open(CSVBALANCE, 'w', newline='') as csvfile:
            logwriter = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
            logwriter.writerow(csvBalanceHeader)
            for row in csvBalances:
                logwriter.writerow(row)
        print("CSV balance content written to " + CSVBALANCE)
    if (args.doEmail):
        smtp = SMTP()
        smtp.set_debuglevel(DEBUGLEVEL)
        smtp.connect(SMTPSERVER, SMTPPORT)
        smtp.login(FROMADDRESS, emailPassword)
        #TODO: error handling for a failed login to SMTP server
        date = datetime.datetime.now().strftime("%a, %d %b %Y %T %z (%Z)")
        emailCount = 0
        for player in players:
            subj = EMAIL_SUBJ_PREFIX + sessionDate
            #if (player == "StevieG"):
            if (player in resolvedScreenNames and EMAIL in resolvedScreenNames[player]):
                emailCount += 1
                recipients = [CCADDRESS]
                to_addr = resolvedScreenNames[player][EMAIL]
                recipients.append(to_addr)
                subj = subj + " for " + player
                message_text = players[player][NOTES]
                msg = ("From: %s\nTo: %s\nCC: %s\nSubject: %s\nDate: %s\n\n%s"
                       % (FROMADDRESS, to_addr, CCADDRESS, subj, date, message_text))
                smtp.sendmail(FROMADDRESS, recipients, msg.encode("utf-8"))
        smtp.quit()
        print("Email messages sent: " + str(emailCount))
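For anyone extending the hand-by-hand loop toward HUD stats, here is a minimal sketch of a separate pass that counts VPIP and PFR per player. It is not part of the script above: `preflop_stats` is a hypothetical helper, and the action wording it looks for ("calls", "raises", "bets", blind posts that use "posts", and a `** Flop **` street marker) is an assumption about the log format that you should check against your own hand histories.

```python
import re
from collections import defaultdict

# ASSUMED log wording: adjust these patterns to match your actual hand histories.
SEAT_RE = re.compile(r"Seat \d+: (\w+) \(([\d.]+)\)")
ACTION_RE = re.compile(r"^(\w+) (calls|raises|bets|folds|checks)\b")
FLOP_RE = re.compile(r"^\*\* Flop \*\*")


def preflop_stats(hand_texts):
    """Return {player: {"hands", "vpip", "pfr"}} counts from an iterable of hand texts."""
    stats = defaultdict(lambda: {"hands": 0, "vpip": 0, "pfr": 0})
    for text in hand_texts:
        seated, vpip, pfr = set(), set(), set()
        for line in text.splitlines():
            if FLOP_RE.match(line):
                break  # only preflop actions count toward VPIP and PFR
            seat = SEAT_RE.search(line)
            if seat:
                # players who are sitting out or waiting were not dealt in
                if "sitting out" not in line and "waiting" not in line:
                    seated.add(seat.group(1))
                continue
            action = ACTION_RE.match(line)
            if action:
                player, verb = action.groups()
                if verb in ("calls", "raises", "bets"):
                    vpip.add(player)  # voluntarily put money in the pot
                if verb == "raises":
                    pfr.add(player)   # raised before the flop
        for player in seated:
            stats[player]["hands"] += 1
            stats[player]["vpip"] += int(player in vpip)
            stats[player]["pfr"] += int(player in pfr)
    return dict(stats)
```

Inside the script, something like `preflop_stats(hands[h][TEXT] for h in hands)` would feed it the same per-hand text the main loop uses. Dividing `vpip` or `pfr` by `hands` gives the usual percentages. Stats such as 3-bet or RFI need the order of preflop raises as well, which this sketch does not track.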
-
- Posts: 123
- Joined: Wed Jan 07, 2015 3:51 pm
Re: Hand History Parser for obtaining Player Stats
Would you be able to elaborate a bit on how to use this? I'm interested, but can't seem to get it going.
I installed python3 on my windows 2019 server. I then created processlogs.py and copied your script into it. Then in the same folder I copied over a hand history logfile and tried the usage syntax and the below happens:
C:\misc\python>processlogs.py hh2020.txt
Traceback (most recent call last):
  File "C:\misc\python\processLogs.py", line 234, in <module>
    line = f.readline()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 4918: character maps to <undefined>
I'm sure I am just doing something really wrong.
Re: Hand History Parser for obtaining Player Stats
Yes, I will be happy to try to help here.
The python script is expecting to get plain old ASCII text, and it appears that the file you saved may have Unicode that is unexpected.
May I ask how you saved the hand history file?
Try to copy the text and save it out using something like Notepad as a plain .txt file, and see if that changes the outcome.
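One way to see what is actually in the file before re-saving it is to look at the raw bytes. The sketch below is only a diagnostic aid, and `describe_bytes` and `describe_file` are hypothetical helper names. It reports whether the file starts with a byte-order mark and where the first non-ASCII byte sits, which can be compared with the position in the traceback.

```python
def describe_bytes(raw):
    """Report a byte-order mark, if any, and the first byte outside the ASCII range."""
    if raw.startswith(b"\xef\xbb\xbf"):
        bom = "utf-8-sig"
    elif raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        bom = "utf-16"
    else:
        bom = None
    # (offset, byte value) of the first byte >= 0x80, or None if the data is pure ASCII
    first = next(((i, b) for i, b in enumerate(raw) if b >= 0x80), None)
    return {"bom": bom, "first_non_ascii": first}


def describe_file(path):
    # Read in binary mode so no decoding (and no UnicodeDecodeError) happens here.
    with open(path, "rb") as fh:
        return describe_bytes(fh.read())
```

For example, `print(describe_file("hh2020.txt"))` run in the folder that holds the log. A result with no BOM and a first non-ASCII byte well into the file suggests mostly plain text with a few encoded characters, such as an accented player name.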
Re: Hand History Parser for obtaining Player Stats
I could write one for you if you wish. Let me know if you are interested and we can work out a price.
-
- Posts: 123
- Joined: Wed Jan 07, 2015 3:51 pm
Re: Hand History Parser for obtaining Player Stats
I just copied one of the hand history txt files from the data folder into the python folder.
Re: Hand History Parser for obtaining Player Stats
There definitely appears to be an encoding issue.
Let's try this - open the file with WordPad, then use "Save As..." and select "Unicode text File" from the list of dropdowns for the file format.
After that, try running the new text file through the script.
-
- Posts: 123
- Joined: Wed Jan 07, 2015 3:51 pm
Re: Hand History Parser for obtaining Player Stats
That seemed to get further:
C:\misc\python>processLogs.py EventLog2020-04-13.txt
Players: 0
Net balance: 0
Not sure what the output should look like.
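A possible reason for "Players: 0": if the re-saved file is UTF-16 but is still decoded as CP-1252, every letter ends up separated by a NUL character, so the script's `Hand #` pattern never matches and no hands are loaded. The sketch below illustrates this with a made-up sample line. It assumes the file is little-endian UTF-16 with a byte-order mark, which is not confirmed for this particular file.

```python
import re

# Same pattern the script uses to find the start of a hand.
HAND_RE = re.compile(r"Hand #(\d*)-(\d*) - (.*)$")

line = "Hand #1-1 - 2020-04-13 20:00:01\n"      # made-up sample line
raw = b"\xff\xfe" + line.encode("utf-16-le")    # ASSUMED on-disk form: UTF-16 LE with BOM

as_cp1252 = raw.decode("cp1252")  # decoding with the wrong codec still "works" byte for byte
as_utf16 = raw.decode("utf-16")   # decoding with the matching codec restores the text

wrong = HAND_RE.search(as_cp1252)  # NUL characters sit between the letters, so no match
right = HAND_RE.search(as_utf16)

print("matched as cp1252:", wrong is not None)
print("matched as utf-16:", right is not None)
```

No error is raised in the first case because every byte happens to be valid in CP-1252. The text is just not what the regular expression expects.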
Re: Hand History Parser for obtaining Player Stats
A ha!
OK, cool. Kinda.
Here is what I learned.
#1, saving from WordPad to Unicode text actually saves the file as UTF-16. Which we don't want. But at least all the bytes were read. So we do not want to do that.
#2, somehow the Python script thinks the files are CP-1252 (which they are NOT) so we need to correct that.
In the script find the lines that read
Code: Select all
    for f in args.file:
        line = f.readline()
I think this is line 215 but maybe not.
You want to replace these two lines as follows (the spacing is important in Python):
Code: Select all
    for fh in args.file:
        f = open(fh.name, mode='r', encoding='utf-8')
        line = f.readline()
Then run the script against your original file (do NOT save it as Unicode text from WordPad).
See if that makes a difference.
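If logs might arrive in more than one encoding (for example, some re-saved from WordPad), a slightly more defensive variant is to pick the encoding from the byte-order mark and fall back to UTF-8. This is a sketch under that assumption, not a change the script requires. `detect_encoding` and `read_lines` are hypothetical helper names, and `errors="replace"` trades exact text for not crashing on a stray byte.

```python
import codecs


def detect_encoding(path, default="utf-8"):
    """Pick an encoding from the file's byte-order mark, else fall back to `default`."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return default


def read_lines(path):
    """Yield decoded lines, replacing undecodable bytes instead of raising."""
    with open(path, mode="r", encoding=detect_encoding(path), errors="replace") as fh:
        for line in fh:
            yield line
```

In the script, the loop could then iterate `for line in read_lines(fh.name)` instead of calling `readline()` on the handle that argparse opened, though that would mean restructuring the nested `while` loops.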